[MUSIC] Welcome to the lecture on SciFinder. SciFinder is one of the commercial tools that we are using in this course, and I trust that your home university provides you with access to it. If not, try going to the university library and see if you can sit down there and play with it. This is essential. This tool is quite complex and complicated, and we will not do it justice in this video. The only way to actually learn how to wield it is to play with it.
So what we are going to explore a little in this video, and we'll just scratch the surface, is the ADME space of small molecules. ADME is an abbreviation of absorption, distribution, metabolism, and excretion. It defines a space of small organic molecules that have proven to be useful as medicaments. They shouldn't be too big if they are to fulfill the ADME criteria, and they also become too complicated to manufacture if they are. The molecule here, I don't know if you recognize it, is Taxol from the Californian yew tree, and that certainly defines the upper limit. Most people would prefer 500 in molecular weight as the upper limit, and the complexity of this molecule is such that it's not really practical to prepare it by chemical synthesis. It has been done, it can be done. And I didn't mention it, but this is probably the most important chemotherapeutic for breast cancer that we have at the moment.
So how do you generalize these types of molecules in patents? We will discuss how to do it with genetic sequences, and I think that it's pretty obvious how you would do that with sequences. But it's not that trivial to see how it's done with small molecules. And when people then successfully generalize their inventions, so that you get families of molecules rather than a single molecule at a time, how can you search your competitors to see whether they have patented families of molecules that interfere with what you want to do?
And we want to search them in a fashion that includes salts and also closely related structures. Simple [INAUDIBLE] that you can come up with and that could be used to bypass other people's patents if they were not phrased in a clever way.
One of the ways of defining these is by a Markush formula. Here you see a very simple Markush formula. It has one variable, R1, which is an alkyl carrying an alcohol, an aldehyde, or a carboxylic acid, with a length of two to four. And then you have a halogen that can occupy any of the four possible positions on the six-membered ring of this indole. So if you count this, it gives rise to some 36 different molecules, a little family. At least if you require the alkyl chain up here to be linear and with the functional group situated at the end. If you allow other variations, the number will increase a bit. But that should be clearly defined, and if you do that, you can define compounds like this when you make your patent claims. But how to search them, that's not trivial.
This is the tool, SciFinder, that we are going to use. You can see that I'm about to log in. This is, as I said, a commercial service. It is rather expensive, but it's very valuable for the purpose that we are going to explore. We will not explore all the tools that are in SciFinder. It's a very complex database. We will only be concerned with the structure editor that we are seeing here. Now, here on the right-hand side, you can see three different choices: structure, reaction, and Markush formula. We are interested in the first and in the last one. But we will not investigate Markush, which is a bit more complicated. And then down here, you can see that we can select between an exact search, a substructure search, and a similarity search. The substructure search is what we are interested in here. What we want to find is a family of compounds that have a core in common, that is, the substructure that they share.
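
To make the count of 36 from the simple Markush formula concrete, here is a minimal sketch that enumerates the family. It assumes exactly what was said in the lecture: a linear chain of two to four carbons, the functional group at the end of the chain, and a single unspecified halogen placed on one of the four positions of the six-membered ring of the indole. The labels and position numbers in the code are illustrative choices for this sketch, not SciFinder notation.

```python
from itertools import product

# Assumption: illustrative labels for the variable parts of the Markush
# formula described in the lecture; none of this is SciFinder syntax.
FUNCTIONAL_GROUPS = ["alcohol", "aldehyde", "carboxylic acid"]
CHAIN_LENGTHS = [2, 3, 4]          # carbons in the linear R1 chain
HALOGEN_POSITIONS = [4, 5, 6, 7]   # positions on the six-membered ring of indole


def enumerate_family():
    """Return every (group, chain length, halogen position) combination."""
    return list(product(FUNCTIONAL_GROUPS, CHAIN_LENGTHS, HALOGEN_POSITIONS))


family = enumerate_family()
print(len(family))  # 3 groups x 3 lengths x 4 positions
for member in family[:3]:
    print(member)
```

If the identity of the halogen were also allowed to vary, or branched chains were permitted, each extra variable would multiply the size of the family, which is why the definitions in a claim need to be stated precisely.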
And they are then decorated with various substituents so as to create a family of related molecules.
Now, we are in SciFinder, and with us we have brought this molecule. Look at these three fluorines up here, and a nitrogen there, and the other nitrogen down here, which could carry a charge. And note also that this entire portion of the molecule is rather flexible. This is an antidepressant drug. It's called fluvoxamine, and it is known to have some side effects. And when you encounter that, you sometimes get the idea that if we could make the molecule a bit stiffer without destroying the activity, then maybe we could gain something in specificity. Think of a rubber key fitting into a lock. You may be able to get the key into that lock, but it's not nearly as good as a stiff key that really fits.
So, let's see what we can do about it. We go down to Chemical Structure and click on the structure editor. And then we draw the part of the molecule that we don't want to change, the aromatic portion with the three fluorines on it. So we pick a benzene ring, and then we pick a single bond and add it up here. The SciFinder structure editor knows a lot about chemistry, so if you do something illegal, you will be informed. But so far we've gotten away with it.
Now, how do we make something which is stiffer and contains all these nitrogens at appropriate distances? Well, right here I'm thinking about [INAUDIBLE] an indole ring. So we can draw in such a thing. First the five-membered one, then the slightly tricky part of adding the six-membered ring on top, or on the side. And then we need a nitrogen there, and now we want to put in the other nitrogen. We could attach it here. We don't want it directly attached, though. That doesn't look like a very stable structure. So instead, we put a variable in here, an alkyl chain, and that can be of variable length. And then we add the amino group to that one instead. We click on H and put two hydrogens onto it.
And now we connect the two structures. But we are not done, because we don't want SciFinder to try to decorate these aromatic structures with all sorts of things. The structure editor provides a tool for avoiding that. So here it is, the ring locking tool, by which we lock both rings, and they get boldfaced in the process. And now we are ready to search.
That was a very meager result. Just a single molecule was found, but let's take a look at it. We can do that by double-clicking on it, and here it is. And now we want to know what kind of document this one belongs to. That we can see down here in the additional details. It is a patent, but it's unspecified so far, because what we need to do now is to retrieve the references. And that's what we do here. We could get the whole list, but we will just get the one here. And we can see here that it is a patent from 1981, so that's expired. And we can also see the annotation: antidepressive, no data, compounds. This means that yes, this was indeed developed to be an antidepressant drug, just like the thing that we were looking for. But there is no experimental data to support activity, and so it's very questionable whether this is a good compound or not, but it certainly is in the prior art.
Now, let's compare. Up here, the first nitrogen is two bonds away, and the other one is six away. If we count here, the first one is three away, and then we can have the next one at an appropriate length because of the alkyl chain variable. So maybe we should try to correct the distance of the first nitrogen, and therefore we can just delete that part and replace it with another one that brings it to the right distance. So we click OK and try to search for that. That yielded no results whatsoever. So this seems to be a much too restrictive way of searching for structures of the family that we're interested in.
Okay, after these very narrow searches, let us try to throw the net really out wide and then focus in a final combine step, just as we are used to doing in [INAUDIBLE]. So we draw this characteristic part of the structure once more. And then we take the liberty of being really, really unspecific about the substituent. So we simply say that we will have any heterocycle situated here. And no, there's one thing that we forgot, we will not be that bold. We need to lock the ring here, and you see it's boldface now, so we will not have decorations on that ring itself. So now we're ready.
[INAUDIBLE] focusing is to do it in a final set combination step, just as we are used to when we use [INAUDIBLE]. We will not combine different sets of molecules that we find, but we will exploit the fact that SciFinder also supports text searches. Since we are interested in antidepressive agents, we will do a text search for serotonin reuptake inhibitors and then finally combine that with our set of molecules. And here we have them, and this is really wide. We got nearly half a million different molecules.
Before we continue, take a look at this concept under the sample analysis headline here: Prophetic in Patents, which is the largest group of molecules. What are these? These are molecules that have never been prepared. They are the result of Markush formulas. They capture families of molecules. And they're patentable, even though they've never been made.
Now we want to limit ourselves to molecules of interest: stiff molecules that belong in the ADME space. So the first thing we do is limit the molecular weight to 350 and reduce the number of bonds with free rotation to just three. In this way, we got our half million compounds down to just 19,000. Still a lot, but a lot better. So, we want to save this set for later use. We do that up here, and we give it a name. And in the name, we indicate that it is the answer set from a search on molecular structure.
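
The property filter just applied (molecular weight limited to 350, at most three bonds with free rotation) can be sketched as follows. This is only an illustration of the logic and not how SciFinder implements it. The `Hit` record, the function names, and the four sample records are hypothetical placeholders, and treating the limits as inclusive is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Hypothetical record for one molecule in an answer set."""
    name: str
    molecular_weight: float
    rotatable_bonds: int


def within_limits(hit, max_weight=350.0, max_rotatable=3):
    """True if the hit is both small enough and stiff enough.

    Assumption: both limits are inclusive.
    """
    return (hit.molecular_weight <= max_weight
            and hit.rotatable_bonds <= max_rotatable)


def refine(hits, max_weight=350.0, max_rotatable=3):
    """Keep only the hits that pass both property limits."""
    return [h for h in hits if within_limits(h, max_weight, max_rotatable)]


# Made-up placeholder values, chosen only to exercise each branch.
hits = [
    Hit("candidate A", 290.0, 2),   # passes both limits
    Hit("candidate B", 410.5, 2),   # too heavy
    Hit("candidate C", 330.0, 6),   # too flexible
    Hit("candidate D", 350.0, 3),   # exactly on both limits
]

kept = refine(hits)
print([h.name for h in kept])
```

Both conditions have to hold at once, which is why two fairly mild limits together can cut a very large answer set down so sharply.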
We indicate that in the name: serotonin reuptake inhibitor, [INAUDIBLE] molecules. Why we do that will become clear in a minute.
So, what we have found is molecules, but what we want is documents. So now we retrieve the documents. You can see that it's possible to insert a filter here, but we don't. And to begin with, we get 8,230 documents out of this set. Some of them could be duplicates, so let's see if we can remove some. 148 documents disappear from our set. Not so many, but still worthwhile. So now we want to save this set, and we call it inhibitor references from molecules, indicating that now it is the documents that we are saving in our set.
Okay, from here on, it is time to do the text search. Up here in Explore, we select Research Topic, and here we enter the phrase. We do not have support for text search in the way that we know from [INAUDIBLE]. But SciFinder is pretty clever at parsing sentences and concepts. So we will write selective serotonin reuptake inhibitor. And here we have the hits. We get four different sets of hits. The top one is something that we would get from using quotation marks around the concept in [INAUDIBLE]. The next one looks like something we would get by the use of the operator SAME in [INAUDIBLE] or NEAR in PatentScope. And we should have selected the second one. But because of the large number of documents, for this example we will do the not so clever thing of selecting the smallest set, the top set. And then we get the references that belong to this set. We get 5,000 of them, and in this case the elimination of duplicates is actually automatic, so we don't need to do that in a separate step.
Now we are ready to save this answer set, and we will do so indicating in the name that this is a text search: serotonin reuptake inhibitor references from text search, to discriminate it from the similar set that we obtained from the serotonin molecular structure search. In the combine step, we want to achieve the following.
We want to select, from all the hits that we've got with the search on molecular structure, those that speak about the indication, depression or serotonin reuptake inhibitors. And we do that here, in Combine Sets, and here we see the document sets that we have generated. The one with the text search on top, the newest one, and then the one with the molecules at the bottom. We select both of them, and then we select Intersect, meaning an AND search. And now we get just four documents.
That's pretty easy to get an overview of, but still, let us demonstrate the refine function that we have. It can refine and select for different properties, and the property of interest to us here is to select the patents. So we click the Refine tab, refine by document type, select Patent, and click Refine, and one of the three documents disappears. If you look a little bit at the part of the abstract that we can see in this short format, you can see that these patents are phrased with Markush formulas, with R1 or R2, etc., and then how these are defined, and that they embrace quite big families of related molecules.
As I said to begin with, we have only scratched the surface. There are many more tools, and we went through the motions in SciFinder pretty quickly. There are many things, especially in the last example that we did, that I wouldn't recommend for you to do at home. We restricted the set that we used in the penultimate step, the text search, where we selected the smallest answer set. That goes against the teaching of casting the net out wide before you make set operations and combine exercises, and it is certainly not recommended. If we actually knew what we were looking for, I believe that our initial structure would be defined in such a way that we would not get half a million hits, unless we took a few shortcuts here and there.
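
The combine step, in which the references from the structure search are intersected with the references from the text search, comes down to two plain set operations: removing duplicates and keeping what both sets share. The sketch below only illustrates that logic. The function names and the document identifiers are made-up placeholders and have nothing to do with SciFinder's own interface or data.

```python
def remove_duplicates(references):
    """Drop repeated document identifiers while keeping first-seen order."""
    return list(dict.fromkeys(references))


def intersect(structure_refs, text_refs):
    """Keep the references that occur in both sets (an AND combination)."""
    text_set = set(text_refs)
    return [ref for ref in structure_refs if ref in text_set]


# Hypothetical identifiers, used only to show the two operations.
from_molecules = remove_duplicates(["DOC-1", "DOC-2", "DOC-2", "DOC-3", "DOC-4"])
from_text_search = ["DOC-2", "DOC-4", "DOC-9"]

combined = intersect(from_molecules, from_text_search)
print(from_molecules)
print(combined)
```

An intersection can never be larger than the smaller of its two inputs, so choosing the smallest answer set for the text search, as was done here, narrows the final result before the combination is even made.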
And the whole idea of making this intersection, where we demanded that the patents we were interested in actually talked about the antidepressive indication, is a bit questionable. Not as part of a grander mapping of the patent landscape around what you're interested in, but as a standalone search it is certainly not recommended.
Now, this is a video that you might want to look at more than once. But the only way to actually get under the skin of SciFinder is to play with it, so that is my recommendation. Play with it, and try it with molecular species that you are interested in and familiar with. And when you've done that for a while and you get the hang of how to use SciFinder, then do this week's exercise. [MUSIC]