Next we will talk about how to define modules and packages in Python. The obvious first question is: what are modules? If you exit the Python interpreter and enter it again, the definitions you have made of functions and variables are lost. You may want to prevent this. You may also want to use a handy function that you've written in several programs without copying its definition over and over again into each program. To support this, Python has a way to put definitions together in a file, which is called a module. So modules in Python are simply Python files with the .py extension, which contain definitions of functions or variables, usually related to a specific theme. Grouping related code into a module makes the code easier to understand and use. We could put all of the functions we defined previously to process DNA sequences in a file called dnautil.py, for instance.

So let's start writing our dnautil module. At the beginning of the module, a documentation string can be added to describe what the module does. In this case, it says that the dnautil module contains a few useful functions for DNA sequences. And now we can add the functions we defined, starting with gc(dna) and then has_stop_codon(dna, frame) and so on.

Now let's enter the Python interpreter and try to use our module. To do this, we need to use the import statement. The syntax is import followed by the name of the module, dnautil, and notice there is no need to add the .py extension. Chances are that if you didn't put the dnautil.py file in your current directory, Python will give an error message. And why is that? Why didn't Python find your module? When a module is imported, Python first searches for a built-in module with that name, so one that it already knows about. If it doesn't find one, then it will search for a file with the module name and the extension .py. It first looks in the current directory, the one that you started your Python interpreter from.
Then it looks in the directories listed in the environment variable called PYTHONPATH, and then in the default directories that were set up when Python was installed. You can set PYTHONPATH in your shell, and the way to do it depends on the system environment you're working in, so we will not talk about this here. You probably have learned this already if you took the UNIX commands module. Don't worry if you didn't. There are ways to specify the search path in Python directly, and we'll see how to do this next.

Here is an example where we can use the built-in module sys to check the path containing all directories where Python looks for a module. First we need to import the module sys using the import statement. sys.path is a variable that stores all paths where Python looks for a module. Writing sys.path at the prompt will give us the list of paths. You can see it starting with the current directory, specified by two single quotes with nothing in between, and then a few other directories, mostly related to Python's installation. If the sys.path list doesn't contain the path to the location where you put your dnautil.py file, then you can add it. To do this, we can use the append method of the list to add the path to the module. Let's say that in my case, dnautil.py is located in a directory called /users/mpertea/courses/python. So let's call sys.path.append and pass as an argument the directory where I put my dnautil.py file. If I check the sys.path variable now, it indeed contains the path I added.

Hopefully you were successful in importing the dnautil module and Python did not return an error. Now you want to use the functions in that module. So let's create a dna string variable and call the function gc on it. It didn't work. Why? What happened? Well, when we called gc, Python didn't know where to look for it. So we actually have to specify the module where it comes from. And to do this, we need to write dnautil.gc.
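The steps so far can be sketched in one short script. This is only an illustration under stated assumptions: the body of gc below is a stand-in (the real function was defined in an earlier lecture), and a temporary directory plays the role of the directory where you keep dnautil.py.

```python
import importlib
import os
import sys
import tempfile

# Stand-in for dnautil.py. The gc() body is an assumption, not the
# course's actual implementation.
MODULE_SOURCE = '''"""dnautil module contains a few useful functions for DNA sequences."""

def gc(dna):
    """Return the fraction of G and C bases in the dna string."""
    dna = dna.lower()
    return (dna.count("g") + dna.count("c")) / len(dna)
'''

# Write the module into a directory that is not on the search path yet.
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "dnautil.py"), "w") as handle:
    handle.write(MODULE_SOURCE)

print(module_dir in sys.path)   # False, so "import dnautil" would fail here
sys.path.append(module_dir)     # add the directory to the search path
importlib.invalidate_caches()   # make sure the new file is noticed

import dnautil                  # no .py extension in the import statement

dna = "atgc"
try:
    gc(dna)                     # the bare name gc is not defined here
except NameError as error:
    print("NameError:", error)

result = dnautil.gc(dna)        # the module name tells Python where gc lives
print(result)                   # 0.5
```

In an interactive session you would replace the temporary directory with the real location of your file, for example the directory mentioned above.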
Now when we call dnautil.gc, Python will know this is the function gc in dnautil, and it will return the GC percentage of the dna string.

You don't have to use a module's name if you use the from import statement to import the functions from the dnautil module. For instance, let's import our dnautil using the from import statement like this: from dnautil import *. This will tell Python to get all the functions and their definitions. After you've done this, you can use gc(dna) to get the GC percentage. So now gc(dna) will give you 0.53 and so on, the GC percentage of the DNA string variable. This was much easier, so why don't we always do this? Well, sometimes you have a function with the same name, gc, defined in several other modules. So you might want to use the module name in front of it if you really want a specific function from a specific module. You can also use the from import statement to get only a few select functions from the module, but not the rest. In this example, let's import only the functions gc and has_stop_codon. If your module contains other functions, then you won't be able to use them unless you import those too. Now I will let Stephen tell you about packages.

>> Packages are really just modules by another name. They're basically a way to group modules together into a larger collection of modules, and we call that a package. So for example, a module name A.B designates a submodule named B in a package named A. Each package in Python is a directory, and the directory has to have a special file which has this very special name, __init__.py. I know this seems very silly. The designers of the Python language chose this file name as something they thought no one would ever actually call a file, and so they don't expect it to be a problem for you to create a file with this special name. There doesn't have to be anything in this file. In fact, in general you don't need to put anything in it, so it could be an empty file.
But it indicates to Python that the directory contains a Python package, so it can be imported the same way a module is imported.

So let me give you an example. Suppose you have several modules: dnautil.py, which we talked about in another lecture, which contains useful functions to process DNA sequences. Let's suppose you also have an rnautil.py module which has some other functions about RNA, and maybe a proteinutil.py file which contains functions that process protein sequences. You'd like to group them all together in a package that we're going to call bioseq, which processes all different types of biological sequences. So here's a possible structure for your package. You have a directory called bioseq. Inside of that, you have this init file, which remember is named __init__.py. And then you have your three modules, dnautil.py, rnautil.py, and proteinutil.py. And you can have more as well, so you might even have some additional subpackages under that, called fasta and fastq, which might have, let's say, fastautil.py and fastqutil.py. Each of those could be packages too, which means that those directories also contain the special __init__.py file. So here's our top-level package; it's bioseq in this example. We have the __init__.py file there to indicate that it's a package. And then we have subpackages for processing FASTA files and FASTQ files.

So how do you load all the functions from all these modules inside of a package? To use the module dnautil, we can import it in two different ways. We could use the import statement and say import bioseq.dnautil, where bioseq is the name of the package and dnautil is the name of the module inside of that. And then to use a function like gc, we'd have to say bioseq.dnautil.gc(dna). So our function name, instead of being just gc, is now going to be this much longer string that contains the package and the module where gc is defined.
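Here is a small sketch of that package layout and of the first import style. It builds the directory tree in a temporary location so it can be run anywhere; the gc body is again an assumed stand-in, and the other modules are left empty because their contents are not described here.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Assumed stand-in for the gc function from the earlier lecture.
DNAUTIL_SOURCE = '''def gc(dna):
    """Return the fraction of G and C bases in the dna string."""
    dna = dna.lower()
    return (dna.count("g") + dna.count("c")) / len(dna)
'''

root = Path(tempfile.mkdtemp())

def write(relative_path, text=""):
    """Create a file under root, making parent directories as needed."""
    path = root / relative_path
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text)

# bioseq/ is the top-level package; fasta/ and fastq/ are subpackages.
write("bioseq/__init__.py")            # empty file marks the package
write("bioseq/dnautil.py", DNAUTIL_SOURCE)
write("bioseq/rnautil.py")
write("bioseq/proteinutil.py")
write("bioseq/fasta/__init__.py")
write("bioseq/fasta/fastautil.py")
write("bioseq/fastq/__init__.py")
write("bioseq/fastq/fastqutil.py")

# The directory that contains bioseq/ must be on the search path.
sys.path.append(str(root))
importlib.invalidate_caches()

import bioseq.dnautil

# The function is reached through the package and module names.
gc_fraction = bioseq.dnautil.gc("atgc")
print(gc_fraction)  # 0.5
```

Note that it is the parent directory of bioseq that goes on sys.path, not the bioseq directory itself.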
Or we can say from bioseq import dnautil, which means go to the bioseq package and import just the dnautil module from that. So now we've imported the dnautil module, and to use the gc function we would just say dnautil.gc(dna). We don't have to use the name of the package anymore. To import a specific function from a module in a subpackage, we'd use the following syntax: from bioseq.fasta.fastautil import fastaseqread. So bioseq is our package, fasta is a subpackage inside of it, and fastautil is a module inside of that. And fastaseqread is a function that reads FASTA sequences. That import would allow us to call that function directly.

So you might wonder why we're telling you about packages and modules, which seem rather complicated and involved, and which also have some interactions with the Unix file system. In general you don't really have to worry about these if you don't want to. If you're just writing simple functions, you can just write them and use them. But if you want to use the Biopython package, which we'll describe in another lecture and which has many, many very useful functions, then you have to know about packages and modules, because you have to import them from the Biopython package.
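The two from import forms can be tried with a sketch like the one below. Everything inside the generated files is hypothetical: gc is an assumed stand-in, and fastaseqread is an invented minimal reader that takes a list of lines, since its real signature is not given in this lecture.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Assumed stand-in for the gc function.
DNAUTIL_SOURCE = '''def gc(dna):
    """Return the fraction of G and C bases in the dna string."""
    dna = dna.lower()
    return (dna.count("g") + dna.count("c")) / len(dna)
'''

# Hypothetical reader: the real fastaseqread may look quite different.
FASTAUTIL_SOURCE = '''def fastaseqread(lines):
    """Collect FASTA records from an iterable of lines into a dict."""
    seqs = {}
    name = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].strip()
            seqs[name] = ""
        elif name is not None:
            seqs[name] += line
    return seqs
'''

# Build a minimal bioseq package with a fasta subpackage.
root = Path(tempfile.mkdtemp())
files = {
    "bioseq/__init__.py": "",
    "bioseq/dnautil.py": DNAUTIL_SOURCE,
    "bioseq/fasta/__init__.py": "",
    "bioseq/fasta/fastautil.py": FASTAUTIL_SOURCE,
}
for relative_path, text in files.items():
    path = root / relative_path
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text)

sys.path.append(str(root))
importlib.invalidate_caches()

# Import one module from the package: the package name is no longer needed.
from bioseq import dnautil

# Import one function from a module inside a subpackage.
from bioseq.fasta.fastautil import fastaseqread

gc_fraction = dnautil.gc("atgc")
records = fastaseqread([">seq1", "atgc", "ggcc", ">seq2", "ttaa"])
print(gc_fraction)  # 0.5
print(records)      # {'seq1': 'atgcggcc', 'seq2': 'ttaa'}
```

The same pattern applies to any installed package: the dotted path after from names the package, subpackage, and module, and the name after import is what becomes available directly.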