Now we'll use some of the operations, functions, and types that we've discussed so far to build our first program in Python. Remember, we're going to go through some pseudocode: I'll show you the pseudocode and then the Python code that implements it. For our first program, we want to compute the GC content of a DNA sequence, that is, the percentage of Gs and Cs in a string of letters that's all As, Cs, Gs, and Ts.

So the first thing we've got to do is read in our DNA sequence from the user. Here, we're just going to assign our DNA sequence to a variable: dna equals, and we provide a long string.

Next, I want to count the number of Cs. Remember, there's a method to do that, as we've talked about in another lecture. We can just say dna.count, and that tells the string dna to use the count method to count how many Cs there are. So dna.count of the letter C gives us a count of how many Cs are in the variable dna, which is a string. This time, rather than just having that returned to us, we want to save that value, so we assign it to a variable. We're going to call that number of Cs, or just no_c. So that's another variable, which captures the value returned by this method. Now we want to do the same thing and count the number of Gs, so of course we create a variable called no_g, or number of Gs, and we assign to it the value that we get by applying the count method, only now we count Gs in the string dna. So that's dna.count of G.

Now we need to capture the length of the DNA sequence. There's a length function, which we talked about before: we call the length function on our DNA string and it returns the length. In this case we want to remember that, so again we store it in a variable. We'll call that variable dna_length and set it equal to the length of dna, or len(dna).

And now we compute the GC percentage. The GC percentage is going to be the number of Cs plus the number of Gs.
So we add those two together with the plus operator and put them in parentheses so that the addition occurs before the multiplication. We multiply by 100, because we're computing a percentage here, and divide by the length of the DNA. So we just use simple mathematical operations that you've seen before. We capture all of that and store it in another variable, which we'll call gc_percent. That gives us our GC percentage. By the way, the .0 after 100 is only required in Python 2. In Python 3, you could just say 100 rather than 100.0. So now gc_percent is a variable that has the value we want. We want to send that to the screen, so we just print it with our print command, print gc_percent, and that gives us 53.06%, with lots of extra precision.

So this is one way to write a program. We just wrote one by typing a series of commands to the interpreter, but if we wanted to run that program again, we don't have it. We've just typed it in, and we'd have to type it in again. That's not a very effective way to do programming. So most of the time when you do programming, you're going to write your program in a file and then pass that file to the Python interpreter, which interprets the entire file.

So here are the same commands. We can store them in a file, and we can call that file, say, gc.py. You can give it any name you want. We just put exactly those commands that we typed into the interpreter in a file, just a plain text file. The extension .py, by the way, is a convention that is usually used to indicate that a file contains a Python program. It doesn't have to have that extension, but it's really helpful if you use it. That way you can keep track of all the files you have that are Python programs rather than, say, text files or Word documents or presentations of some kind. So now we've got this file, and we want to execute it.
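As a rough sketch, here's what a file like gc.py might contain. The short sequence below is a made-up stand-in, not the long string from the lecture, so its result differs from the 53.06% mentioned above. The variable names follow the ones we've been using, and the count method is case-sensitive, so a lowercase sequence is counted with lowercase letters.

```python
# Hypothetical short sequence for illustration only; the lecture uses a longer string.
dna = "acgtacgtgc"

no_c = dna.count("c")    # number of C's in the sequence
no_g = dna.count("g")    # number of G's in the sequence
dna_length = len(dna)    # total length of the sequence

# Parentheses force the addition to happen before the multiplication.
# Writing 100.0 keeps the division from truncating in Python 2.
gc_percent = (no_c + no_g) * 100.0 / dna_length

print(gc_percent)
```

With this made-up sequence there are 3 C's and 3 G's out of 10 letters, so the program prints 60.0. Writing print with parentheses, as here, works in both Python 2 and Python 3.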
We don't want to have to type it in again; we want to be able to take this entire file and run it in Python. So we can pass this file to the Python interpreter. We do that by typing python, which is the name of the Python program itself, but rather than typing python alone and letting it give us back the prompt, we type python followed by the name of our program, gc.py. That tells Python to take this file and execute what's in it, and it will give us the answer, 53.06%.

Now, you might not want to type python and then your program name every time. Instead, you can embed in the file itself a special line that tells the computer that this program is a Python program. The line is #!/usr/bin/python, where /usr/bin/python is the path to Python on my computer. If you put that as the very first line of your file, then when you try to run the file from the UNIX command-line prompt, the UNIX system itself will look at that first line and say, oh, this is a Python program, I'm going to call the Python interpreter. So you don't have to type python; you can just type gc.py and that will run the program. Once you put in that special line as the first line of your Python file, which you will do for all your files from now on, you just type gc.py and it will run the program. And the reason I typed ./gc.py was to tell the UNIX command line to look in the directory I'm in right now, which is where this file is.

One little caveat here is that I've now created a program and, by default, that program is probably not executable, so UNIX will complain. Python won't complain, but UNIX will. If the file itself is not set in the UNIX operating system to be an executable file, then you can't just type its name at the UNIX command prompt. So what you do is change the settings on your file to make it executable with the command chmod.
You say, I'm going to change the executable bit, which is done by setting permissions. Here I'm adding executable privileges for everyone, so I say chmod a+x and give it the file name. Now this file is executable, and I can execute it from the command line.

By the way, I put /usr/bin/python in my file because that's where the Python program is on my system. On your system, it's possible the Python program is somewhere else, so at the UNIX command-line prompt you can type which python and it will tell you where it found it. Here I've shown an example where I typed which python, and you see I got back /usr/bin/python. That's how I knew where it was. So once you change the permissions on the file appropriately, you can run your program like I just showed you. You can just type gc.py and it will execute the program.

Now, an important principle of all programming, not just Python programming, is to add comments to your program. You might think, well, this is a very simple program, I can just look at the code, and Python is pretty readable, so I don't really need to comment it. That might be true, but long experience has taught those of us who do a lot of programming that when you come back to a program days, weeks, months, or years later, you often don't remember what it was you had in mind. And you certainly wouldn't be able to look at someone else's program and figure out what it does. It helps a lot to use variable names that have meaning in English; that way you can sometimes tell pretty easily, just by looking at the code, what it's going to do. But it's much better to put in comments: just type in what the program is about, and what each line is doing if you like. One way to do that is to surround some lines with triple quotes in the Python program.
Here I have an example where I put a triple quote and then wrote, this is my first Python program. It computes the GC content of a DNA sequence. And then I added another triple quote. Everything in between the triple quotes is simply ignored by the Python interpreter, so that's a comment. You can type whatever you want in there; it's all going to be ignored until Python gets to that next set of triple quotes.

Another way you can add comments, and we probably do this a little more often, for very short comments, is to put the special hash character, #, and then anything from that # to the end of the line is also ignored. Here I've shown # and then get DNA sequence, which is basically commenting on the line that reads in my DNA sequence. And you can see that each of the lines of the program now has a little # comment after it. So I wrote no_c=dna.count('c') and then I wrote # count C's in DNA sequence. The text from # to the end of the line is all ignored, and I can say anything I want. I use that as a comment, so it explains what the program is supposed to be doing there. So comments are extremely useful as a way to document your code, and any good programmer will have comments scattered throughout explaining what the lines are doing, especially if they're a little more complicated.
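Putting the pieces together, a commented version of the program might look roughly like the sketch below. The sequence is again a made-up short example rather than the one from the lecture, the comment wording is only approximate, and the first line assumes Python lives at /usr/bin/python, which may not be true on your system.

```python
#!/usr/bin/python
"""
This is my first Python program.
It computes the GC content of a DNA sequence.
"""

dna = "acgtacgtgc"      # get DNA sequence (hypothetical short example)
no_c = dna.count("c")   # count C's in DNA sequence
no_g = dna.count("g")   # count G's in DNA sequence
dna_length = len(dna)   # determine length of DNA sequence
gc_percent = (no_c + no_g) * 100.0 / dna_length  # compute GC percentage
print(gc_percent)       # print GC percentage
```

One small technical note: the triple-quoted block is strictly a string that Python reads and then does nothing with, which is why it behaves like a comment here. The # comments, by contrast, are skipped by the interpreter entirely.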