Hello and welcome to Python for Everybody. I'm going to do some coding related to the dictionaries chapter, Chapter 9, and we're going to do some word counting that's basically right out of the slides. But I'm going to write the code in front of you rather than have you look at it in the book.
So I've got my text editor up here, and let me start by making a new folder for my Chapter 9 exercise. Then I'm going to make an untitled file. (That one was from the previous exercise.) And I'll do what I always do: print('Hello') and save it, here in exercise 09, as ex_09.py. So now I have a folder that's in my py4e folder, which happens to be on my desktop, and py4e has all of these subfolders. cd ex_08, ls (ls is dir on Windows). Oops, I've got to go up one, then into ex_09, ls. So I've got that file right there.
Now I'm going to want to read some files, so I'm going to bring a couple of files down. Python for Everybody, code3/intro.txt: I've got this URL, and I'm going to save it with Save Page As. It's really important that I save it in the same folder where I'm going to write my code, so that when I open the file, Python knows where to find it. So I've saved that one, and I'm also going to take this clown text. I'll use it to make my life simple: it's a really short file, so I can show you how things work. Now if I go back to my terminal, I see I've got ex_09.py, intro.txt, and clown.txt, okay?
So let's go back to my text editor and get started. I will prompt for the file name: fname = input('Enter File: '), with a colon and a space at the end of the prompt. Now, I'm going to do something. If the length of the fname that I just read is less than 1, I'm going to say fname = 'clown.txt'. I do this so that I can just hit Enter and it defaults to clown.txt. If I want to give it a different name, I can. If I just hit Enter at this prompt, input gives me a string of zero length.
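Here is a minimal sketch of that prompt-with-a-default step. The function name prompt_for_file and the ask parameter are my own additions, not something from the walkthrough; they only exist so the logic can be tried without typing at a real prompt.

```python
def prompt_for_file(ask=input, default='clown.txt'):
    """Ask for a file name; fall back to `default` when the user just hits Enter.

    `ask` is a hypothetical hook (defaults to the built-in input) so the
    function can be exercised without a keyboard.
    """
    fname = ask('Enter File: ')
    # Hitting Enter alone produces a zero-length string.
    if len(fname) < 1:
        fname = default
    return fname
```

In a script you would just call prompt_for_file() and pass the result to open().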
So if the length is less than 1, I'll just assume clown.txt. Now let me open the file: hand = open(fname). And let's read through it: for lin in hand. We'll strip each line, lin = lin.rstrip(), to take the whitespace off the right-hand side. And now we're going to say print(lin). Again, I'm not just doing this for your benefit. (I just saved it.) When I write code, I do this kind of thing all the time, just for my own sanity checking. So now I'm going to run python3 ex_09.py to test that. I'll hit Enter now, and hopefully it's going to assume clown.txt if all goes well, and yep, it read one line. Okay? So that part's working. I'll just leave that print statement in.
The next thing I want to do is kind of a classic thing, where we read a bunch of lines and then go horizontally across each line, word by word. So I'm going to split the line, wds = lin.split(), and print(wds), and I'm going to save it and test it. I really love to test things over and over. There's the actual line. This file, clown.txt, only has one line, and the code breaks it into words, so I have those words. Let's run it again with intro.txt. This will have a lot of lines: line, line, line, line, line, lots of lines. For every line, it prints out the line and then prints out the words that we split it into, okay?
At this point I can more or less trust everything from here up: it's going to open the file, read through the lines, and split them into words. So I'll just comment that print out. Now I need another for loop: for w in wds. wds is a Python list with some number of words in it, 0 or 12 or whatever was on the line. And now I'm going to print out the word, okay? So now it will go through the line horizontally. I'll just do clown.txt so that you can see. I'm not printing the line out; those are the words that have been split from the line.
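The nested loops described above can be sketched like this. The SAMPLE string is an assumed stand-in for the one-line clown.txt, and io.StringIO is used in place of open(fname) so the example runs without any files on disk; the function name all_words is hypothetical.

```python
import io

# Assumed sample text standing in for the contents of clown.txt.
SAMPLE = ('the clown ran after the car and the car ran into the tent '
          'and the tent fell down on the clown and the car\n')

def all_words(hand):
    """Collect every word from a file-like object, line by line."""
    found = []
    for lin in hand:            # outer loop: one line at a time
        lin = lin.rstrip()      # drop the newline and trailing whitespace
        wds = lin.split()       # list of words; empty for a blank line
        for w in wds:           # inner loop: across the words of that line
            found.append(w)
    return found

# io.StringIO can be iterated line by line, like the handle from open(fname).
words = all_words(io.StringIO(SAMPLE))
print(len(words), words[:4])
```

Because the inner loop sits inside the outer one, w takes on every word of the file in order, which is exactly what the counting code needs next.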
And now we've got this loop. One thing that's interesting is just to make sure that you're going through all the words. I like a print statement here, to know that w is going to successively take on literally all the words of this file. So if I comment this other print statement out and run it again with clown.txt, that for loop, starting from here, gives every word in that file, which happens to be only one line. If I do the same thing for intro.txt, it's just going to go through the words. In a sense, by nesting these two loops, we hit all the lines and all the words. That's a lot of output, but it hit all of the lines, all the words, and away we go.
Okay, so here's where a dictionary comes in. I'm going to make a variable called di, for dictionary, and I'm going to say give me a dictionary: di = dict(). Now, dict is not a name you get to choose; it's the built-in type, and calling it says make me an empty dictionary. di is a variable name that I chose. Okay. So the key thing about this dictionary is that we're going to make a counter, and we're going to use w, the word (absorb, elegant, whatever), as the index. The simple thing to do is to say: if w is in di, then the dictionary sub the word, the word being our key in the key-value store of the dictionary, is equal to the value that we had before in that spot plus 1, so di[w] = di[w] + 1. And if it's not in there, else di[w] = 1. And I'm going to print: print('**NEW**'). So every time we see a new word it's going to say new, and I'm also going to print w and the current value of the counter for w as it goes through.
Now notice how far in I'm indented. This is all part of the inner loop, so this is the code that's going to run for every single word, okay? I'm going to run this first with clown, so we can go through it slowly. We saw that "the" was new and the count is 1. clown is new, clown is 1. ran is new, the count is 1. after is new, count is 1. Now we saw "the" again, but this time we made the count 2. Let's print here.
I'll say Existing, so you can kind of see it. Now, in the print, let's make it even a little more verbose. I'll print w, and I'll make it so it prints the word before and the count after, and then whether it's existing or new. So we're going to put a lot of print statements in. Print statements are cheap, okay? So now we see the word "the": it's the first time we see it, and we set it to 1. We see clown, first time we see it, we set it to 1. We see ran, new, 1. Later on we see "the" again, and it's already in. Existing means that w, as a key, was already in the dictionary, okay? That's why we added 1 to it. The old value was 1, and then we did di sub "the" = di sub "the" + 1. w is the string "the", t-h-e; that's what that string is. Okay? And so we've made it all the way through, and you'll see that "the", in this one line, ultimately occurred 7 times.
So now I want to print out the contents of this dictionary at the very end, after both loops, so I've got to de-indent twice. That will give us the counts, and this is what we get when it's all said and done. "the" happens 7 times, and it just worked its way through, okay? So you've got that.
Now, this is a pretty verbose way of doing this, but I did it the slow way to show that there are two situations. If the key is already there, you increment it, and if it's not there, you set it to 1, effectively inserting it. Right, so you insert it and set it to 1 with this di sub "the" = 1, okay? But let's get a little less verbose here and get rid of some of these print statements, because we've covered all that. Get rid of this line and go back to printing w and di[w] at the end. We'll leave that one in. So what I want to do is look at this bit of code right here, this if w in di ... else. We do this so much with dictionaries that there is an easy mechanism that combines these four lines into a single kind of contraction. And so I'm going to do this.
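As a recap of the slow way, here is a small sketch of the if/else counter with the NEW and Existing markers. Instead of printing inside the loop it records each step in a list called trace, which is my own addition so the steps can be inspected afterwards; the short word list is also just an illustrative sample.

```python
def count_words_verbose(words):
    """Count words with an explicit if/else, recording what happened at each step."""
    di = dict()
    trace = []  # hypothetical helper list: (word, count after update, marker)
    for w in words:
        if w in di:
            # Key already present: increment the existing counter.
            di[w] = di[w] + 1
            trace.append((w, di[w], 'Existing'))
        else:
            # Key not present: insert it with a count of 1.
            di[w] = 1
            trace.append((w, di[w], '**NEW**'))
    return di, trace

words = 'the clown ran after the car'.split()
di, trace = count_words_verbose(words)
for step in trace:
    print(*step)
print(di)
```

The fifth step is the interesting one: "the" shows up a second time, so it takes the Existing branch and its count becomes 2.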
I'm going to print, let's put two stars out, then the word, and di.get of the word, comma, negative 99. Okay, and this di.get of the word is the important part. di is a dictionary, and .get takes as its first parameter the key to look up, which is a word like the, or fell, or clown, or whatever, and -99 is the default value that we get back if the key doesn't exist. So this is in effect an if-then-else, right? This little di.get(w, -99) says: if the key is in there, do one thing; if it's not in there, do something else, okay? So let me show you how this works, and you'll see when the -99 happens. Okay, so the first time we see "the", get returns -99, right? Let's move it over here. The first time we see "the", it is not in the dictionary, so this di.get of the word "the" gives us back the -99, okay? And the rest is still working, so "the" is 1, clown is whatever, and away we go. Okay?
Let's do it this way. Let me comment this out, and comment this one out, and run it again, so it's a little clearer what's going on. Okay, so the first time we see "the", it is not in the dictionary. The first time we see clown, we get -99 too. But here we asked for "the" again, and it's 1, because we've seen it before. So this get mechanism lets us get a value out if the key exists and specify a default if it's not there.
So I'm going to say oldcount = di.get(w, 0). Instead of using -99 here (I'm going to get rid of all this), what I'm saying is: look up in this dictionary, using get, which is a method that's part of all dictionaries. Look up using the key w, which is "the", and if you don't find it, give me back 0. Then I'm going to say print(w, 'old', oldcount). Whatever the old count is, it's either the value that was in there or 0. And now I can say newcount = oldcount + 1. And then I can say dictionary sub word is equal to newcount: di[w] = newcount.
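A small sketch of how get behaves, followed by the oldcount/newcount steps just described. The starting dictionary is an assumed example, as if "the" had already been seen once.

```python
# Assumed starting state: 'the' has been counted once already.
di = {'the': 1}

print(di.get('the', -99))    # key present: the stored value, 1, comes back
print(di.get('clown', -99))  # key absent: the default, -99, comes back
print('clown' in di)         # False: get() looks up but never inserts

# The same lookup used for counting, with 0 as the default.
w = 'clown'
oldcount = di.get(w, 0)      # 0, because 'clown' is not in di yet
newcount = oldcount + 1
di[w] = newcount             # this assignment is what inserts the key
print(di)
```

Note that get only reads. The key appears in the dictionary only when the assignment di[w] = newcount runs.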
So I'm going to get rid of this if-then-else. What this is basically saying is: look up the old count that we have, and if you don't find one, use 0. We'll print that out, and then afterwards print the new count. Let me get rid of some of these blanks. So we print the old count, and you can see that the old count for "the", because "the" doesn't exist yet, was 0, and the new one is 1. Clown's old is 0, new is 1. ran's old is 0. But now we get to "the" again: its old count was 1, and now its new count is 2. Okay? So we use this get and say that if we don't find the key, we'll assume the count is 0. That makes a lot of sense, right? If the key is not there, the count is zero, okay? So that's what this line does: get the value associated with the key, or give me 0 back. And then I can take that old number, add 1 to it, and stick it back in.
Now this is ultimately not how we tend to do it, okay? We tend to blend this all into one big long statement: di sub w = this part, di.get(w, 0), + 1. That says get the old value for this key, or 0, and add 1 to it, and it really combines all of these lines into a single line, okay? So I'm going to delete them now. We've combined this all into one line, which is effectively an idiom: retrieve, create, update the counter, all in one line. I'll still print out, in this case, w and di[w], and then we'll see the counter, okay?
And so now I'll run this. We have a new word, but when we see it the second time, it's 2. We see car the first time, we see "the" the second time and the third time, we see car the second time, and away we go. Okay? And so that's pretty straightforward. Oh, I've got a typo there. Let's just get rid of that and run it with the clown file, and we get the right data there. And let's run it with intro.txt. And there we go, okay? It's tearing out a bunch of words and giving us a dictionary.
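Here is the one-line counting idiom as a runnable sketch. The function name count_words and the SAMPLE text (an assumed stand-in for clown.txt) are illustrative, not taken from the walkthrough's file.

```python
# Assumed sample text standing in for the contents of clown.txt.
SAMPLE = ('the clown ran after the car and the car ran into the tent '
          'and the tent fell down on the clown and the car')

def count_words(words):
    """Count occurrences using the get-with-default idiom."""
    di = dict()
    for w in words:
        # Retrieve the old count (0 if the key is missing), add 1, store it back.
        di[w] = di.get(w, 0) + 1
    return di

counts = count_words(SAMPLE.split())
print(counts)
```

The single assignment covers both cases from the earlier if/else: a missing key starts from 0 and becomes 1, and an existing key is incremented.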
So, that was a lot of work to get to this line 16 that has the dictionary in it. Now we want to find the most common word, and so we're going to loop through this dictionary. Part of the point is that once we've printed this dictionary out and verified that it's right, we don't have to worry too much about the code up here, right? As a matter of fact, I can take out some of these print statements, and now we can kind of trust all this. So now we're going to work on this part.
Okay, now we want to find the most common word. This is like a maximum loop. If you recall, we have a whole set of key-value pairs: communicate goes to 2, skills is 3, and so on. Now we're going to loop through and look for the maximum. In a dictionary, we can loop through the key-value pairs with the following syntax: for, and I would call these variables k and v, for key and value, in the dictionary's name, .items(). So: for k, v in di.items(). items is a method of all dictionaries that says give me the key-value pairs. We need two iteration variables, so this is like an assignment statement for k and v: they take on the successive values of the key and the value, okay? So I'll just print k, v, take this other print statement out, and run the code. Oops, what did I forget? Oh, I fell back into my Python 2 days. I need parentheses for my print. And there is clown, and it just prints everything out. It's kind of the same thing as before, except it's prettier, since we're printing each pair on its own line, okay?
So v is the value, and we're looking for the largest value. The thing is, we know that the values are always numbers that are at least 1. So I'm going to do kind of a quickie maximum loop: largest = -1. In previous lessons we have seen that this can be a bad assumption, but because we know these are counters that are always positive, it turns out that this is not a bad idea here. And so I can say: if the value is greater than the largest we've seen so far, largest = the value.
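A minimal sketch of the items() loop and the quick maximum loop follows. The counts dictionary is a small assumed example, not the real output for either file.

```python
# Assumed example counts; a real run would build these from the file.
counts = {'the': 7, 'clown': 2, 'ran': 2, 'car': 3, 'tent': 2}

# items() hands back key-value pairs, unpacked into two iteration variables.
for k, v in counts.items():
    print(k, v)

# Quick maximum loop. Starting at -1 is only safe because every count is >= 1.
largest = -1
for k, v in counts.items():
    if v > largest:
        largest = v
print(largest)
```

If the values could be negative, -1 would be a poor starting point; None with an explicit first-item check would be the more general choice.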
Okay, and when that loop is all done, we can print the largest. Okay? So this is just a max loop, and we're using the value. That's the number; the value is the second thing in each pair. Oops. Can't type Python. Oh, it's a typo. I'm not using value, I'm using v, so largest = v. Let's try again. Okay, so we're all done, with 7. These were the values it was looking through, it was looking for the maximum, and it dutifully found that 7 was the largest.
But we also want to know what the word is. So what we can say here is theword = None, meaning we don't know yet what the word is. And then whenever we catch a new largest number, we say theword = w. I like to think of this as capture, or remember, the word that was largest, right? That's what I'm doing. (I'll put that in a comment: remember... all right, there we go.) So the trick here is knowing not only what the largest number was, but also the word that was associated with the largest number. Now I can print out, at the end, theword and largest, and that's the count.
Okay, and so now we know that... oops, did we make a mistake here? That does not look good, because it says car and 7. If v is greater than the largest... oh, it's not w. I used a really bad variable. There we go: it's k, which is the key. I was going to say, that was quite the bug. See what happened there? I had this as w, and w just happened to still hold the last word in the file, car, because I used the wrong variable, right? Little mistakes, little mistakes. Now it's "the" and 7. Okay, so let's get rid of this print statement, because we know what's going on here. And away we go; this should now work if we run it. I need to get rid of the word Done here. There we go: the, 7. Now, the cool thing about this is that this code runs just as easily on a one-line file as on the intro of the book, intro.txt.
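The corrected loop, remembering the key along with the largest value, might look like this sketch. The function name most_common and the example counts are assumptions made for illustration.

```python
def most_common(counts):
    """Return (word, count) for the largest count, or (None, -1) if empty."""
    largest = -1
    theword = None
    for k, v in counts.items():
        if v > largest:
            largest = v
            # Capture the key from THIS loop (k). A leftover variable such as
            # w from an earlier loop would still hold the last word read.
            theword = k
    return theword, largest

# Assumed example counts.
counts = {'the': 7, 'clown': 2, 'ran': 2, 'car': 3, 'tent': 2}
print(most_common(counts))
```

Because the comparison is a strict greater-than, a tie keeps the first word encountered rather than the last.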
And not surprisingly, "the" is still the most common word in intro.txt. I seem to like that word, and it appears 226 times. Okay, and so that is the basic pattern: read a file and loop over its words. Now, sometimes there would be some checking to see if the line is one you're interested in, or maybe some tearing apart of the line. But at the end of the day, it's this idiom of starting a dictionary and counting into it. Now, a common problem is knowing where to start the dictionary. You want to accumulate the numbers for the whole file, so you want to create it once, before the loops; you don't want to put it in between line 6 and line 7, okay? So I hope that particular thing helps a little bit and helps you understand dictionaries.
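To see why the placement of the dictionary matters, here is a hedged sketch comparing two versions. I am assuming the misplaced version creates the dictionary inside the loop over lines; the function names and the three-line TEXT are illustrative only.

```python
import io

# Assumed three-line sample; io.StringIO stands in for an opened file.
TEXT = 'the car\nthe tent\nthe clown\n'

def count_whole_file(hand):
    di = dict()                 # created once, before either loop
    for lin in hand:
        for w in lin.split():
            di[w] = di.get(w, 0) + 1
    return di

def count_reset_each_line(hand):
    # Deliberately flawed version, for comparison only.
    for lin in hand:
        di = dict()             # mistake: a fresh dictionary for every line
        for w in lin.split():
            di[w] = di.get(w, 0) + 1
    return di                   # only the last line's counts survive

good = count_whole_file(io.StringIO(TEXT))
bad = count_reset_each_line(io.StringIO(TEXT))
print(good)
print(bad)
```

In the first version "the" accumulates across all three lines; in the second, each line wipes out the counts from the one before it.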