Welcome to our lecture on Postgres and Python. Up till now we've been typing SQL commands through pgsql. It's quite easy in Postgres to, in effect, simulate what pgsql is doing, because pgsql is just a client. The Postgres server is somewhere else on the network. It's got an IP address and an ID and password. We log into it, and then we send commands to it. pgsqlL is just one piece of software, one of many pieces of software, that's capable of taking SQL commands and sending them. Another piece of software that can do it is actually Python. And so and now we are going to and Python can handle text so well. So we are going to do a series of things where we actually sort of use text, read text with Python, maybe talk to an API, parse that, and then put it into a database. And we can actually clean up data. We've done data cleaning in SQL, but when you get to Python, then you have a lot more power than you do in SQL. So the first thing, let me zoom this up a little bit. Now the first thing to know about Python is that you have to have this psycopg2. That's a Python Connection to Postgres or something like that. And then with that you're going to then make a connection. This connection is basically logging in to the Postgres server. That can fail, you might have a bad ID or password, or have the host name wrong. When we do this on my online examples, we'll have the file hidden.py so you don't have to actually have the secrets right in that file. Now you can send commands through this connection, but we tend to want to maybe have in a sense more than one client at a time. So we use a thing called a cursor. A cursor is a lot like a file handle, or a socket if you've programmed network sockets. And so you actually say to the connection, you say, hey, connection, give me a cursor. I would like to start sending SQL commands. And then you start sending SQL commands by typing. The cursor is an object, then you call the execute method in that object and you start sending it strings. And literally, they're just strings. It doesn't have to be constants, they can be strings that you concatenate together to make more things and then you can put substitutable parameters, which we'll quickly see. And so here I have a get me a cursor, drop a table, here's some more cursor execute. Now you can use the cursor over and over again. But now what we're doing is we're actually sending a command, and then after that command is done then we get back in the cursor what's called a record set. And then we can iterate, loop through, by saying give me the next row, give me the next row. So think of how you type a SELECT command and then it brings you a bunch of rows. Well, in Python, you do a select command and you have a handle through which you can see the rows because there could be one row or thousands and thousands of rows. So you tend to have to write a loop unless you know you're getting exactly one. So fetchone says get me the next row, whatever that is. Now a row is a tuple, and if there are no rows, you get none back. And so this is very simple. You can take a look at that sample code. And the first really big sample we're going to do is we're going to read through the text of a book. And this is from the Gutenberg Project. Funny thing, it's called PG. Postgres is PG, but the PG 19337 is one of their free and open text books. This happens to be Charles Dickens Christmas Carol, I think. And we're going to read that as a text file. And then we're going to load that into a database and then build a full text index and then play with that full text index. And I have instructions this loadbook.py. I'm going to do a walkthrough for you in that. There's some utility code that I give you in myutils, there is a hidden.py that puts the secrets in. And then you're going to grab the text by downloading that with a wget, then run the code. The code simply, what the code does is it looks through the file and takes paragraphs, like a paragraph here, there's this paragraph here. And it takes multiple-line paragraphs and strings them into one long text line and then throws away extra blank stuff. And then puts that in a database and puts it in a database that basically has a single field called body, which is a TEXT field. And so this ends up with effectively one row per paragraph in this book, and then we play with it. As the thing is running, as this loadbook is running, you can watch using a Postgres psql in another window. And you can watch it. It actually commits every 50 records, so you'll see that once in a while. And then we're going to play with it and we'll just make a ts_query. And this is really just a different way of getting and filling up a significantly larger database with some interesting information. And then we play with queries and I'll do this all in a nice little runthrough. So, in the next lecture, we'll talk a little bit about probably the most complex thing in this particular week. And that is a application that I wrote that loads a bunch of email data from an online source. [MUSIC]