So, we just took a look at the transport protocol, which is the first layer, the lowest end-to-end layer in the TCP/IP stack. We wrote a Python program, and in that Python program we made a connection with a socket and then connected it to a particular port on a faraway computer. Now, we're going to actually start sending data back and forth. We've made the connection, so this moves us from the transport layer up to the application layer. The application layer means there's something different when you're talking to a mail server than when you're talking to a web server. There are rules that describe how we talk to them. They are the rules of the road. So, on a telephone call, at least in Western cultures, when the phone rings and you pick it up, you say hello. The person who picks the phone up is supposed to say hello first. You may not notice this, but that's what it is. The person who picks up the phone says hello, then the second person says hello, and then hopefully you start talking. Sometimes, if things don't work well, you don't hear it. You're like, "Are you there?" It gets confusing if the phones, especially cell phones, aren't reliable. So, what we say into phones to get our conversation started is like the application protocol. The protocol that we're going to play with in this segment is what's called the Hypertext Transport Protocol or, more properly, the Hypertext Transfer Protocol. It's the dominant application layer protocol on the internet. It really was invented to retrieve webpages. At the moment of its inception, it wasn't really thought of as the greatest protocol ever. But it has evolved into an amazing protocol. What happened was, it was so simple that we could just layer new ideas on top of it. So, we started with this really basic, simple application protocol and away we go. So, it's a set of rules that allow browsers to retrieve documents from the web.
So, if you wanted to go write a browser, you could. You would just have to go read the specifications for what it is that the servers are going to feed to you. Or if you're going to write a server and you want it to talk to browsers, you'd also read that specification for HTTP. So again, it's just a set of rules so we know what we're going to do first, know what syntax we're expected to produce and consume, and make it so different vendors can work together. It's just a form of standards. So, one of the things that HTTP standardized, which was really cool, was this idea of Uniform Resource Locators, or URLs. We type them so much that we just think of them as, "Oh, you type this thing in to get this thing on your browser." But they actually contain, inside them, some information. http:// says use the HTTP protocol, www.dr-chuck.com says go to this host, and /page1.htm says go get this document. Many years ago, in the 90s, you had to know all these things separately, but then we just concatenated them all together and that became the Uniform Resource Locator, or "Hey, type this URL into your browser". So, every time the user clicks on something, you want the browser to get a different document. You have a document that's got some links in it, and the HTML has this thing called an href value. You click on it, and by clicking on it you're telling your browser, get me a different page. That's the hypertext bit: in any document, there are links that go to other documents. These links are the magic of the web. There were ways to access data from servers before, but the notion that the document you have has links to other documents is a powerful notion. We take it for granted now. But when it first came out in the mid 1990s, '93, '94, '95, it was like, "Well, this is better than the things we've been doing." I mean, otherwise we learned these weird commands and did this stuff.
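To make the three pieces inside a URL concrete, here is a small sketch that pulls them apart using Python's standard `urllib.parse` module. The URL is the one from the lecture; falling back to port 80 when the URL names no port is the usual default for plain HTTP and is spelled out here as an assumption of the example.

```python
from urllib.parse import urlsplit

# The example URL from the lecture.
url = "http://www.dr-chuck.com/page1.htm"

parts = urlsplit(url)

# The three pieces that used to be known separately.
print("protocol:", parts.scheme)    # which protocol to speak
print("host:    ", parts.hostname)  # which computer to connect to
print("document:", parts.path)      # which document to ask for

# No port is written in the URL, so we assume the conventional HTTP port.
port = parts.port or 80
print("port:    ", port)
```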
So, the browser sends what's called a GET request to get the document, retrieves the document, parses the document, and then displays it for you. So, this is a little bit of a diagram of how that works. You're sitting there, you're looking at a web page and you click on a link, and I made the links blue. So you click on the one that says second page. The browser is a piece of software running on your computer. It intercepts that click and says, "Oh, you've clicked on something." It looks at what's in the HTML of the page that you're coming from, to see what web server to connect to, what port to connect to on that web server, and then what document to retrieve. So, your browser then makes a socket connection to port 80 and sends a request called the GET request across that connection. Then, the request goes into that web server, and the web server parses it and figures out what document you're looking for. It might run a little bit of software, but when it's all said and done, it produces a response on that same socket. It sends that response back, and the response is in the form of HTML, the HyperText Markup Language, which is really tags inside less-than and greater-than pairs. That h1 says that it's a level-one header, p says it's the start of a paragraph, and the a tag says that this is an anchor, so it's supposed to be clickable text on that next page. Then, that comes back, and your browser reads it and makes the page show up. So, it reads the HTML, parses it, and there's a bunch of rules about where you add blank lines and all these other things, so that it looks the way that you want. So, that is called the Request Response Cycle. It has to do with when you click, where it goes to the server, gets data back, and then shows it to you. You basically see click, new page, but there's a lot going on behind the scenes when that happens.
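The step where the browser looks inside the HTML to figure out where a click should go can be sketched with Python's built-in `html.parser` module. This is only an illustration, not how a real browser is built. The page text and the `page2.htm` address are made up for the example, and `LinkFinder` is a hypothetical helper name.

```python
from html.parser import HTMLParser

class LinkFinder(HTMLParser):
    """Collects the href value of every anchor (a) tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)

# A made-up page with a header, a paragraph, and one anchor.
page = (
    "<h1>The First Page</h1>"
    "<p>If you like, you can switch to the "
    '<a href="http://www.dr-chuck.com/page2.htm">Second Page</a>.</p>'
)

finder = LinkFinder()
finder.feed(page)
print(finder.links)
```

Each address collected this way is what the browser would use to decide which server, port, and document to ask for when you click.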
All of the rules of exactly what is sent, exactly how it is sent, and how those strings are put together: there's a standard for it, and there is a whole series of standards. Thankfully, they're free and open and available for you to read. While they're long and complex, you can look at them. There was a group that was formed many years ago to start building these standards. They number each one of these RFCs; they're called Requests for Comments. That's a bit of a tongue-in-cheek suggestion that even though there is an RFC that guides how millions and billions of browsers work with hundreds of thousands or millions of servers, and these are pretty solid ideas, there can always be room for improvement. That's what Request for Comments means: no matter how perfect we think we've got these engineering standards for the Internet, they could always be improved. So, if you took a long enough time and read long enough, you would find this one called RFC 2616, which tells you something about the HTTP protocol. So if you're writing a browser, you're going to read the HTTP protocol. You've got hundreds of pages to read. It'll take you a while, right? You'd rather just download a free browser than make your own. But let's hypothetically think you're going to do this. You'd be reading through this, paging through, and you get to this section and you're like, "Oh, this is what the syntax of a request from the client to the server includes: within the first line of that message, the method to be applied to the resource, the identifier of the resource, and the protocol version in use." Then, we look and we see, "Oh, here's a sample of one of those things." GET, G-E-T, capital letters, with a space, and then a URL, Uniform Resource Locator, and then a protocol version. So we connect, and then this is the line that we send. That's a request for a document.
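That first line, method, then resource identifier, then protocol version, is simple enough to put together by hand. Here is a minimal sketch. `build_get_request` is a hypothetical helper, and `HTTP/1.0` is an assumed version chosen for the example, since the lecture does not say which version appears on the slide.

```python
def build_get_request(url, version="HTTP/1.0"):
    """Return the text of a bare GET request for the given URL.

    The request line is: method, space, resource identifier, space, version.
    Lines end with carriage return + newline, and an empty line
    marks the end of the request.
    """
    request_line = f"GET {url} {version}"
    return request_line + "\r\n" + "\r\n"

request = build_get_request("http://www.dr-chuck.com/page1.htm")
print(repr(request))

# Taking the first line apart again gives back the three pieces.
first_line = request.split("\r\n")[0]
method, target, version = first_line.split(" ")
print(method, target, version)
```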
So, it turns out that you can do this with the program Telnet. Macintosh people have this, and Linux people have this, and Windows people can install it. Go find how to make Telnet work on Windows. Telnet is like a prehistoric piece of software. The reason they don't have it on Windows is because they think it's probably a security hole, and they might be right, so they took it off. It is a prehistoric thing because it's a way to connect to any port on any server in an insecure manner and send stuff to it. So, what you would type on your computer is telnet, and then the host, and then the port. By picking this port, I'm saying, "I want to connect to the web server." It connects up. Now, some web servers are impatient because they expect to talk to a browser. So, if you take too long to type this, it'll say, "You took too long to type. You're just a human, you're cheating." But if you type this fast enough, and it might help to cut and paste it, it works. You type this exact HTTP command in exactly this syntax, and then you hit enter here, and you hit just an enter right there. Then, what'll happen is that those two lines are enough to convince that web server to send you a page back, and it will send you back two chunks of information. It will send you back the headers. This is metadata, metadata about the file that you're about to get, including what kind of file it is. It says, "This is a text/html file." Then, a blank line. The blank line splits the headers from the content. Then comes the content of the file, and then the connection is closed. So, the "connection closed" message is not part of the text. It just says the connection got closed. So, this then is that page that is shown with some stuff on there, and some more links, etc., etc., etc. So, this is the request response cycle.
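The two chunks that come back, headers and then content with a blank line in between, can be taken apart with a few string operations. This is a sketch under assumptions: the response text below is invented for the illustration, not captured from a real server, and `split_response` is a hypothetical helper name.

```python
def split_response(text):
    """Split raw response text into (status line, headers dict, body)."""
    # The first blank line separates the headers from the content.
    head, _, body = text.partition("\r\n\r\n")
    lines = head.split("\r\n")
    status_line = lines[0]
    headers = {}
    for line in lines[1:]:
        # Each header looks like "Name: value".
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body

# An invented response, shaped like what a web server sends back.
sample = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "Connection: close\r\n"
    "\r\n"
    "<h1>The First Page</h1>"
)

status_line, headers, body = split_response(sample)
print(status_line)
print(headers["content-type"])
print(body)
```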
Except normally, what's happening is that a browser is making a socket connection, and then sending a GET, and then getting headers back, and then getting the body back, and then making a pretty page out of that body. So, this is how real people hack into real computers: they actually make connections and they send stuff on those connections. There's this famous scene in Matrix two, I think, where Trinity is hacking in to the back of the power grid. Most of the security movies up to that point postulated that security people, when they break in, would break in with these really cool user interfaces. But it turns out in the real world, they usually have really lousy user interfaces. It's like the command line that I keep trying to tell you to use in this class. So, this actually is an interesting scene. You can go to this URL and take a look at this scene. It actually is written using actual security cracking software, and it was the first of its kind to actually recreate in a movie how people really come in the back door of computers and do stuff. So, it's just an interesting thing. I'm trying to show you how to become an expert in all this stuff, and all this sneaky, clever, highly sophisticated stuff often has very simple user interfaces. So, if we're going to do that same thing, which is make a connection to a port, send a GET request, and then get some data back, we can do this in Python. So, we start with those first three lines: import socket, create the socket, connect the socket. So, when you first create the socket, it's like this porthole that lets you out. It's like a doorway out of your computer, but the doorway's not open and the doorway's not connected to anything yet. That's like a Matrix thing too: there's a doorway, but what's the doorway connected to? There's a couple of Matrix scenes that come to mind all of a sudden. Okay. Well, whatever. That's what this does: it makes the doorway, but there is nothing connected.
Then, the connect basically extends out of your computer. This could fail if the server doesn't exist. So, it goes and finds the server, connects to port 80, and establishes the socket. When this line is done, what we have is a socket, and it's connected to a server. You know that the server's there and you know that there's software on the other end of it, otherwise the connect would fail. But if the connect works, you're talking, you just haven't sent any data yet. Now that it's been connected, you can call methods on the socket object, like send and receive, to send data across it or receive data from it. Now, part of the application protocol is, what do you do first? Do you send or receive? It turns out that with HTTP, the server does a receive first and you do a send first. That's the rule. So, the first thing you do is make a request, and this is just a string. Now, we have to prepare it for sending. I'll talk in the next section about how this encode works. We prepare it for sending and then we send it. You'll notice there are two newlines at the end of it. Enter, enter was what you did when you were in Telnet: it was the GET blah, blah, blah, blah, blah, enter, enter. We send that, and so that means that you sent something to the server, and the server receives it. It goes and reads some files, and does some stuff, and then it's going to start sending data back. You can use a while loop now. Receive is a method on the socket object that you call once you've sent the request, and it might take a couple of receives to get all the data. We're going to just print this stuff out onto our screen, so we're going to receive up to 512 bytes at a time. If we get no data, that means end of file or end of transmission, so we break out. Then, if we did get data, we decode it, and we'll talk in a second about that. That's taking data from the outside world and interpreting what it means internally for us.
So, we're going to decode it, and this loop is going to run a bunch of times until it hits end of file. Then we're going to close the socket, which tears all this stuff down, because this actually takes up resources in your computer and the far end's computer as well. So, mysock.close closes that, and that's it. So, that basically is the request response cycle in Python, and it's only like 10 lines of code. So, that's really impressive that Python's capable of doing that. So, what we'll get for the output of this is the same stuff we got from Telnet. It's going to be this loop that reads this stuff, and decodes it, and prints it, and it will be header, header, header, header, header, header, however many there are. That's the metadata, then a blank line, and then the text. So, it's the exact same thing, and those Python commands did the same thing: make a connection to port 80, send a GET request, send a blank line, wait, read data, and then print that out onto our screen, and that's what we would see. So next, I'm going to talk to you about that encode and decode bit because it's important.
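Putting the steps just described together, a program along these lines might look like the sketch below. It is a reconstruction from the description, not the exact program on the slides. The helper names `exchange` and `fetch`, the `HTTP/1.0` version string, and the host in the commented-out call are assumptions for the example. One small variation from the description: it collects all the bytes and decodes once at the end, so a character that happens to be split across two receives does not cause a decode error.

```python
import socket

def exchange(sock, request_text):
    """Send one request on a connected socket and read until the far end closes."""
    # Prepare the string for sending, then send it.
    sock.sendall(request_text.encode())

    chunks = []
    while True:
        data = sock.recv(512)   # receive up to 512 bytes at a time
        if len(data) < 1:       # no data means end of transmission
            break
        chunks.append(data)

    # Interpret the bytes from the outside world as text.
    return b"".join(chunks).decode()

def fetch(host, url, port=80):
    """Make the doorway, connect it, do one request/response, and tear it down."""
    mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        mysock.connect((host, port))  # fails if nothing is listening there
        return exchange(mysock, f"GET {url} HTTP/1.0\r\n\r\n")
    finally:
        mysock.close()                # free resources on both ends

# Example call (needs a network connection, so it is left commented out):
# print(fetch("www.dr-chuck.com", "http://www.dr-chuck.com/page1.htm"))
```

What comes back from `fetch` is the same text Telnet showed: the headers, a blank line, and then the document.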