[MUSIC] Now let's talk about APIs. API stands for Application Programming Interface. First, an interface is a point where two systems meet. "Application programming" doesn't mean a whole lot to me, but I think the idea is that an API is a specific kind of interface, one that allows users to programmatically access an application. "Application interface" or "program interface" makes just as much sense to me. In my experience, many people have trouble defining what an API is. I find it simplest to explain by way of analogy: driving a car. On the left we have the application, a car engine. It represents decades of combined engineering experience and knowledge. It's complex. On the right, we have a car's API: a steering wheel, pedals, a gear shifter. Now here's the important point, and why APIs are so cool. You don't need to know anything about the car engine in order to drive the car. Now imagine that one day your mechanic sneaks into your garage and replaces a particular gasket. Question: do you think most people could tell that the gasket had been replaced? Now, I bet you could replace the entire car engine, and many people wouldn't notice. And that's the beauty of an API. It separates how something should function from how it actually functions, and this is a powerful idea.
So REST is a model for web applications, and an API is an interface, or set of functions, for interacting with a program. Combined, RESTful APIs are powerful abstractions that allow you to programmatically access and modify web applications without knowing everything about them.
Finally, let's talk about how REST and web APIs are implemented with HTTP. HTTP, or the HyperText Transfer Protocol, is a networking protocol for communication between computers. The first version was actually developed before REST. REST was developed to describe the protocol, and the two were developed alongside each other over time.
It is a core technology driving the World Wide Web.
We've now introduced three big concepts, so before we go further, let's review. REST is an architectural pattern. It's a description of what a certain kind of program or architecture should look like. An API is a user-designer contract. A developer of a system designs and documents an interface, and the client can use the interface without knowing the details. And a protocol is a communication standard. HTTP is just one of many different protocols.
As a communication protocol, HTTP specifies what a message between two computers should look like. There are two parts. First, messages have headers. Headers contain metadata. For example, a header might say what character encoding scheme to use. Or it might say what browser the request is coming from, called the user agent. It might say what kind of content it is. Is it a picture, a video, a document? Second, messages have a body. This is the actual data being exchanged. Remember how one REST constraint was statelessness? The headers plus body tell the receiver of a message everything they need to know. They know how to interpret the data, and they might have some useful metadata. Remember how another REST constraint was the ability to cache data? Setting and breaking the cache can be done through headers. Hopefully, you're starting to see how HTTP allows for RESTful applications.
So we know what HTTP messages look like. How do they get to their destinations? HTTP relies on URLs, or Uniform Resource Locators. A URL is a reference to a resource. In this slide, we have the parts of a URL. First, we have a scheme. Typically you see http or https, but it could be ftp or mailto, or something else. This specifies how to connect. Next you have the authority. This contains the domain name, the human-readable name for the address. This specifies where to connect. Finally, you have the path, query, and fragment. These specify what to ask for.
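As a rough sketch of the URL parts just described, here is how Python's standard urllib.parse module splits a URL into scheme, authority, path, query, and fragment. The example.com address is made up for illustration and is not from the slide; it is chosen only because it contains all five parts.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical URL, made up for illustration; it is not from the lecture.
url = "https://www.example.com/catalog/items?color=blue&page=2#reviews"

parts = urlsplit(url)

print(parts.scheme)    # how to connect
print(parts.netloc)    # the authority: where to connect
print(parts.path)      # what to ask for
print(parts.query)     # extra parameters for the request
print(parts.fragment)  # where on the returned page to navigate

# The query string can be decoded further into names and values.
query_values = parse_qs(parts.query)
print(query_values)
```

Note that urllib.parse calls the authority "netloc"; it is the same piece of the URL.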
Let's look at an example, namely, the URL for the Wikipedia page for Grace Hopper. First, the scheme, https, specifies a secure connection. The authority, en.wikipedia.org, tells the browser where to make the connection, namely the English-language version of Wikipedia. The path, /wiki/Grace_Hopper, specifies which page, which resource, to request from Wikipedia. And finally, the fragment, Awards_and_recognition, further specifies where on the page to navigate. We don't see a query in this example.
Finally, HTTP specifies request types. There are many different kinds of HTTP request types, but we'll just talk about two very important ones, GET and POST. A GET request fetches a resource. Every time you browse a website, your browser is making GET requests. In other words, your browser asks the server, please fetch me whatever is located at this URL. Clicking a link, for example, initiates a GET request. A POST request modifies a specific resource. More technically, a POST request requests that the server modify a resource. For example, when you tweet something on Twitter, the browser will POST your message to Twitter's servers. This updates the state of the program, namely some internal list of all your tweets. When your friend performs a GET request an hour later, they will see your tweet in their feed. You can see how, combined, these two request types allow you to do a lot. You can ask the server to update itself with some new data you've sent along, or you can request that the server fetch you some data based on the URL.
Let's look at all of these different concepts at play in a single example web application. Here we have a client, a server, and a database. The client makes a GET request, asking for the foo resource with value bar. Remember how, with our RESTful web service example, we had URLs that looked something like customer/9? This is the same idea. The client can just ask for the resource located at the URL something, something, something /foo/bar.
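To make the two request types concrete, here is a minimal sketch that builds a GET request and a POST request with Python's standard urllib.request module. Nothing is sent over the network; the requests are only constructed and inspected. The example.com server, the /messages path, and the message field are all assumptions made for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical server address, used only for illustration.
base = "http://www.example.com"

# A GET request has no body: "please fetch me whatever is at this URL".
get_request = Request(base + "/foo/bar")

# A POST request carries a body with the new data for the server to store.
# The /messages path and the "message" field are made-up names.
body = urlencode({"message": "Hello, world"}).encode("utf-8")
post_request = Request(base + "/messages", data=body, method="POST")

# Inspect the requests without sending them.
print(get_request.get_method(), get_request.full_url)
print(post_request.get_method(), post_request.full_url, post_request.data)
```

Passing either object to urllib.request.urlopen would actually send it, which is left out here because the server is imaginary.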
The server then queries the database. Maybe it looks something like SELECT * FROM foo WHERE name = 'bar'. The database returns all the data that meets the criteria, and the server can then format and sanitize the data, or really do whatever it wants to it, before returning a representation to the user. That representation might be in JSON or XML. Hopefully, this slide highlights the intersection between a REST model, an API, and HTTP.
Finally, let's look at another example of a RESTful API, one that grants you programmatic access to data. For this example I'll be using the Harmonizome, which is a biological knowledge engine being developed by the Ma'ayan Lab. First, our browser makes a GET request for the URL on the previous slide, amp.pharm.mssm.edu/Harmonizome/api/1.0. The RESTful API informs the client which possible URLs to follow. Let's choose gene. It's the base URL plus /gene. Next, the RESTful API returns a list of all the genes in the database. Note that the data is, quote, unquote, paginated. Only 100 results are returned, and the API specifies a next link for the next 100 genes. This is a method for returning only some of the data, and it's done for a few reasons. First, the Harmonizome has tens of thousands of genes in its database. It would overwhelm the client to return all of them. And second, selecting, filtering, and formatting every single gene, if the user is only interested in a single gene or a small subset of genes, is just a waste of the server's computing resources. Paginating the data in this way still allows the user to programmatically query the database and look at all the data, but only if the user really wants to do that.
Finally, let's look at a single gene, say STAT3. It's the URL so far, with just /STAT3 added at the end. The RESTful API returns all of the relevant data about STAT3. You see a symbol, synonyms, name, description, NCBI link, etc. Remember, this isn't the actual data. It's a JSON representation of the data at a moment in time.
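The difference between the data and a representation of it can be sketched in a few lines of Python. This is an illustration only: the record below stands in for whatever the server really stores, the field names echo the ones on the slide, and every value except the symbol is a placeholder rather than real Harmonizome data.

```python
import json

# Hypothetical record standing in for the server's stored data.
# Field names echo the slide; the values are placeholders, not real data.
gene_record = {
    "symbol": "STAT3",
    "synonyms": ["placeholder synonym"],
    "name": "placeholder name",
    "description": "placeholder description",
}

# The server sends text describing the record, not the record itself.
representation = json.dumps(gene_record, indent=2)
print(representation)

# The client turns that text back into a structure it can work with.
received = json.loads(representation)
print(received["symbol"])

# Updating the server's record later does not change what was already sent.
gene_record["description"] = "updated placeholder description"
print(received["description"])
```

The last line still prints the old description, which is what "a representation at a moment in time" means in practice.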
We could just as easily return the data in XML, or we could add more information to STAT3, and the next time you made this GET request, you would see more data. So that's it, that's the Harmonizome API.
In conclusion, RESTful applications and APIs use HTTP to create powerful systems that are fast, scalable, modifiable, portable, reliable, and easy to understand. Knowing how these systems work is key to accessing and serving big data in the information age. For example, now that you know about the Harmonizome API, you could write a Python script to programmatically make GET requests to download the data and analyze it yourself. Data is moving to the cloud, and knowing how to access and serve it is critical to all kinds of research. [MUSIC]
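A Python script of the kind mentioned in the conclusion might look something like the sketch below. It follows next links page by page and collects the results. The key names "entities" and "next" are assumptions about the response format, so check them against the API's actual output before relying on them. The demonstration at the bottom runs against a small made-up set of pages instead of a real server, so it works offline.

```python
import json
from urllib.request import urlopen


def fetch_json(url):
    """Make a GET request and decode the JSON body of the response."""
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def download_all(first_url, fetch=fetch_json,
                 items_key="entities", next_key="next"):
    """Follow next links page by page and collect every item.

    items_key and next_key are assumed names for the list of results
    and the link to the following page.
    """
    items = []
    url = first_url
    while url:
        page = fetch(url)
        items.extend(page.get(items_key, []))
        url = page.get(next_key)  # missing or None on the last page
    return items


# Offline stand-in for a paginated API: made-up pages keyed by made-up URLs.
fake_pages = {
    "page1": {"entities": ["gene A", "gene B"], "next": "page2"},
    "page2": {"entities": ["gene C"], "next": None},
}

genes = download_all("page1", fetch=fake_pages.__getitem__)
print(genes)
```

Against a real API you would call download_all with the first page's full URL and leave fetch at its default. If the API returns relative next links, they would need to be joined to the base URL first, for example with urllib.parse.urljoin.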