[MUSIC] So what happens when you build the next Instagram, and suddenly you have a mobile application connected to cloud servers that you've built that's getting hundreds of thousands, or millions, of users a day? Is your application going to be able to handle all of that traffic? Are your cloud servers going to be able to scale up to support it? So this is a key question when you're building cloud servers: understanding how well they scale, or grow, in response to their load. Now, there are two ways that we can go about scaling. One way is, we have requests coming in to our server, and if we have 10 requests we may not need a very big server. But if we go up to 1,000 requests, we could try to move ourselves onto a bigger server that has more horsepower in it. And then if we go up to 10,000 requests, we could try to move to an even bigger server. This idea of continually scaling up by getting bigger servers is called vertical scaling. But there's a problem with this. At some point, we're going to reach a traffic load where there just isn't a machine big enough to support it. It doesn't matter if you're Google or if you're someone smaller; at some point it's not going to make sense to run all of this load on one single computer. You can imagine that there's no way on Earth that Google could run all of their search operations on a single machine. Instead, we're always going to reach the point where we have to split our application and distribute it across multiple machines. And so that's what we're shooting for when we're building an application. We'd like to have it so that when we have a small amount of load, we can run on one machine. When we grow and get more load, we can add another machine. And then when we grow yet again, we can continue adding machines, and we can keep adding machines as much as we want to support this request load.
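The add-a-machine idea above can be sketched as a toy round-robin dispatcher (a hypothetical illustration, not something from the lecture): adding capacity is just appending another server to the pool, and requests spread evenly across whatever servers exist.

```python
from itertools import cycle

class RoundRobinPool:
    """Toy load balancer: adding capacity = appending a server (illustration only)."""

    def __init__(self, servers):
        self.servers = list(servers)

    def add_server(self, server):
        # Horizontal scaling: just "turn on" another machine.
        self.servers.append(server)

    def dispatch(self, requests):
        # Spread requests evenly across the current pool.
        assignments = {}
        pool = cycle(self.servers)
        for req in requests:
            server = next(pool)
            assignments.setdefault(server, []).append(req)
        return assignments

pool = RoundRobinPool(["s1"])   # small load: one machine
pool.add_server("s2")           # load grew: add a machine
out = pool.dispatch(range(10))  # 10 requests split 5 and 5
```

Real load balancers (nginx, AWS ELB, etc.) are far more sophisticated, but the core contract is the same: capacity grows by adding members to the pool.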
And so, in this scenario, what we're doing is horizontal scaling. And this is really what we're looking for when we're building cloud services: we want to build services that can easily be horizontally scaled. What that means is that if new load shows up, we can just go and turn on another machine. If we need more capacity, we add a new machine. Now, getting horizontal scaling right is not necessarily easy. You have to do a number of things in your application, and optimize it, to make sure that you really can just turn on another machine. And often, we'll end up in situations where inefficiencies in our code prevent us from taking full advantage of a new machine coming online. So think about the computing power available to us, measured as how many requests we can process, say in requests per second, plotted against the number of servers we've allocated to our application. What we would expect is that as we allocate servers, we would automatically have more computing power, and thus process more requests per second. But the reality is that this computing power is not directly related to how many requests per second we can process, because of inefficiencies in the way that we've designed our application. So ideally we'd see a straight line, but in reality our application may scale sub-linearly. This may be due to architectural design choices that we've made. We may not be able to just keep adding machines and have the number of requests, or clients, that we can service automatically scale up.
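One common way to model that gap between ideal and actual scaling (my illustration; the lecture doesn't name a specific model) is Amdahl's law: if some fraction of each request's work is serialized, say behind a shared lock or a single database, throughput flattens out as you add machines instead of growing linearly.

```python
def ideal_throughput(per_server_rps, n_servers):
    # What we'd like: capacity grows linearly with servers.
    return per_server_rps * n_servers

def amdahl_throughput(per_server_rps, n_servers, serial_fraction):
    # Amdahl's law: the serialized fraction of the work caps speedup,
    # no matter how many servers we add.
    speedup = 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_servers)
    return per_server_rps * speedup

# With just 5% of each request's work serialized, 10 servers give
# roughly 6.9x (not 10x) the single-server throughput.
ideal = ideal_throughput(100, 10)            # 1000 requests/sec
actual = amdahl_throughput(100, 10, 0.05)    # ~690 requests/sec
```

Note the asymptote: even with infinitely many servers, a 5% serial fraction caps the speedup at 20x, which is exactly the kind of architectural bottleneck the lecture is warning about.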
So, we have to realize that whenever we go and build an application, the design decisions we're making are going to impact us at some point, and create a limit to how far we can scale, because of this gap between what we really should be able to do and what we actually can do. That inefficiency is going to bottleneck us and hurt us at some point when we scale up across multiple machines. One of the big considerations we have to deal with when we're building these applications is trying to make our application, and the servlets or controllers we're writing, as stateless as possible, so that it's as easy as possible to just add additional machines. Now, a stateful application is one that remembers information about its conversations with clients: it keeps things like caches, login information, or other data on the server side as it begins talking with clients. So if we're remembering things, that's a stateful application. If we have a server that's not remembering anything about the clients it's talking to, it's stateless. When we have a stateless application, we can send requests to any server. We can just turn on a new server and begin sending requests to it, and we can essentially add capacity to our system simply by turning on new machines. If we have a stateful application, when we turn on new machines we're going to have to think about whether we want to move some of our existing clients to those other machines, or whether we have some important state that needs to be migrated to them. How do those new machines get that state? How do they get access to it? How do we decide which clients to move to those machines, and how do we migrate their state?
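To make the stateful/stateless contrast concrete, here's a small Python sketch (the class names and the dict-as-shared-store are my own illustration, not from the lecture): the stateful server traps session data in its own memory, while the stateless server pushes it out to a shared store so that any machine can handle any request.

```python
class StatefulServer:
    """Session data lives in THIS server's memory, so the same
    client must keep hitting the same machine."""

    def __init__(self):
        self.sessions = {}  # state trapped on this one machine

    def handle(self, client_id, request):
        history = self.sessions.setdefault(client_id, [])
        history.append(request)
        return f"seen {len(history)} requests from {client_id}"

class StatelessServer:
    """Everything needed to answer arrives with the request or lives
    in a shared store, so ANY server can take the next request."""

    def handle(self, client_id, request, shared_store):
        history = shared_store.setdefault(client_id, [])
        history.append(request)
        return f"seen {len(history)} requests from {client_id}"

shared = {}  # stands in for a database or a cache like Redis
a, b = StatelessServer(), StatelessServer()
a.handle("alice", "r1", shared)
result = b.handle("alice", "r2", shared)  # different machine, same answer
```

Because the stateless servers share nothing in-process, "adding capacity" is just constructing another `StatelessServer`; with the stateful version, a new machine starts with an empty `sessions` dict and we'd face exactly the migration questions raised above.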
So, it becomes much more complicated when we have to think about turning on a machine and then migrating or bootstrapping its state in some way to match what's already happened on all of the other machines. Statelessness, then, is a property that we try to achieve in our application. We try to keep as much state as possible out of our application servers, so that we can just turn new machines on. But that's not always the case, and it's not always easy to do. We'd like our controllers to be able to process any request, without having to know about things that have gone on, on other machines somewhere else. So these are some of the considerations, and some of what we're trying to do, when we're building highly scalable applications.