One of the benefits of horizontal scaling in the cloud is that we can take advantage of the elasticity built into the cloud to reduce costs and to respond automatically to fluctuations in demand on our application. So, what do I mean by this? Well, if we are horizontally scaling our application, we are adding machines as needed to handle the load. Think about what would be required if this were an application running in your own data center, where you buy the computers that support it. If you were running on one machine and realized you needed another, you would have to go physically buy a new machine, set it up in your data center or wherever your machines are co-located, and then support the infrastructure for that machine. What that means is, first, there is a limit to how responsively you can add capacity: you can't turn on capacity very quickly, because you have to physically buy a machine and set it up. Second, once you've bought and set up a machine, you really expect all of that machine's capacity to get used; otherwise it's very cost-inefficient. If you see a spike in traffic, run off and buy a machine, and then the spike goes away, you're suddenly stuck with a machine whose power and physical infrastructure you have to keep paying for. In the cloud, however, we're renting machines, typically on an hourly basis or under some other billing scheme, and we can allocate new machines very quickly. This is the elasticity property: we can allocate new machines almost as quickly as we need them (there is some limit, because we have to wait for them to start up), and when we're done with them, we can release them.
And so this is the property of elasticity that we love about the cloud: we don't have to go physically buy a machine and get it set up, and we don't have to plan and forecast far into the future to make sure we've built out enough physical infrastructure. What we do have to do is figure out when we should add a new machine. When do we set up that N+1th machine to add load capacity to our application or cloud service? Even though we can set up new capacity quickly and tear it down quickly, we need some way of monitoring these machines. Now, we could have somebody sitting in front of a console all day, watching the load on each machine and manually pressing a button to start things up. But that wouldn't be very efficient, and I certainly don't want to sit there doing that job for the rest of my life. If, instead, we know what our servers' expected behavior is, and if we can measure properties such as their response time to a request, or the load on their CPU or disk, then we can use that information to figure out when we need to start a new machine. So we can have some intelligence that decides when to bring new capacity online. That intelligence can automatically interact with our cloud provider to add machines as needed, or, when machines are no longer needed, automatically tell the cloud provider what to turn off, so that we're only charged for what we actually need at any given time. This idea of building in some intelligence that can automatically add and remove capacity is called auto-scaling. It's what allows us to run more cost-effectively in the cloud, and it's what allows our applications to grow automatically to meet demand. Now, one thing we have to be cognizant of in a cloud environment is this:
There is a limit to how fast we can turn machines on. It still takes time for a machine to boot, and for our application to launch, initialize, and do whatever else it needs to do. So the intelligence we build has to forecast well enough to predict what the load is going to do over the next few minutes or hours, and turn machines on early enough that they are ready by the time the load outgrows our current capacity. We have to ask: what is our current load? How fast can we add new nodes? And, using that, what capacity are we going to need in the future? All of these things are typically server-specific or application-specific and have to be analyzed for your own system. But one thing to consider when you're building a horizontally scalable application is what the rules will be for when to bring new machines online and when to take existing machines offline. Those rules become your auto-scaling policies, and they affect how autonomous your service is: how well it can automatically grow in response to traffic, as well as shrink your bill in response to lessening traffic. The other important thing about auto-scaling is that if a machine dies, whether from a physical failure or an application crash, the intelligence can automatically see that one of your instances (a virtual machine, or perhaps a container if you're using something like Docker) has died, and automatically allocate a replacement. And you can see, yet again, that this is an example where statelessness is an important property.
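The failure-replacement loop just described could be sketched like this. `FakeCloud` and all of its methods are hypothetical in-memory stand-ins for a real provider's SDK, not any actual API:

```python
# Sketch of an auto-scaler's failure-replacement loop. FakeCloud is a
# hypothetical in-memory stand-in for a real cloud provider's SDK.

class FakeCloud:
    """Tracks instances and their health in memory."""
    def __init__(self, health):
        self.health = dict(health)   # instance_id -> is_healthy
        self._counter = 0

    def list_instances(self):
        return list(self.health)

    def is_healthy(self, instance_id):
        # A real implementation would run a health check, e.g. an HTTP ping.
        return self.health[instance_id]

    def terminate(self, instance_id):
        del self.health[instance_id]

    def launch_replacement(self):
        self._counter += 1
        self.health[f"replacement-{self._counter}"] = True

def replace_failed_instances(cloud):
    """One pass of the loop: find dead instances, tear them down,
    and launch fresh replacements. Returns how many were replaced."""
    replaced = 0
    for instance_id in cloud.list_instances():
        if not cloud.is_healthy(instance_id):
            cloud.terminate(instance_id)   # clean up the failed VM/container
            cloud.launch_replacement()     # bring a fresh one online
            replaced += 1
    return replaced
```

A real auto-scaler would run a pass like this periodically, driven by actual health checks rather than in-memory flags.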
If we don't have to worry about which machine died, and we don't have to worry about the load on any specific machine, it becomes much easier to just add machines and think of things in aggregate. We can simply ask: how many machines do we have? How much load is on them? Do we need more machines, or fewer? If instead we have specific machines holding specific state, then we have to start tracking them individually, and figuring out when a particular machine's state needs to be replicated onto another machine to split the load up, and the decision becomes much more complex.
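Because stateless machines are interchangeable, the aggregate decision really does come down to just two numbers. Here is a minimal sketch of such a policy; the thresholds and target utilization are made-up illustrative values, not recommendations:

```python
# Aggregate scaling decision: because servers are stateless and
# interchangeable, we only need totals, never per-machine state.
# All thresholds here are made-up illustrative values.

import math

def desired_capacity(current_machines, avg_cpu_percent,
                     scale_out_at=70.0, scale_in_at=30.0,
                     target_cpu=50.0):
    """Return how many machines we want, given average CPU load.

    If average load is inside the [scale_in_at, scale_out_at] band,
    keep what we have; otherwise resize so the same total load lands
    each machine near target_cpu. In practice you would also scale
    out a bit early, to cover the time a new machine takes to boot
    and initialize.
    """
    if scale_in_at <= avg_cpu_percent <= scale_out_at:
        return current_machines
    total_load = avg_cpu_percent * current_machines
    return max(1, math.ceil(total_load / target_cpu))
```

For example, 4 machines averaging 80% CPU carry 320 "points" of total load, so this policy asks for ceil(320 / 50) = 7 machines, bringing the average down to about 46%; 10 machines averaging 20% shrink to 4.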