So let's talk about Compute Engine and Cloud Storage. It's useful to know how Compute Engine instances and Cloud Storage work, because the Datalab instance is going to run on them. For persistent data in the cloud, you will use Cloud Storage, so you need to understand Cloud Storage as well. Think of Compute Engine as a globally distributed CPU and Cloud Storage as a globally distributed disk. Datalab, though, is a single-node program, so it runs on a single Compute Engine instance. However, when we launch Dataflow jobs or Cloud ML Engine jobs, we kick off the processing on many Compute Engine instances.

Compute Engine essentially allows you to rent a virtual machine on the cloud to run your workloads. So what are some of the things you can customize? Things like the number of cores, the amount of memory, the disk size, and the operating system. You can also get things like load balancing and networking. And you're not tied to your initial choices; you can always change them, and billing discounts are automatic depending on how much you use the machine.

Disks attached to Compute Engine instances are fast, but they're ephemeral: when the VM goes away, the disk goes away. (Google also offers persistent disks, but let's ignore those for now.) Cloud Storage, by contrast, is durable. Blobs in Cloud Storage are replicated and stored in multiple places. Cloud Storage is also accessible from any machine. Because of the speed of the network (petabit bisection bandwidth within a Google data center, which essentially means that 100,000 machines can talk to each other at ten gigabits per second), you can directly read off Cloud Storage. In fact, that's what we will do when we write our TensorFlow programs.

The purpose of Cloud Storage is to give you a durable, global file system. But how is it organized? A typical Cloud Storage URL might look like gs://acme-sales/data/sales003.csv. The acme-sales part is called a bucket, and the name of the bucket is globally unique.
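To make that URL structure concrete, here's a small shell sketch that splits a gs:// URL into its bucket name and object path using plain parameter expansion. The URL is the hypothetical example from above, not a real bucket:

```shell
# Hypothetical Cloud Storage URL from the example above.
URL="gs://acme-sales/data/sales003.csv"

# Strip the gs:// scheme, then split on the first slash.
URL_PATH="${URL#gs://}"          # acme-sales/data/sales003.csv
BUCKET="${URL_PATH%%/*}"         # the bucket name: globally unique
OBJECT="${URL_PATH#*/}"          # the object name: folder-like by convention

echo "bucket: ${BUCKET}"
echo "object: ${OBJECT}"
```

Note that data/sales003.csv is a single object name; the slashes are only a folder structure by convention.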
Think of it like a domain name in an Internet URL. One way to get a globally unique bucket name is to use a reverse domain name, in which case Google Cloud Platform will ask you to prove that you own the domain name in question. Or you can simply use your project ID: unless you're extremely unlucky, your project ID, which is also globally unique, will not already have been used for a bucket name. The rest of the gs URL is, by convention, like a folder structure, with the complete gs URL referring to an object in Cloud Storage.

So how do you work with it? You can use gsutil, a command-line tool that comes with the Google Cloud SDK. If you spin up a Compute Engine instance, gsutil is already available. On your laptop, you can download the Google Cloud SDK to get gsutil. gsutil uses familiar Unix command-line syntax: for example, mb and rb stand for make bucket and remove bucket, and you can use cp to copy files. Instead of the command line, you can also use the GCP console, a programming API, or a REST API. Here I'm showing you how to copy a bunch of files, sales*.csv, to a specific Cloud Storage location.

Remember I said Cloud Storage buckets are durable. This means that they're stored redundantly; you also get edge caching and failover simply by putting an object in Cloud Storage. However, just because Cloud Storage is a global file system doesn't mean you can forget about latency considerations. You are better off storing the data close to your compute nodes.

But what about service disruptions? You need to distribute your apps and data across multiple zones to protect yourself in case a single zone goes down, for example, if a zone suffers a power outage. You can leverage zones in different regions if you need even additional redundancy. A zone is an isolated location within a region; it is named as the region name followed by a zone letter.
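The gsutil commands mentioned above look something like this in practice. The bucket name here is a hypothetical project ID, and you'd need the Google Cloud SDK installed and authenticated for these to actually run against your project:

```shell
# Make a bucket named after your (globally unique) project ID.
gsutil mb gs://my-project-id

# Copy a set of local CSV files into a folder-like path in the bucket.
gsutil cp sales*.csv gs://my-project-id/data/

# List the objects that landed there.
gsutil ls gs://my-project-id/data/

# Remove the bucket (it must be empty first; gsutil rm -r removes contents).
gsutil rb gs://my-project-id
```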
Then finally, for global availability: if you're building a global application where you have customers spread across the globe, you would want to distribute your apps and data across regions.
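As a sketch of keeping data close to your compute nodes, or spreading it out for global availability, gsutil's mb command takes a location flag. Again, the bucket names are hypothetical and these commands require an authenticated Google Cloud SDK:

```shell
# Regional bucket: data lives near Compute Engine instances in that region.
gsutil mb -l us-central1 gs://my-project-id-regional

# Multi-region bucket: data is replicated across a continent-scale area,
# useful when your customers are spread across the globe.
gsutil mb -l US gs://my-project-id-global
```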