Now that we’ve explored compute and why it’s needed for big data and ML jobs, let’s examine storage. For proper scaling capabilities, compute and storage are decoupled, and this is one of the major differences between cloud and desktop computing: with cloud computing, processing limitations aren’t attached to storage disks.

Most applications require a database and storage solution of some kind. With Compute Engine, for example, which was mentioned in the previous video, you can install and run a database on a virtual machine, just as you would in a data center. Alternatively, Google Cloud offers fully managed database and storage services: Cloud Storage, Cloud Bigtable, Cloud SQL, Cloud Spanner, Firestore, and BigQuery. The goal of these products is to reduce the time and effort needed to store data. With Cloud Storage, for example, this means creating an elastic storage bucket directly in a web interface or through the command line. Google Cloud offers relational and non-relational databases, and worldwide object storage. We’ll explore those options in more detail soon.

Choosing the right option to store and process data often depends on the data type that needs to be stored and the business need. Let’s start with unstructured versus structured data. Unstructured data is information stored in a non-tabular form, such as documents, images, and audio files. Unstructured data is usually suited to Cloud Storage, though BigQuery now offers the capability to store unstructured data as well.

Cloud Storage is a managed service for storing unstructured data as objects in Google Cloud. An object is an immutable piece of data consisting of a file of any format. You store objects in containers called buckets. All buckets are associated with a project, and you can group your projects under an organization. Each project, bucket, and object is a Google Cloud resource, as are things such as Compute Engine instances.
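To make the bucket-and-object model concrete, here is a minimal sketch using the gcloud CLI. The bucket name, location, and file names are placeholders, and the commands assume you already have an authenticated project with the Cloud Storage API enabled.

```shell
# Create a bucket (bucket names must be globally unique -- this one is a placeholder).
gcloud storage buckets create gs://my-example-bucket \
    --location=us-central1 \
    --default-storage-class=STANDARD

# Upload an object to the bucket, then download it back.
# Objects are immutable: re-uploading the same name replaces the object
# rather than editing it in place.
gcloud storage cp report.csv gs://my-example-bucket/report.csv
gcloud storage cp gs://my-example-bucket/report.csv ./report-copy.csv
```

The same operations are available in the Google Cloud console and through client libraries; the CLI is just one of the interfaces to the same underlying resources.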
After you create a project, you can create Cloud Storage buckets, upload objects to your buckets, and download objects from your buckets. A few example uses include serving website content, storing data for archival and disaster recovery, and distributing large data objects to end users via direct download.

Cloud Storage has four primary storage classes. The first is Standard Storage, which is considered best for frequently accessed, or “hot,” data. It’s also great for data that is stored for only brief periods of time. The second storage class is Nearline Storage. This is best for storing infrequently accessed data, like reading or modifying data once per month or less, on average. Examples include data backups, long-tail multimedia content, and data archiving. The third storage class is Coldline Storage. This is also a low-cost option for storing infrequently accessed data, but compared to Nearline Storage, Coldline Storage is meant for reading or modifying data at most once every 90 days. The fourth storage class is Archive Storage. This is the lowest-cost option, ideally used for data archiving, online backup, and disaster recovery. It’s the best choice for data that you plan to access less than once a year, because it has higher costs for data access and operations, and a 365-day minimum storage duration.

Structured data, by contrast, is information stored in tables, rows, and columns. Structured data comes in two types: transactional workloads and analytical workloads. Transactional workloads stem from online transaction processing (OLTP) systems, which are used when fast data inserts and updates are required to build row-based records, usually to maintain a system snapshot. They require relatively standardized queries that impact only a few records. Then there are analytical workloads, which stem from online analytical processing (OLAP) systems, which are used when entire datasets need to be read.
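The access-frequency thresholds that separate the four storage classes can be summarized in a small helper. This is purely illustrative: `pick_storage_class` is a hypothetical function that encodes the rough thresholds described above, not part of any Google Cloud SDK, and real class selection should also weigh retrieval costs and minimum storage durations.

```python
def pick_storage_class(accesses_per_year: float) -> str:
    """Map an expected access frequency to a Cloud Storage class.

    Hypothetical helper encoding the rule-of-thumb thresholds:
    more than monthly -> Standard; about monthly down to once per
    90 days -> Nearline; at most once per 90 days -> Coldline;
    less than once a year -> Archive.
    """
    if accesses_per_year > 12:   # more often than monthly: "hot" data
        return "STANDARD"
    if accesses_per_year > 4:    # roughly once a month or less
        return "NEARLINE"
    if accesses_per_year >= 1:   # at most once every 90 days
        return "COLDLINE"
    return "ARCHIVE"             # less than once a year
```

For example, weekly access maps to Standard Storage, while a yearly compliance archive maps to Coldline or Archive depending on how rarely it is read.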
They often require complex queries, for example, aggregations. Once you’ve determined whether the workloads are transactional or analytical, you’ll need to identify whether the data will be accessed using SQL. If your data is transactional and you need to access it using SQL, then Cloud SQL and Cloud Spanner are two options. Cloud SQL works best for local-to-regional scalability, while Cloud Spanner is best for scaling a database globally. If the transactional data will be accessed without SQL, Firestore might be the best option. Firestore is a transactional NoSQL, document-oriented database. If you have analytical workloads that require SQL commands, BigQuery is likely the best option. BigQuery, Google’s data warehouse solution, lets you analyze petabyte-scale datasets. Alternatively, Cloud Bigtable provides a scalable NoSQL solution for analytical workloads. It’s best for real-time, high-throughput applications that require only millisecond latency.
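The decision tree just described can be sketched as a small helper. Again, this is a hypothetical illustration of the workload/SQL decision path, not an official API, and real product choices involve more factors than these three inputs.

```python
def pick_database(workload: str, uses_sql: bool, global_scale: bool = False) -> str:
    """Suggest a Google Cloud product from the workload decision tree.

    Hypothetical helper: transactional + SQL -> Cloud SQL (regional) or
    Cloud Spanner (global); transactional without SQL -> Firestore;
    analytical + SQL -> BigQuery; analytical without SQL -> Cloud Bigtable.
    """
    if workload == "transactional":
        if uses_sql:
            return "Cloud Spanner" if global_scale else "Cloud SQL"
        return "Firestore"
    if workload == "analytical":
        return "BigQuery" if uses_sql else "Cloud Bigtable"
    raise ValueError("workload must be 'transactional' or 'analytical'")
```

For instance, a regional order-entry system accessed with SQL maps to Cloud SQL, while a high-throughput time-series analytics pipeline without SQL maps to Cloud Bigtable.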