The final layer of Google Cloud infrastructure left to explore is big data and machine learning products. In this video, we’ll examine the evolution of data processing frameworks through the lens of product development. Understanding the chronology of these products can help address typical big data and ML challenges.

Historically speaking, Google experienced challenges related to big data quite early, mostly with large datasets, fast-changing data, and varied data. This was the result of needing to index the World Wide Web. And as the internet grew, Google needed to invent new data processing methods.

So, in 2002, Google released the Google File System, or GFS. GFS was designed to handle data sharing and petabyte storage at scale. It served as the foundation for Cloud Storage and for what would become the managed storage functionality in BigQuery.

A challenge that Google was facing around this time was how to index the exploding volume of content on the web. To solve this, in 2004 Google published a paper that introduced MapReduce. MapReduce was a new style of data processing designed to manage large-scale processing across big clusters of commodity servers: a map step transforms each piece of input independently, and a reduce step combines the intermediate results, as sketched below.

As Google continued to grow, new challenges arose, specifically around recording and retrieving millions of streaming user actions with high throughput. The solution was the release in 2005 of Cloud Bigtable, a high-performance NoSQL database service for large analytical and operational workloads.

With MapReduce available, some developers were restricted by the need to write code to manage their infrastructure, which prevented them from focusing on application logic. As a result, from 2008 to 2010, Google started to move away from MapReduce as the solution for processing and querying large datasets.

So, in 2008, Dremel was introduced. Dremel took a new approach to big data processing by breaking the data into smaller chunks called shards, and then compressing them. Dremel then used a query optimizer to share tasks between the many shards of data and the Google data centers, which processed queries and delivered results. The big innovation was that Dremel autoscaled to meet query demands. Dremel became the query engine behind BigQuery.

Google continued innovating to solve big data and machine learning challenges. Some of the technology solutions released include:

Colossus, in 2010, a cluster-level file system and the successor to the Google File System.

BigQuery, also in 2010, a fully managed, serverless data warehouse that enables scalable analysis over petabytes of data. It is a Platform as a Service (PaaS) that supports querying using ANSI SQL, and it has built-in machine learning capabilities; a short query example follows below. BigQuery was announced in May 2010 and made generally available in November 2011.

Spanner, in 2012, a globally available and scalable relational database.

Pub/Sub, in 2015, a service used in streaming analytics and data integration pipelines to ingest and distribute data.

And TensorFlow, also in 2015, a free and open source software library for machine learning and artificial intelligence.

2018 brought the release of the Tensor Processing Unit, or TPU, which you’ll recall from earlier, and AutoML, a suite of machine learning products. The list goes on, up to Vertex AI, a unified ML platform released in 2021.

And it’s thanks to these technologies that the big data and machine learning product line is now robust.
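To make the MapReduce model above a little more concrete, here is a minimal, self-contained Python sketch of the idea: a map step emits key-value pairs from each input record, a shuffle groups them by key, and a reduce step combines each group. This is only an illustration of the programming model, not Google's implementation; the word-count example and function names are our own.

```python
from collections import defaultdict

def map_phase(documents):
    """Map step: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield word.lower(), 1

def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce step: combine each group of values into a single result."""
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["the web grew fast", "the web kept growing"]
print(reduce_phase(shuffle(map_phase(docs))))
# {'the': 2, 'web': 2, 'grew': 1, 'fast': 1, 'kept': 1, 'growing': 1}
```

In a real MapReduce cluster, the map and reduce calls run in parallel across many commodity machines, with the framework handling distribution of work and recovery from failures.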
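And to illustrate the kind of ANSI SQL querying BigQuery supports, here is a small sketch using the google-cloud-bigquery Python client against one of BigQuery's public datasets. It assumes the client library is installed and that the environment is already authenticated to a Google Cloud project; treat it as a sketch rather than part of the course labs.

```python
from google.cloud import bigquery

# The client picks up the project and credentials from the environment.
client = bigquery.Client()

# Standard (ANSI) SQL against a BigQuery public dataset.
query = """
    SELECT name, SUM(number) AS total
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    GROUP BY name
    ORDER BY total DESC
    LIMIT 5
"""

# BigQuery shards and scales the work behind the scenes; we just wait for rows.
for row in client.query(query).result():
    print(f"{row.name}: {row.total}")
```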
This includes: Cloud Storage, Dataproc, Cloud Bigtable, BigQuery, Dataflow, Firestore, Pub/Sub, Looker, Cloud Spanner, AutoML, and Vertex AI, the unified platform. These products and services are made available through Google Cloud, and you’ll get hands-on practice with some of them as part of this course.
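As a small preview of that hands-on work, here is a hedged sketch of publishing a message to Pub/Sub with the google-cloud-pubsub Python client. The project ID and topic ID are placeholders, and the topic is assumed to already exist.

```python
from google.cloud import pubsub_v1

# Placeholder project and topic IDs; substitute your own.
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project-id", "my-topic")

# Message payloads are raw bytes; optional attributes are passed as keyword arguments.
future = publisher.publish(topic_path, data=b"user clicked checkout", source="web")
print(f"Published message ID: {future.result()}")
```

A subscriber on the other side would then pull or receive these messages, which is how streaming pipelines, for example in Dataflow, ingest data from Pub/Sub.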