Welcome to the session on Data Processing with Azure. As you may already know, Azure is a cloud computing service created by Microsoft for building, testing, deploying, and managing applications and services through Microsoft-managed datacenters. This course has been designed to equip you with the knowledge needed to process, store, and analyze data in order to make informed business decisions. In this course, we will cover batch processing with Databricks and Data Factory on Azure, creating pipelines and activities, creating schedules and triggers, linking services and tasks, selecting windowing functions, configuring input and output for streaming data solutions, and configuring ELT with PolyBase.

So let's get started with the basics. What is Azure Data Factory? It is a service designed to allow developers to integrate disparate data sources: a fully managed, cloud-based data orchestration service that enables data movement and transformation. Mapping data flows are visually designed data transformations in Azure Data Factory. The resulting data flows are executed as activities within Azure Data Factory pipelines, which means they can be operationalized using Data Factory's existing scheduling, control-flow, and monitoring capabilities. Data integration scenarios often require Data Factory customers to trigger pipelines when something happens in an Azure storage account, such as the arrival of a file, and Data Factory lets you trigger pipelines on such events.

Now that we have an idea of what Data Factory is, let us understand what Azure Stream Analytics and Databricks are. Processing big data in real time is now an operational necessity for many businesses. Azure Stream Analytics is Microsoft's serverless, real-time analytics offering for complex event processing. It enables customers to unlock valuable insights and gain a competitive advantage by harnessing the power of big data. Azure Databricks, an Apache Spark-based analytics platform offered as a Microsoft Azure service, can ingest data from a diverse set of sources and perform simple yet scalable transformations of that data. The real-time interactive querying environment and data visualization capabilities of Databricks make this typically slow process much faster. Since the introduction of Azure Databricks in 2018, there has been a lot of excitement around the potential of this unified analytics platform, and Azure Databricks can now simplify the complexities of deploying cloud-based analytic solutions.

One of the primary benefits of Azure Databricks is its ability to integrate with many other data environments to pull data through an ETL or ELT process. In this course, we will examine each of the E, L, and T to learn how Azure Databricks can help ease us into a cloud solution. In real life, data must be delivered in an understandable format to provide actionable insights, and those insights are needed well beyond data engineers and data scientists. With that in mind, how can we expect marketers, salespeople, and business executives to understand and utilize a comprehensive analytics platform such as Azure Databricks? Databricks provides visualization and semantic capabilities that help data scientists and analysts make Azure cloud-based applications customer-friendly.

Pipelines, activities, and datasets are common words that come to mind when talking about data processing in Azure, so let's go through them one by one. Azure Data Factory comes equipped with triggers and pipelines, which orchestrate work across an Azure setup consisting of Data Factory, Databricks, and other services. A Data Factory can have one or more pipelines, and a pipeline is a logical grouping of activities that together perform a task. For example, a pipeline could contain a set of activities that ingest and clean log data, and then kick off a Spark job on an HDInsight cluster to analyze that log data. The activities in a pipeline define the actions to perform on your data; a short code sketch of such a pipeline follows.
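To make the pipeline concept concrete, here is a minimal sketch of creating a one-activity pipeline with the Python management SDK for Data Factory (the azure-mgmt-datafactory package). The subscription, resource group, factory, notebook path, and linked-service names are placeholder assumptions, and exact model fields can vary slightly between SDK versions.

```python
# A minimal sketch, assuming the azure-identity and azure-mgmt-datafactory
# packages. All names below (subscription, resource group, factory, notebook
# path, linked service) are illustrative placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    DatabricksNotebookActivity,
    LinkedServiceReference,
    PipelineResource,
)

adf_client = DataFactoryManagementClient(
    DefaultAzureCredential(), "<subscription-id>"
)

# One activity: run a Databricks notebook that analyzes ingested log data.
analyze_logs = DatabricksNotebookActivity(
    name="AnalyzeLogs",
    notebook_path="/Shared/analyze_logs",
    linked_service_name=LinkedServiceReference(
        type="LinkedServiceReference", reference_name="AzureDatabricksLS"
    ),
)

# A pipeline is just a logical grouping of such activities.
pipeline = PipelineResource(activities=[analyze_logs])
adf_client.pipelines.create_or_update(
    "my-resource-group", "my-data-factory", "LogAnalysisPipeline", pipeline
)
```

Once a pipeline like this exists, it can be run on demand or wired up to a schedule or a storage-event trigger, as we discussed above.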
A dataset is a named view of data that simply points to or references the data you want to use in your activities as inputs and outputs.

Next, we come to the concept of a windowing function. Windowing functions may be familiar to you from SQL Server; in Azure, they are used to make working with data streams more effective in a cloud setup. In time-streaming scenarios, performing operations on the data contained in a temporal window is a common pattern. Stream Analytics has native support for windowing functions, enabling developers to author complex stream-processing jobs with minimal effort; we will make this concrete with a short code sketch at the end of this session.

Azure Synapse Analytics is a limitless analytics service that brings together enterprise data warehousing and big data analytics. It gives you the freedom to query data on your own terms, using either serverless on-demand or provisioned resources at scale. Azure Synapse brings these two worlds together with a unified experience to ingest, prepare, manage, and serve data for immediate BI and machine learning needs.

Now, let's understand ETL (Extract, Transform, Load) versus ELT (Extract, Load, Transform). SMP, that is, symmetric multiprocessing, is a multiprocessor computer hardware and software architecture in which two or more identical processors are connected to a single shared main memory. Traditional SMP data warehouses use an Extract, Transform, and Load process for loading data. Azure SQL Data Warehouse, by contrast, is a massively parallel processing (MPP) architecture that takes advantage of the scalability and flexibility of compute and storage resources. Using an Extract, Load, and Transform process can take advantage of MPP and eliminate the resources needed to transform the data prior to loading; this is where tools such as PolyBase come in, loading data directly into the warehouse so it can be transformed there.

Massively parallel processing is the coordinated processing of a single task by multiple processors, each processor using its own operating system and memory and communicating with the others through some form of messaging interface. MPP can be set up with a shared-nothing or a shared-disk architecture. In a shared-nothing architecture, there is no single point of contention across the system, and nodes do not share memory or disk storage. Data is horizontally partitioned across the nodes, such that each node holds a subset of rows from each table of the database, and each node then processes only the rows on its own disks. Systems based on this architecture can achieve massive scale, as there is no single bottleneck to slow the system down.

With this, we come to the end of the introductory session on Azure. In the next session, we will delve into batch processing with Databricks and Data Factory on Azure. Before you go, here is the windowing sketch promised earlier.
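Azure Stream Analytics expresses windows in its own SQL-like query language, but the same tumbling-window idea can be sketched in a Databricks notebook with PySpark Structured Streaming. This is a minimal sketch, not a Stream Analytics job: it uses Spark's built-in rate test source (which emits synthetic rows with a timestamp and a value column) in place of a real event stream, and the application and column names are illustrative.

```python
# A minimal sketch of a tumbling window in PySpark Structured Streaming,
# using Spark's built-in "rate" source as a stand-in for a real event stream.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, count, window

spark = SparkSession.builder.appName("TumblingWindowSketch").getOrCreate()

events = (
    spark.readStream.format("rate")   # synthetic stream for demonstration
    .option("rowsPerSecond", 10)
    .load()
)

# Group events into non-overlapping 10-second windows keyed on event time,
# and count the events that fall inside each window.
windowed_counts = (
    events.groupBy(window(col("timestamp"), "10 seconds"))
    .agg(count("*").alias("event_count"))
)

# Print each window's running count to the console as the stream advances.
query = (
    windowed_counts.writeStream
    .outputMode("complete")
    .format("console")
    .option("truncate", "false")
    .start()
)
query.awaitTermination()
```

Each incoming event is assigned to exactly one non-overlapping 10-second window based on its event time, which is the same tumbling-window behavior that Stream Analytics supports natively.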