Hi, it's me again, Israel Herraiz, a Strategic Cloud Engineer at Google. In this video, you will learn how to process streaming data with Dataflow. For that, there are three main concepts that you need to learn: how to group data into windows, the importance of watermarks to know when a window is ready to produce results, and how you can control when and how many times a window will emit output.

Let's start by talking about windows. It is likely that your first experience with data processing pipelines was processing data in batch. Batch pipelines are often run on a standard schedule, for instance, once a day, so they produce fresh results with that frequency. Running batch pipelines with a certain frequency is also a way to chunk data: when we have large amounts of data, we can divide the processing and handle all the data in batches.

But it is likely that your data is not stationary; the batches are just artificial splits that simplify the processing of the data. In situations like this, what you probably need is a streaming pipeline: Apache Beam lets you handle the data as a continuous stream. However, dealing with streams is not only a matter of continuity and making splits to process the data; there are other problems inherent to processing data in a stream. One of the main ones you have to deal with is the lack of order.

Imagine a situation where you are processing events coming from a mobile application. One of your users, shown here as a green square, started using the application at home at eight in the morning, and you received some messages in your pipeline. Good. Then another user, shown here as a yellow hexagon, did the same, but this user was riding the subway through a tunnel with no phone coverage. When the user returns to the surface and regains phone coverage, you get the messages with some delay. But wait, it may be worse.
Yet another user, circled in blue here, was using your fantastic application while flying on a very long transcontinental flight, with their mobile phone in airplane mode. This user enables the phone signal when they arrive at their destination, and suddenly you start getting messages that were produced at eight in the morning but that you are only seeing hours later. How can you deal with out-of-order data? And how can you make the splits to process the data? The answer to both is windows. But these windows are not just simple groups or batches of data. Let's see why.

A window is a way to divide data into groups in order to do some transformations with the data. Windowing divides data into time-based, finite chunks. Windows are required when doing aggregations over unbounded data using Beam primitives such as GroupByKey or combiners. However, you can also do aggregations using state and timers, without having to use a window.

In streaming pipelines, there are two dimensions of time: processing time and event time. In processing time, Dataflow assigns the current timestamp to every new message. In event time, we instead use the timestamp of the message as it was set at the original source, when the message was produced.

If you group messages by processing time, this is the same as micro-batching: messages that were produced around the same time, if they arrive out of order, will be assigned to different batches. Processing time is fine depending on the kind of calculations you want to perform, but event time enables you to apply more complex aggregation logic to the data. In event time, messages are grouped together depending on the timestamps generated at the source, not depending on the moment of their arrival. For instance, one message may be late and arrive very close to another on-time message. These two messages belong to different windows, but they are arriving at approximately the same time.
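As a minimal illustration of the difference (this is plain Python, not the Beam API, and the timestamps and payloads are invented for the example), grouping messages by their event-time timestamp into fixed one-minute windows puts a late message back with the messages produced around the same time, which grouping by arrival order would not do:

```python
from collections import defaultdict

WINDOW_SIZE = 60  # one-minute fixed windows, in seconds

def window_start(event_ts):
    """Start of the fixed window an event-time timestamp falls into."""
    return event_ts - (event_ts % WINDOW_SIZE)

# (event_time, payload) pairs, listed in ARRIVAL order.
# The last message was produced early (t=10) but arrives last,
# like the messages from the user on the plane.
arrivals = [(5, "a"), (65, "b"), (70, "c"), (10, "d")]

# Grouping by event time recovers the original grouping even though
# message "d" arrived out of order.
windows = defaultdict(list)
for event_ts, payload in arrivals:
    windows[window_start(event_ts)].append(payload)

print(dict(windows))  # {0: ['a', 'd'], 60: ['b', 'c']}
```

Grouping by processing time instead would have placed "d" in whatever micro-batch happened to be open when it finally arrived.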
Dataflow reads the message timestamps, determines that one of the messages was actually late, and assigns it back to the proper window, assuming the window is still open or waiting for late data. By doing this, we can recover the order and grouping of the data as it was produced at the source, even if it arrives out of order to Dataflow. This is a very powerful feature of streaming pipelines. Herein lies the possibility of doing complex and sophisticated calculations in streaming pipelines, even in the case of out-of-order delivery.

Apache Beam includes three different types of windows that are available by default: fixed, sliding, and sessions. We can also create custom window types.

Fixed windows are those that divide time into equal slices, for example, hourly, daily, or monthly. Fixed time windows consist of consistent, non-overlapping intervals.

Sliding time windows also represent time intervals in the data stream; however, sliding time windows may overlap. For example, each window might capture 60 seconds' worth of data, but a new window starts every 30 seconds. The frequency with which sliding windows begin is called the period. A typical application of sliding windows would be to calculate a moving average.

Session windows capture bursts of user activity. Session windows are defined by a minimum gap duration, with their timing set by the arrival of the elements themselves. Session windows are data-dependent windows: they are not known ahead of time, and you need to look at the data to figure them out. Examples are user sessions, such as a user navigating a website.
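To make the three window types concrete, here is a small plain-Python sketch (again, not the Beam API; the sizes, period, gap, and timestamps are invented for illustration). A fixed window maps a timestamp to exactly one interval; a 60-second sliding window with a 30-second period maps it to two overlapping intervals; and session windows cannot be computed from a timestamp alone, since you have to look at the whole data to merge events that are closer together than the gap:

```python
def fixed_window(ts, size=60):
    """The single [start, end) interval a timestamp belongs to."""
    start = ts - (ts % size)
    return [(start, start + size)]

def sliding_windows(ts, size=60, period=30):
    """All overlapping [start, end) intervals covering the timestamp."""
    last_start = ts - (ts % period)
    return sorted((s, s + size) for s in range(last_start, ts - size, -period))

def session_windows(event_times, gap=10):
    """Merge event times separated by less than `gap` into sessions."""
    sessions = []
    for ts in sorted(event_times):
        if sessions and ts - sessions[-1][-1] < gap:
            sessions[-1].append(ts)   # close enough: extend current session
        else:
            sessions.append([ts])     # gap exceeded: start a new session
    return sessions

print(fixed_window(75))                      # [(60, 120)]
print(sliding_windows(75))                   # [(30, 90), (60, 120)]
print(session_windows([1, 4, 30, 33, 34]))   # [[1, 4], [30, 33, 34]]
```

Note how the element at timestamp 75 contributes to two sliding windows, which is what makes moving averages possible, while the session boundaries only emerge from the data itself.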