You have seen how the watermark is useful to get an idea of data completeness, and how to use metrics to see how the watermark is evolving. The default behavior is to trigger results when the watermark is passed, but that can often take a very long time. Do you have to wait that long before you can see the results coming out of your windows? No, you don't. Triggers let you define in precise detail when you want to see the results of a window. Let's see how triggers work. By using triggers, you can control the latency to produce a result, or you can ensure data completeness before you emit a result, or a combination of both. Triggers can be based on event time: for instance, emit results after 30 seconds as measured by the message timestamps. They can be based on processing time: for instance, emit results every 30 seconds as measured by the worker's clock, regardless of the message timestamps. They can also be based on data: for instance, emit results after seeing 25 messages. Or you can use any combination of the above triggers. With a composite trigger, we can implement very complex logic for deciding when, and how many times, to trigger the results of our windows. The default behavior is to trigger at the watermark: if you don't specify a trigger, you are actually using the AfterWatermark trigger. AfterWatermark is an event time trigger, and we could apply any other trigger based on event time as well. The message timestamps are used to measure time with these triggers, but we could also add custom triggers. If a trigger is based on processing time, the actual wall clock is used to decide when to emit the results. For instance, you could decide to emit exactly every 30 seconds, regardless of the timestamps of the messages that have arrived in the window. AfterCount is an example of a data-driven trigger: rather than emitting results based on time, here we trigger based on the amount of data that has arrived within the window.
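To make the three trigger families concrete, here is a minimal sketch in plain Python of a composite "fire after 25 messages or after 30 seconds of processing time, whichever comes first" condition. This is only an illustration of the semantics, not Beam's actual trigger machinery; the class name and the injectable clock are my own inventions for testability.

```python
import time

class CompositeTriggerSketch:
    """Toy illustration (not Beam's API): fire when either a data-driven
    condition (message count) or a processing-time condition is met."""

    def __init__(self, max_count=25, max_wait_secs=30, clock=time.monotonic):
        self.max_count = max_count          # data-driven, AfterCount-like
        self.max_wait_secs = max_wait_secs  # processing-time, wall-clock based
        self.clock = clock                  # injectable clock for testing
        self.count = 0
        self.opened_at = clock()            # when the window was opened

    def on_message(self):
        """Register one arriving message; return True if we should fire."""
        self.count += 1
        elapsed = self.clock() - self.opened_at
        return self.count >= self.max_count or elapsed >= self.max_wait_secs
```

Note that the processing-time condition never looks at message timestamps, only at the wall clock, which is exactly what distinguishes it from an event time trigger.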
The combination of several types of triggers opens a lot of possibilities with streaming pipelines. We may emit some results early using AfterProcessingTime, then again at the watermark when data is complete, and then once more for the next five messages that arrive late, after the watermark. In summary, you can use triggers to make sure that you emit results early, which is to say with minimal latency; or you can use them to make sure that you process late data, so that your results include all the relevant messages even if those messages have been delayed. Or you can combine the two conditions: for instance, make sure that you emit results early, and later repeat the calculation and emit new results when data is complete. When you emit results several times, how does Apache Beam repeat the calculation? You can actually control that. Let's talk about accumulation modes. When you trigger the window several times, you have to decide on the desired accumulation mode. There are two accumulation modes in Apache Beam: accumulate and discard. With accumulate, every time you trigger again in the same window, the calculation is simply repeated with all the messages that have been included in the window so far. With discard, once some messages have been used for a calculation, those messages are discarded. If new messages arrive later and there's a new trigger, the result will only include the new messages, and those messages will be discarded again should there be any additional trigger later. Let's see how these modes work with an example. This example uses fixed windows of 10 minutes, but you don't want to wait that long to see results, so the trigger is set to fire every couple of minutes. At the first trigger, the window has only seen two messages, and we emit a list containing just those two messages. By the time the next trigger fires, the window has received four more messages.
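The difference between the two accumulation modes can be sketched in a few lines of plain Python. This is an illustration of the semantics only, not Beam's implementation; the function names are my own.

```python
def fire_accumulating(window_buffer, new_messages):
    """Accumulate: each firing re-emits everything seen in the window so far."""
    window_buffer.extend(new_messages)
    return list(window_buffer)  # buffer keeps growing until the window closes

def fire_discarding(window_buffer, new_messages):
    """Discard: each firing emits only the messages since the previous firing."""
    window_buffer.extend(new_messages)
    emitted = list(window_buffer)
    window_buffer.clear()  # partial input is dropped once it has been emitted
    return emitted
```

With the example from the slide (two messages at the first firing, four more at the second), the accumulating version emits two and then six messages, while the discarding version emits two and then only the four new ones. The empty buffer after each discarding firing is what keeps state small for wide windows.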
Now the trigger emits the list again, containing the previous messages and the new messages. The third trigger again includes all the previous messages and the new messages. If your window is very wide, using accumulate as the accumulation mode may consume considerable resources, as the accumulated output has to be stored while the window is still open. The windows and the messages here are the same as in the previous slide, but now we have set the accumulation mode to discard. Every new trigger will only use the new messages that the window has received so far, and once the result is emitted, it will discard those messages. If the calculation you need to make with the windows is associative and commutative, you can safely update that calculation using discard mode without any loss of accuracy. In the output storage where you store the partial results, you should be able to aggregate the partial results to obtain the actual calculation value. The main advantage of using discard mode is that performance will not suffer even if you use a very wide window, because no accumulation is stored for very long, only until the next trigger fires. Let's see how to specify triggers in Apache Beam. These examples are in Python. In the example at the top, the pipeline triggers more than once per window. There will be a trigger 30 seconds after opening the window, and then again once the watermark is reached; after that, for every late message within the first two days, there will be an additional trigger. Every window will produce more than one output. Bear that in mind when designing your trigger. In the example at the bottom, the window is not triggered at the watermark, but whenever the window sees 100 messages or 60 seconds have passed, whichever happens first, and the trigger only computes that data. Once the trigger has fired, the previous messages are discarded.
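Since the slides themselves are not reproduced here, the two Python configurations just described might look roughly as follows. This is a sketch: the window size of 10 minutes and the `messages` PCollection are assumptions, though the trigger classes (`AfterWatermark`, `AfterProcessingTime`, `AfterCount`, `AfterAny`, `Repeatedly`) and `AccumulationMode` are the real names from `apache_beam.transforms.trigger`.

```python
import apache_beam as beam
from apache_beam import window
from apache_beam.transforms import trigger

# Top example: an early firing 30 s after the window opens, a firing at the
# watermark, then one firing per late message for up to two days.
windowed_early_and_late = messages | beam.WindowInto(
    window.FixedWindows(10 * 60),                # 10-minute windows (assumed)
    trigger=trigger.AfterWatermark(
        early=trigger.AfterProcessingTime(30),   # first result after 30 s
        late=trigger.AfterCount(1)),             # re-fire for each late message
    accumulation_mode=trigger.AccumulationMode.ACCUMULATING,
    allowed_lateness=2 * 24 * 60 * 60)           # keep window state for 2 days

# Bottom example: fire on 100 messages or 60 s of processing time, whichever
# comes first, discarding the messages after each firing.
windowed_batches = messages | beam.WindowInto(
    window.FixedWindows(10 * 60),
    trigger=trigger.Repeatedly(
        trigger.AfterAny(trigger.AfterCount(100),
                         trigger.AfterProcessingTime(60))),
    accumulation_mode=trigger.AccumulationMode.DISCARDING,
    allowed_lateness=2 * 24 * 60 * 60)
```

Note how the accumulation mode matches the intent of each example: the first repeats the full calculation at every firing, while the second emits independent batches.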
Because the window allows two days to wait for late data, the trigger will produce output if we have late messages within two days after the watermark. Even if we're not triggering at the watermark, the watermark is still important. You may have heard, by the way, that the Python SDK for Apache Beam doesn't support setting a value for allowed lateness. That was the case some time ago, but now allowed lateness is fully supported in Python, in Dataflow and other runners. These are the same examples, but in Java. Here you need to specify the type of the collection. We are assuming it is a string, but it could actually be any other type, even a custom class. The API is different from the Python one, more adapted to the conventions of Java, but the concepts are exactly the same as those shown in the previous slide. You have learned how to process streaming data with Apache Beam. Streaming is not only about the continuity of data; it has other important features as well. The most prominent is the lack of order when receiving data. Windows can help with that. Windowing by event time lets you recover the natural order of the data, but you can also use processing time, which is more like micro-batching. To emit results in a window, watermarks are important to know whether our data is complete or whether we should still wait for more data. If data arrives after the watermark, that data will be considered late. That's not a big deal: with triggers, we can decide how to deal with late data. Remember that custom triggers let you decide when to emit results early, when data is complete, and so on, and a window may have several triggers.