Now that we've gotten an overview of the lifecycle, let's take a look at unit testing in some detail. Unit tests are fundamental to all software development, and Beam is no exception. We use unit tests in Beam to assert the behavior of one small, testable piece of your production pipeline. These small portions are usually either individual DoFns or PTransforms. These tests should complete quickly, and they should run locally with no dependencies on external systems.

To get started with unit testing in Beam Java pipelines, we need a few dependencies. Beam uses JUnit 4 for unit testing, and here you can see an example of the dependency section of a Maven POM for a Beam pipeline.

In a regular Beam pipeline, the Pipeline object represents your pipeline. TestPipeline is a class included in the Beam SDK specifically for testing transforms. When writing tests, use TestPipeline in place of Pipeline wherever you would create a pipeline object. Unlike Pipeline.create, TestPipeline.create handles setting pipeline options internally.

PAssert is another class provided as part of Beam that lets you check the output of your transforms. Assertions on the contents of a PCollection are incorporated into the pipeline itself, and these assertions can be checked no matter which pipeline runner is used. PAssert becomes part of your pipeline alongside the transforms, and it works on both local and production runners.

Putting this all together, unit tests help you ensure the correct functioning of your pipeline. Your pipeline is made up of DoFn subclasses and composite transforms, which might combine multiple DoFns. Unit tests let you provide known input to your DoFns and composite transforms, then compare the output of those transforms with a verified set of expected outputs. The Apache Beam SDK provides a JUnit rule called TestPipeline for unit testing individual transforms like your DoFn subclasses, composite transforms like your PTransform subclasses, and entire pipelines.
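The dependency section shown on the slide isn't reproduced in this transcript. As a rough sketch of what it might contain (artifact IDs are the real Beam coordinates, but the version properties here are placeholders, not from the slide):

```xml
<!-- Hedged sketch of the test-related dependencies for a Beam Java pipeline.
     ${beam.version} is a placeholder property you would define yourself. -->
<dependencies>
  <dependency>
    <groupId>org.apache.beam</groupId>
    <artifactId>beam-sdks-java-core</artifactId>
    <version>${beam.version}</version>
  </dependency>
  <!-- The Direct Runner executes tests locally with no external systems. -->
  <dependency>
    <groupId>org.apache.beam</groupId>
    <artifactId>beam-runners-direct-java</artifactId>
    <version>${beam.version}</version>
    <scope>test</scope>
  </dependency>
  <!-- Beam's testing utilities, such as TestPipeline, use JUnit 4. -->
  <dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.13.2</version>
    <scope>test</scope>
  </dependency>
</dependencies>
```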
You can use TestPipeline on Beam pipeline runners, such as the Direct Runner or the Dataflow Runner, to apply assertions on the contents of PCollection objects using PAssert, as shown in the code snippet here. In this minimal example, we instantiate a TestPipeline, then create a test PCollection containing some data. Then we assert that the PCollection contains the data we expect, in any order.

One thing to keep in mind when you're developing your pipeline is that it's an anti-pattern to design your pipeline too much around anonymous subclasses of DoFn. Anonymous subclasses make it impossible to test the correctness of the transform without duplicating the code in the test, which will quickly become a challenge to maintain. Anonymous classes are also not as reusable as named subclasses would be. Prefer named subclasses to anonymous ones. When we compare the right-hand code block to the anti-pattern in the left block, we see that the DoFns are now named subclasses. In other words, rather than putting the DoFn code inline in our ParDo, we create an instance of the transform. Named subclasses are easily testable, so we can validate their behavior independently without having to execute the entire pipeline.

In addition to the functionality of our transforms, we can and should test our assumptions about how windowed transforms will behave. Beam provides a Create.timestamped method, which can be used to create timestamped elements in a testing PCollection. You can manipulate the timestamps directly, as we do in this example by adding the window duration to the timestamp of the last element. Then, in the right-hand code block, you can see that we apply fixed windows of the window duration and perform a count on the windowed elements. We can then assert that the resulting PCollection contains the windowed calculations we expect. Note that windowing takes place in both batch and streaming pipelines, so testing how your windowed transforms behave is useful in both types of pipelines.
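The slide code isn't included in the transcript, so here is a hedged sketch of a windowed-count unit test along the lines described above, assuming JUnit 4, the Beam Java SDK, and the Direct Runner on the classpath. The class name, WINDOW_DURATION constant, and element values are illustrative, not taken from the slides:

```java
import java.io.Serializable;
import org.apache.beam.sdk.testing.PAssert;
import org.apache.beam.sdk.testing.TestPipeline;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TimestampedValue;
import org.joda.time.Duration;
import org.joda.time.Instant;
import org.junit.Rule;
import org.junit.Test;

public class WindowedCountTest implements Serializable {

  private static final Duration WINDOW_DURATION = Duration.standardMinutes(1);

  // TestPipeline is a JUnit rule; it handles pipeline options internally.
  @Rule public final transient TestPipeline p = TestPipeline.create();

  @Test
  public void testWindowedCount() {
    Instant start = new Instant(0);

    // Create.timestamped builds a test PCollection of timestamped elements.
    // The last element's timestamp is shifted by the window duration, so it
    // falls into the second window rather than the first.
    PCollection<String> input =
        p.apply(
            Create.timestamped(
                TimestampedValue.of("a", start),
                TimestampedValue.of("b", start.plus(Duration.standardSeconds(30))),
                TimestampedValue.of("c", start.plus(WINDOW_DURATION))));

    // Apply fixed windows of the window duration, then count per window.
    PCollection<Long> counts =
        input
            .apply(Window.into(FixedWindows.of(WINDOW_DURATION)))
            .apply(Count.globally().withoutDefaults());

    // Two elements land in the first window, one in the second.
    PAssert.that(counts).containsInAnyOrder(2L, 1L);

    p.run().waitUntilFinish();
  }
}
```

The PAssert checks execute as part of the pipeline when `p.run()` is called, so a failed assertion fails the test no matter which runner executes it.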
Speaking of streaming pipelines, let's talk about how to test those. TestStream is a testing input that generates an unbounded PCollection of elements, advancing the watermark and processing time as elements are emitted. After all of the specified elements are emitted, TestStream stops producing output. Each call to a TestStream builder method will only be reflected in the state of the pipeline after the prior event has completed and no more progress can be made by the pipeline. In other words, a pipeline runner must ensure that no more progress can be made in the pipeline before advancing the state of the TestStream.

For streaming pipeline tests, we'll use TestStream to create an input that enables you to model the effect of element timings. Let's take a look at what this looks like in code. To use TestStream, you first create a TestStream instance, then you add timestamped elements to it. You can manipulate the timestamps in the TestStream by adjusting the timestamp objects, and you can manipulate the position of the watermark using an Instant object. Advancing the watermark to infinity closes all windows, so that you can perform your windowed calculation and assert on the results. TestStream is supported by the Direct Runner and the Dataflow Runner; use both to carry out your streaming pipeline tests.

We can also test more complex streaming interactions. In this example, we're asserting on the presence of certain elements in particular panes of a window. We also ensure that, for all the panes, none have fewer than three elements or more than five.
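Since the slide code isn't in the transcript, the steps above can be sketched as follows. This is a hedged example assuming the Beam Java SDK and the Direct Runner; the class name, element values, and timestamps are illustrative:

```java
import java.io.Serializable;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.testing.PAssert;
import org.apache.beam.sdk.testing.TestPipeline;
import org.apache.beam.sdk.testing.TestStream;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.IntervalWindow;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TimestampedValue;
import org.joda.time.Duration;
import org.joda.time.Instant;
import org.junit.Rule;
import org.junit.Test;

public class StreamingCountTest implements Serializable {

  @Rule public final transient TestPipeline p = TestPipeline.create();

  @Test
  public void testStreamingWindowedCount() {
    Instant start = new Instant(0);

    // Build an unbounded test input: two elements arrive inside the first
    // one-minute window, the watermark advances past that window's end, a
    // third element lands in the second window, and finally the watermark
    // advances to infinity, closing all windows.
    TestStream<String> events =
        TestStream.create(StringUtf8Coder.of())
            .addElements(
                TimestampedValue.of("a", start),
                TimestampedValue.of("b", start.plus(Duration.standardSeconds(30))))
            .advanceWatermarkTo(start.plus(Duration.standardMinutes(1)))
            .addElements(TimestampedValue.of("c", start.plus(Duration.standardSeconds(90))))
            .advanceWatermarkToInfinity();

    PCollection<Long> counts =
        p.apply(events)
            .apply(Window.into(FixedWindows.of(Duration.standardMinutes(1))))
            .apply(Count.globally().withoutDefaults());

    // Assert on the contents of a particular pane of a particular window:
    // the on-time pane of the first window should hold a count of 2.
    IntervalWindow firstWindow =
        new IntervalWindow(start, start.plus(Duration.standardMinutes(1)));
    PAssert.that(counts).inOnTimePane(firstWindow).containsInAnyOrder(2L);

    p.run().waitUntilFinish();
  }
}
```

Methods like `inOnTimePane` let PAssert target a single pane of a single window, which is how the more complex pane-level assertions described above are expressed.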