Hello. In the previous video, you learned about RDDs. You have seen that you can use data stored in stable storage as an RDD. That is the first way to create an RDD. However, if you recall the definition of an RDD, it is said to be a read-only collection of records. So the question is, how can we work with the data? That's where transformations come in. They are the second way to create an RDD, namely from an existing one. For example, with transformations you can filter records or group them by a key. More precisely, transformations allow you to create new RDDs from existing RDDs by specifying how to obtain new items from the existing items. That implies that the transformed RDD depends on the source RDD. To make this definition more concrete, let's take a look at a few examples.

The simplest transformation is the filter transformation. It is parameterized by a predicate, and the resulting dataset contains only the items that satisfy the predicate. As you can see in the example, if you take the RDD with the numbers from one to five and apply the filter transformation that keeps only even numbers, you get an RDD that effectively contains the numbers two and four.

Another transformation that should be familiar to you after the MapReduce lesson is the map transformation. It is parameterized by a mapping function that transforms, or maps, a single source item into a single transformed item. For example, the map transformation with a function that doubles its input converts the RDD with the numbers from one to five into the RDD with the numbers two, four, six, eight and ten.

A generalization of filter and map is called flatMap. Here, a function maps a single source item into a possibly empty array of transformed items. For example, to filter out an item, you simply map it to the empty array, and the map function would be implemented as mapping to a singleton array. The first sketch below illustrates these three transformations.

Now let's take a look at how you could implement the filtered RDD; the second sketch below shows one way to do it. Recall that you need to implement three functions: partitions, iterator and dependencies. Since the transformed RDD is mostly the same as the source one, you can reuse the partitions of the source RDD, as there is no need to change the partitioning. Every filtered partition depends on the corresponding source partition. You can establish this relation by providing a dependency object that establishes a one-to-one correspondence between the filtered and the source partitions. As you have provided dependencies for every partition, when creating an iterator over a filtered partition, Spark will inject an iterator over the source partition into the iterator function call. So your job now is to build an iterator over the filtered partition, and this can be done by reusing the parent iterator: when requested for the next value, you pull values from the parent iterator until it returns an item that satisfies the predicate. The nice thing here is that you do not need to know about the implementation details of the parent RDD at all.

If you noticed, you have implemented a so-called lazy iterator. That means that when you create a filtered RDD, no immediate filtering occurs. The filtering starts to happen only when you start to pull items from the iterator. This is the property of laziness. All transformations in Spark are lazy: they merely create a description of what needs to be done and then wait until you start requesting the data. And that makes sense.
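Here is a minimal sketch of the filter, map and flatMap transformations described above, using the standard Spark Scala API. The object name, the app name and the local master setting are illustrative assumptions for running this outside a cluster.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object TransformationExamples {
  def main(args: Array[String]): Unit = {
    // Local SparkContext, for illustration only.
    val sc = new SparkContext(
      new SparkConf().setAppName("transformations").setMaster("local[*]"))

    val numbers = sc.parallelize(1 to 5)          // RDD with 1, 2, 3, 4, 5

    // filter: keep only the items that satisfy the predicate.
    val evens   = numbers.filter(_ % 2 == 0)      // 2, 4

    // map: transform every item with the given function.
    val doubled = numbers.map(_ * 2)              // 2, 4, 6, 8, 10

    // flatMap generalizes both: each item maps to zero or more items.
    val filterViaFlatMap = numbers.flatMap(x => if (x % 2 == 0) Seq(x) else Seq.empty)
    val mapViaFlatMap    = numbers.flatMap(x => Seq(x * 2))

    // Nothing has been computed yet -- transformations are lazy.
    // collect() is an action; it triggers the actual evaluation.
    println(evens.collect().toList)               // List(2, 4)
    println(doubled.collect().toList)             // List(2, 4, 6, 8, 10)
    println(filterViaFlatMap.collect().toList)    // List(2, 4)

    sc.stop()
  }
}
```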
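And here is a sketch of the filtered RDD itself, written against Spark's developer API for custom RDDs. The class FilteredRDD is a hypothetical name used for illustration; it mirrors the three ingredients from the lecture: reused partitions, a one-to-one dependency on the parent, and a lazy iterator built on top of the parent's iterator.

```scala
import scala.reflect.ClassTag

import org.apache.spark.{OneToOneDependency, Partition, TaskContext}
import org.apache.spark.rdd.RDD

// Hypothetical FilteredRDD: keeps only the items of `prev` satisfying `pred`.
// The dependency list declares a one-to-one correspondence with the parent's partitions.
class FilteredRDD[T: ClassTag](prev: RDD[T], pred: T => Boolean)
  extends RDD[T](prev.sparkContext, Seq(new OneToOneDependency(prev))) {

  // Reuse the parent's partitioning: filtering never moves data between partitions.
  override def getPartitions: Array[Partition] = prev.partitions

  // Build a lazy iterator on top of the parent's iterator for the same partition.
  // No filtering happens until someone starts pulling items from the result.
  override def compute(split: Partition, context: TaskContext): Iterator[T] =
    prev.iterator(split, context).filter(pred)
}

// Usage (assuming an existing SparkContext named sc):
//   val evens = new FilteredRDD(sc.parallelize(1 to 5), (x: Int) => x % 2 == 0)
//   evens.collect()   // Array(2, 4): collect() is the action that finally runs the filter
```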
If you are working with a cluster and you have just made a filtered RDD, it is unclear where to start executing the operation and where to deliver the result. Obviously, you do not want all the data to go through your computer. As you will learn later, there are other operations available in Spark, called actions, that trigger the actual computation and hence invoke the postponed transformations.

One important technical detail: as the filtering is done lazily, the predicate closure must be captured within the RDD. That means you have to be careful about the code you put in the predicate, because you have no control over when and where this code will be executed. It is therefore a bad idea to, say, access local files or capture local references in the closure. The last sketch below shows an action triggering the postponed filter and illustrates this caution.

One more thing: whenever you apply a transformation to an RDD, you implicitly construct a dependency graph on the RDDs, as shown on this slide. This graph is used by the framework to schedule jobs, as we will see later. Moreover, you can imagine a partition-level dependency graph, where vertices correspond to partitions and edges to inter-partition dependencies. In the case of the filter transformation, this graph simply connects every filtered partition to its corresponding source partition.

I think it is a good idea to make a short break here. Please take the quiz to check yourself and, when ready, proceed to the next video. See you.
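A small companion sketch for the two points above: actions trigger the lazily recorded transformations, and the predicate closure travels with the RDD, so it must not rely on driver-local state. The object name, file path and local master setting are illustrative assumptions.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ActionsDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("actions-demo").setMaster("local[*]"))

    // Building the pipeline only records a description of the work; nothing runs yet.
    val evens = sc.parallelize(1 to 1000000).filter(_ % 2 == 0)

    // Actions trigger the postponed transformations and deliver a result to the driver.
    println(evens.count())                  // 500000
    println(evens.take(5).mkString(", "))   // 2, 4, 6, 8, 10

    // Caution: the predicate closure is serialized and shipped to the executors,
    // so it should not touch driver-local state. Something like the line below is
    // a bad idea, because the file may not exist where and when the predicate runs:
    // val bad = evens.filter(x =>
    //   scala.io.Source.fromFile("/tmp/allowed.txt").getLines().contains(x.toString))

    sc.stop()
  }
}
```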