Welcome back. So we're gonna continue lesson one of this module by looking at the read and write process in HDFS. The idea is to explain the write process in HDFS, detail the replication pipeline, and then look at the read process.

So the write process in HDFS is initiated by a client; the circle here on the left shows the HDFS client's write request. Initially, the data gets cached locally on the client, and once it reaches a particular size, typically one block size, the client contacts the NameNode to find out where that block should go, essentially. This initial caching is client-side buffering: it gives you better write throughput, and it's a relaxation of the POSIX requirements.

Once the NameNode is contacted, it responds with a list of DataNodes. And the NameNode is rack-aware, as I've mentioned earlier. So with our default replication factor of three, the NameNode places the primary replica of the block on the local rack, Rack 1 in this example, assuming that the HDFS client is in Rack 1. The next replica goes on a remote rack, Rack 2 in this case, and the replicas are in red here. Essentially, this protects against data loss if the rack you're writing to fails, that is, if writing the primary block fails: because the second replica is on an entirely different rack, you survive the failure of an entire rack or node. Now, the third replica goes on the same remote rack as the second replica, so that you're minimizing network traffic between racks.

So, essentially, the client gets a list of DataNodes back. At this point, the first DataNode starts receiving data in small portions from the client and writes them to its local repository. Each portion is then transferred to the second DataNode, which in turn writes it to its local repository there. And remember, this second DataNode is on a remote rack.
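The placement policy described above could be sketched like this. It's a minimal toy model, not the actual Hadoop code or API; the function and variable names are illustrative assumptions, and a real NameNode also weighs node load, free space, and availability.

```python
# Toy sketch of HDFS's default placement policy for replication factor 3.
# NOTE: illustrative only -- `place_replicas` and the `racks` layout are
# hypothetical names, not part of the real Hadoop API.

def place_replicas(client_rack, racks):
    """Pick three DataNodes for a block: one on the writer's local rack,
    and two on a single remote rack (to limit inter-rack traffic).
    `racks` maps a rack name to its list of DataNode names."""
    # Replica 1: a node on the client's local rack.
    first = racks[client_rack][0]
    # Replica 2: a node on some other (remote) rack.
    remote_rack = next(r for r in racks if r != client_rack)
    second = racks[remote_rack][0]
    # Replica 3: a *different* node on the same remote rack as replica 2.
    third = racks[remote_rack][1]
    return [first, second, third]

racks = {
    "rack1": ["dn1", "dn2"],
    "rack2": ["dn3", "dn4"],
}
print(place_replicas("rack1", racks))  # ['dn1', 'dn3', 'dn4']
```

Note the trade-off this encodes: one replica off-rack is enough to survive a whole-rack failure, while keeping two of the three copies on one remote rack means the pipeline crosses the rack boundary only once.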
The second DataNode then forwards the data to the third DataNode, which does the same thing over again. So essentially, a DataNode can be receiving data and forwarding data at the same time, in a pipeline. What this means is that the entire replication process is pipelined: it's not gonna take three times as long to make three copies of your block, it's gonna take much less than that.

Once all the blocks are written and the file is closed, this process is essentially committed into a persistent state by the NameNode. The NameNode is what receives the heartbeats and block reports from all the DataNodes, so it collects that information from every DataNode. And if a DataNode fails, its heartbeat is gonna go away, or maybe you get a corrupted block; either way, the NameNode is the one initiating the fix for that.

On the read side, again the client makes a request to the NameNode and gets information about the blocks and how they're laid out. HDFS tries to satisfy the read request from the nearest replica, so basically you're trying to do local reads and avoid network transfer as much as possible.

So this is essentially the crux of how the read and write process works in HDFS. Now that we know this, we're gonna look into performance tuning and robustness in our next video.
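As a recap of the read path, nearest-replica selection could be sketched like this. It's a toy model, not the actual Hadoop API: the names are illustrative, though the distance weights (0 for the same node, 2 for the same rack, 4 for off-rack) mirror the convention used by Hadoop's rack-aware network topology.

```python
# Toy sketch of nearest-replica selection on the HDFS read path.
# NOTE: illustrative only -- `distance` and `pick_replica` are hypothetical
# names; the 0/2/4 weights follow Hadoop's rack-aware distance convention.

def distance(reader, node):
    if reader["host"] == node["host"]:
        return 0   # same node: a local read, no network transfer at all
    if reader["rack"] == node["rack"]:
        return 2   # same rack: cheap, traffic stays off the core switches
    return 4       # different rack: the most expensive option

def pick_replica(reader, replicas):
    """Return the replica DataNode closest to the reading client."""
    return min(replicas, key=lambda node: distance(reader, node))

reader = {"host": "dn2", "rack": "rack1"}
replicas = [
    {"host": "dn3", "rack": "rack2"},
    {"host": "dn2", "rack": "rack1"},  # a replica on the reader's own node
    {"host": "dn4", "rack": "rack2"},
]
print(pick_replica(reader, replicas))  # {'host': 'dn2', 'rack': 'rack1'}
```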