Hi, welcome to video four of our second lesson in the introduction to MapReduce. In this video we will keep building on our MapReduce examples, looking at vector multiplication using Hadoop context information.

A vector is simply an array of numbers, and vector multiplication is a basic operation in mathematical analysis. Imagine you have two arrays of numbers and you want to multiply them together, element by element. The first element of vector A is multiplied by the first element of vector B, the second element of A by the second element of B, and so on, up to the last element, indexed by N in our examples. At the end, all of those products are added up. For example, multiplying (1, 2, 3) with (4, 5, 6) gives 1×4 + 2×5 + 3×6 = 32.

Now let's imagine we have two files that are very large vectors, and the files are partitioned in HDFS. To implement vector multiplication in MapReduce, our main design consideration is that elements with the same index have to be grouped together in order to be multiplied. So the key-value pair should be something like the index as the key and the number from the array as the value. But there's a problem: the files contain just the numbers. After partitioning, there is no overt index information, so it's a little bit of a puzzle. It's not obvious how to know where the data came from in the original file.

It turns out that the Hadoop MapReduce implementation does have some ways to get extra information from the execution environment or from the configuration settings, and that information is available to the mapper through special function calls. This extra information can be passed along in the key-value output, or it can be used as a side effect, in which the mapper writes some data outside the Hadoop system. For this example, let's assume each line already has a key-value organization, so the mapper only needs to pass the data along. Note that it can take both files as input and output them as one stream.
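To make the mapper's job concrete, here is a minimal Python sketch, outside Hadoop, of the pass-through mapper described above. It assumes (as the lecture does) that each input line already carries "index value" text; the function name and data are my own illustration, not the course's actual code.

```python
def vector_mapper(line):
    """Parse one 'index value' line and re-emit it as an (index, value) pair,
    so the shuffle phase can group matching indices together."""
    index, value = line.split()
    return int(index), float(value)

# Lines from BOTH vector files flow through the same mapper, so elements
# that share an index end up under the same key after the shuffle.
lines_a = ["0 2.0", "1 3.0", "2 4.0"]   # vector A = [2, 3, 4]
lines_b = ["0 5.0", "1 6.0", "2 7.0"]   # vector B = [5, 6, 7]
pairs = [vector_mapper(line) for line in lines_a + lines_b]
print(pairs)  # six (index, value) pairs, two per index
```

The point of the sketch is that the mapper does no arithmetic at all; it only attaches the index as the key so that Hadoop's grouping does the hard work.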
The Hadoop MapReduce system will then shuffle and group the pairs by index, and the reducers will be able to do the multiplication. How is that going to happen? Well, each reducer is going to get pairs of data with the same index number, multiply each pair, and then add the products up. The reducer will produce a subtotal in its output. So depending on the number of reducers, we might still need to add up those outputs. However, at this point the original data might be reduced to a small number of subtotals, so it might fit in memory and not need another MapReduce job.

This ends this part of the lesson. The next video will continue with this example and consider computational trade-offs.
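The shuffle-and-reduce step above can be sketched the same way. This is a simulation in plain Python of what Hadoop does for us, using hypothetical function names, just to show that the per-index products and the final sum come out to the expected dot product.

```python
from collections import defaultdict

def shuffle(pairs):
    """Group values by key, as Hadoop's shuffle phase would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def vector_reducer(values):
    """Multiply together the elements that share one index."""
    product = 1.0
    for v in values:
        product *= v
    return product

# A = [2, 3, 4], B = [5, 6, 7]; dot product = 10 + 18 + 28 = 56
pairs = [(0, 2.0), (1, 3.0), (2, 4.0), (0, 5.0), (1, 6.0), (2, 7.0)]
subtotals = [vector_reducer(vals) for vals in shuffle(pairs).values()]
print(sum(subtotals))  # 56.0
```

In a real job each reducer would emit only its own subtotal; the final `sum(subtotals)` stands in for the small follow-up aggregation the lecture says may fit in memory.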