Hi, in this video you will examine how GraphFrames implements the alternating algorithm from the article "Connected Components in MapReduce and Beyond". To do this you will browse through the code. There is a copy of the original GraphFrames code. Why aren't we looking at the real one, you could probably ask me? Because it's written in Scala. So, to make your education process more smooth and pleasant, I've made a copy for you. [SOUND] Let's enjoy it together.

All the logic is implemented in the method run, which receives the graph itself and a broadcast threshold. Right after receiving the graph, run calls the method prepare on it. How does the method prepare actually arrange our graph? The method prepare deduplicates the vertices and assigns a unique long ID to each, changes edge directions to have increasing long IDs from source to destination, deduplicates the edges, and removes self-loops. In the returned GraphFrame the vertex DataFrame has two columns: column id stores the long ID assigned to the vertex, and a second column stores the original vertex attributes. The edge DataFrame has two columns: column src stores the long ID of the source vertex, and column dst stores the long ID of the destination vertex. And you will always have source less than destination.

After the graph is prepared, you can continue. Let's initialize the variables converged, iteration and prevSum with the values false, 1 and null respectively, and start iterating. One iteration of the alternating algorithm consists of two phases, a large-star step and a small-star step. Let's remember, what is a large-star step in the terms of the article? First, for each edge, you emit it and its reversed copy during the map stage. Then, during the shuffle, for each vertex you group all its neighbors. Finally, among the neighbors of vertex u and vertex u itself, you search for the vertex with the minimal index. Let's call it m.
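Before going further into the large-star step, the prepare normalization described above can be sketched in plain Python. This is an illustration under my own assumptions (integer vertex IDs standing in for the assigned long IDs, a hand-rolled function instead of DataFrame operations), not the actual GraphFrames Scala code:

```python
# Illustrative sketch of the "prepare" normalization, assuming vertex
# IDs are already the assigned long IDs. The real GraphFrames code does
# this with DataFrame operations in Scala.
def prepare(edges):
    normalized = set()
    for src, dst in edges:
        if src == dst:
            continue                      # remove self-loops
        lo, hi = min(src, dst), max(src, dst)
        normalized.add((lo, hi))          # orient src < dst; the set dedupes
    return sorted(normalized)

# (3, 1) gets reversed, (1, 3) is then a duplicate, (2, 2) is a self-loop:
print(prepare([(3, 1), (1, 3), (2, 2), (1, 2)]))  # → [(1, 2), (1, 3)]
```

The key point is the last invariant from above: every surviving edge has source less than destination.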
For all the neighbor vertices v of vertex u with indices bigger than u's, you emit the pairs (v, m). If you look now at the code of GraphFrames, it may not be obvious that it is doing the same thing, so let me explain. First, the method minNbrs is applied. How does the method minNbrs work? For each edge in the edges DataFrame, it adds its reversed copy. Then, for each vertex, it finds its minimal neighbor and the number of neighbors. According to the description of the large-star step in the article, for each vertex u, you should find the vertex with the minimal index among the neighbors of vertex u and vertex u itself. So here you add the vertex itself to the set in which you search for the minimal vertex.

Going further, let's join every edge with the minNbr of its source vertex by the method skewedJoin. How does the method skewedJoin work? It finds vertices with a degree higher than the broadcast threshold and calls the skewedJoin method of GraphFrames, handling the skew in the join keys by doing a broadcast join for the frequent keys and a normal join for the rest.

Let's come back to the large-star step. After skewedJoin is performed, for each edge of the prepared graph, where the source of the edge is less than the destination, you have a column named minNbr containing the vertex with the minimal index among the neighbors of vertex src and vertex src itself. So in the resulting triplet, minNbr is less than or equal to src, and src is less than or equal to dst. If you now rename the columns, calling dst the new src and minNbr the new dst, then automatically the new src will be bigger than the new dst. This has the same effect as emitting only the pairs (v, m) where the label of v is bigger than the label of u in the last step of large-star in the article. Consequently, this implementation is doing exactly the same large-star routine as described in the article.

Let's examine now the small-star step. The article tells you to reorder all the vertices in the edges in a way that makes src bigger than dst.
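Before looking at the small-star implementation, the large-star routine just described can be sketched in plain Python. Again, this is an illustration with made-up names and integer IDs, not the DataFrame-based GraphFrames code:

```python
from collections import defaultdict

# Illustrative large-star step, assuming integer vertex IDs.
def large_star(edges):
    # Map stage: emit each edge and its reversed copy, then group all
    # neighbors per vertex (the "shuffle").
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    out = set()
    for u, neighbors in nbrs.items():
        m = min(neighbors | {u})   # minimum over N(u) and u itself
        for v in neighbors:
            if v > u:              # emit (v, m) only for larger neighbors
                out.add((v, m))
    return sorted(out)

# On the path 1-2-3-4, vertices 2 and 3 connect to 1, and 4 to 2:
print(large_star([(1, 2), (2, 3), (3, 4)]))  # → [(2, 1), (3, 1), (4, 2)]
```

Note that in the output every pair has its first element bigger than its second, which matches the column renaming trick described above.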
You already have this ordering in the GraphFrames implementation after the large-star step. For each fixed vertex, you should find the minimal element among its neighbors which are smaller. Finally, according to the description of the small-star operation from the article, for each smaller neighbor of a src vertex, and for src itself, you should emit the pair (neighbor, minimal neighbor). During this step you will receive all the pairs (neighbor, minimal neighbor). You will also add the pairs (src, minimal neighbor), rearranging them if needed in a way that src is less than dst, because it could happen that the vertex is smaller than its smallest neighbor. Finally, let's filter out all rows where the vertex happens to be the smallest neighbor of itself. According to the article this would be filtered later, during the large-star step; here in GraphFrames, to maintain the invariant that src is less than dst on the input data for the large-star operation, this filtering is done right away.

Finally, when another large-star, small-star cycle is over, you should check whether the algorithm has converged or not. To do this, you sum all the IDs of the src vertices. If the sum is not equal to the sum from the previous step, you update the prevSum variable and continue iterating. If the sums are equal, then the algorithm has converged, and you do the following. First, join the vertices with the last version of the edges on the field dst. Second, if there is no incoming edge to a vertex, that means it's the center of a connected component, and its ID serves as the connected component identifier. Third, select the vertex attributes, including its ID and a new column component.

To sum up, in this video you have learned how exactly the parameter broadcast threshold impacts the computational process, so now you have a better understanding of how to tune it. You have browsed through an example of an algorithm implementation with good theoretical guarantees, built with GraphFrames and the DataFrames API.
And third, you now understand in detail how the algorithm was implemented, which gives you the opportunity to predict the bottlenecks you may meet in practical applications of it.
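To close the loop, here is a matching plain-Python sketch of the small-star step and the sum-based convergence check described earlier. As before, the names and the hand-rolled loops are my own illustration, not the actual DataFrame code:

```python
from collections import defaultdict

# Illustrative small-star step, assuming every input edge already
# satisfies src > dst (the ordering produced by the large-star step).
def small_star(edges):
    smaller_nbrs = defaultdict(set)
    for src, dst in edges:
        smaller_nbrs[src].add(dst)        # group smaller neighbors per vertex
    out = set()
    for u, smaller in smaller_nbrs.items():
        m = min(smaller)                  # minimal smaller neighbor
        for v in smaller | {u}:           # each smaller neighbor, and u itself
            if v != m:                    # drop rows where v is m itself
                out.add((min(v, m), max(v, m)))  # reorder so that src < dst
    return sorted(out)

# Convergence check: compare the sum of src IDs with the previous sum.
def has_converged(edges, prev_sum):
    curr_sum = sum(src for src, _ in edges)
    return curr_sum == prev_sum, curr_sum

print(small_star([(3, 2), (3, 1), (4, 3)]))  # → [(1, 2), (1, 3), (3, 4)]
```

When a full large-star, small-star cycle no longer changes the sum of the src IDs, the edge set has stabilized and every vertex points into its component.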