Hello. In this video we will take a look under the hood of the motif finding method of the GraphFrames library. Get ready for some really cool stuff. Relax, with a guide like me, you have nothing to worry about. Solving the mutual friends task in lesson one, you used the edge list graph representation. And you joined the original DataFrame AB with its copy BC, in which column A was renamed to B and column B was renamed to C, on the condition that these two DataFrames had the same value in column B. The motif finding algorithm does similar joins while finding a particular structural pattern. Follow me and I'll show you how the motif finding algorithm of GraphFrames works.

On the first step, the structural pattern string, such as the one you can see on the slide right now, is split into patterns from the following list: NamedVertex, AnonymousVertex, NamedEdge, AnonymousEdge, and Negation. For example, for the string that you can see right now, you will get the following collection of patterns: NamedVertex A; AnonymousEdge from NamedVertex A to NamedVertex B; NamedVertex B; NamedVertex B again; AnonymousEdge from NamedVertex B to NamedVertex C; and finally NamedVertex C itself.

On the second step, you iterate over the collection of patterns produced on step one, joining more and more columns. Let's make an agreement right now, right here, that the original GraphFrame will be called g and the current result of the iterations will be stored in the variable currentResult. How exactly every new pattern is handled depends on two factors. First, the type of the pattern, and second, does currentResult already contain the columns with the names mentioned in the pattern being processed? Now, arm yourself with patience. We will browse together through all possible cases. I will try to describe each case as briefly as possible in order to maintain an understanding of what is going on. I want you not just to know that the motif finding method exists, but also to have a thorough understanding of what is going on under its hood.
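To make step one concrete, here is a minimal pure-Python sketch of splitting a pattern string into the five pattern types. It is not the GraphFrames parser. The slide is not part of this transcript, so the string "(a)-[]->(b); (b)-[]->(c)" is an assumption that matches the collection of patterns listed above, and `parse_motif` is a hypothetical helper. Emitting a Negation as a single pattern is also an assumption of this sketch.

```python
import re
from dataclasses import dataclass

# Pattern types named in the talk; these classes are illustrative stand-ins.
@dataclass(frozen=True)
class NamedVertex:
    name: str

@dataclass(frozen=True)
class AnonymousVertex:
    pass

@dataclass(frozen=True)
class NamedEdge:
    name: str
    src: object
    dst: object

@dataclass(frozen=True)
class AnonymousEdge:
    src: object
    dst: object

@dataclass(frozen=True)
class Negation:
    edge: object

# "(a)-[e]->(b)", optionally prefixed with "!"; empty names mean "anonymous".
EDGE_TERM = re.compile(r"^(!?)\((\w*)\)-\[(\w*)\]->\((\w*)\)$")
VERTEX_TERM = re.compile(r"^\((\w+)\)$")

def parse_motif(motif):
    """Hypothetical helper: split a motif string into a flat list of patterns."""
    patterns = []
    for term in motif.split(";"):
        term = term.strip()
        if not term:
            continue
        vertex = VERTEX_TERM.match(term)
        if vertex:
            patterns.append(NamedVertex(vertex.group(1)))
            continue
        edge_match = EDGE_TERM.match(term)
        if edge_match is None:
            raise ValueError(f"cannot parse term: {term!r}")
        negated, src_name, edge_name, dst_name = edge_match.groups()
        src = NamedVertex(src_name) if src_name else AnonymousVertex()
        dst = NamedVertex(dst_name) if dst_name else AnonymousVertex()
        edge = NamedEdge(edge_name, src, dst) if edge_name else AnonymousEdge(src, dst)
        if negated:
            patterns.append(Negation(edge))  # assumption: kept as one pattern
        else:
            patterns.extend([src, edge, dst])  # source vertex, edge, destination vertex
    return patterns

patterns = parse_motif("(a)-[]->(b); (b)-[]->(c)")
for p in patterns:
    print(p)
```

Note that NamedVertex b appears twice in the result, once as the destination of the first edge and once as the source of the second, just as in the narrated list.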
And now let me draw your attention to the NamedVertex pattern. If currentResult contains the column vertexName, then motif finding will do nothing. If currentResult doesn't contain vertexName, then the g.vertices DataFrame will be transformed into a one-column DataFrame by the method nestAsCol. This method was discussed in detail in a previous video, where we were talking about [INAUDIBLE]. Let's call the resulting DataFrame of the method nestAsCol one_column_vertices_df. Note that the only column of one_column_vertices_df will be named vertexName, and it will contain at least one subcolumn, named id. Further, the Cartesian join of currentResult and one_column_vertices_df will be done.

Having finished with this, we're moving to AnonymousVertex. Motif finding will do nothing and currentResult will stay untouched in this case.

NamedEdge with the source and destination both having type AnonymousVertex is our next pattern to discuss. The g.edges DataFrame will be transformed into a one-column DataFrame with the name one_column_edges_df by the method nestAsCol. The only column of one_column_edges_df will be edgeName, and the subcolumns of column edgeName will be dst and src. The Cartesian join of currentResult and one_column_edges_df will be done.

Ready for a new step? Here we go. NamedEdge with the source having type AnonymousVertex and the destination having type NamedVertex. The g.edges DataFrame will be transformed into a one-column DataFrame with the name one_column_edges_df by the method nestAsCol. It will contain one column with the name edgeName, with subcolumns src and dst. If currentResult already has the column with the name dstVertexName, then the inner join of currentResult with one_column_edges_df will be done on the condition that currentResult.dstVertexName.id is equal to one_column_edges_df.edgeName.dst. If not, then first, the g.vertices DataFrame will be transformed into a one-column DataFrame by the method nestAsCol. This DataFrame will have the name one_column_vertices_df and will contain one column with the name dstVertexName.
Second, one_column_edges_df will be joined with one_column_vertices_df on the condition that one_column_edges_df.edgeName.dst is equal to one_column_vertices_df.dstVertexName.id. Let's call the product of this join tmpDF. Third, the Cartesian join of currentResult and tmpDF will be done.

Yet, we are not through. Please meet NamedEdge with the source having type NamedVertex and the destination having type AnonymousVertex. It is handled in the same way as the previous case. The only thing you should do to transform the previous case into the current one is to replace dstVertexName with srcVertexName everywhere, and dst with src in the joining conditions.

Shall we go further and speak about NamedEdge with the source and destination both having type NamedVertex? As you've probably already guessed, there are four possibilities in this case. First, the column named srcVertexName is in currentResult and the column named dstVertexName is in currentResult. Second, the column named srcVertexName is not in currentResult and the column named dstVertexName is not in currentResult as well. Third, the column named srcVertexName is not in currentResult and the column named dstVertexName is in currentResult. And the last one, the column named srcVertexName is in currentResult and the column named dstVertexName is not in currentResult. Let's consider these four cases one by one. But before that, the g.edges DataFrame will be transformed into a one-column DataFrame with the name one_column_edges_df by the method nestAsCol, because you will need it in all four cases. The one_column_edges_df will contain one column with the name edgeName, with subcolumns dst and src.

In the first case, the column named srcVertexName is in currentResult and the column named dstVertexName is in currentResult as well. currentResult will be joined with one_column_edges_df on the condition that one_column_edges_df.edgeName.src is equal to currentResult.srcVertexName.id and one_column_edges_df.edgeName.dst is equal to currentResult.dstVertexName.id.
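The first case just described can be sketched in plain Python, with lists of dicts standing in for DataFrames. This is only an illustration of the join logic under stated assumptions, not GraphFrames or Spark code: `nest_as_col`, `cross_join`, and `inner_join` are hypothetical stand-ins, the tiny graph is made up, and the column names a, b, and e play the roles of srcVertexName, dstVertexName, and edgeName.

```python
from itertools import product

def nest_as_col(rows, name):
    """Stand-in for nestAsCol: wrap each row into a single column called `name`."""
    return [{name: dict(row)} for row in rows]

def cross_join(left, right):
    """Cartesian join: every left row is combined with every right row."""
    return [{**l, **r} for l, r in product(left, right)]

def inner_join(left, right, condition):
    """Inner join: keep only the combined rows for which `condition` holds."""
    return [{**l, **r} for l, r in product(left, right) if condition(l, r)]

# Made-up graph: three vertices and two edges, 1 -> 2 and 2 -> 3.
vertices = [{"id": 1}, {"id": 2}, {"id": 3}]
edges = [{"src": 1, "dst": 2}, {"src": 2, "dst": 3}]

# Assume earlier patterns already put columns "a" and "b" into currentResult.
current_result = cross_join(nest_as_col(vertices, "a"), nest_as_col(vertices, "b"))

# One column "e" with subcolumns src and dst.
one_column_edges_df = nest_as_col(edges, "e")

# First case: both vertex columns are present, so a single inner join is enough.
joined = inner_join(
    current_result,
    one_column_edges_df,
    lambda cur, e: e["e"]["src"] == cur["a"]["id"] and e["e"]["dst"] == cur["b"]["id"],
)

matches = sorted((row["a"]["id"], row["b"]["id"]) for row in joined)
print(matches)
```

Of the nine candidate (a, b) pairs in currentResult, only the pairs that are actually connected by an edge survive the join.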
In the second case, the column named srcVertexName is not in currentResult and the column named dstVertexName is not in currentResult as well. First, the g.vertices DataFrame will be transformed into a one-column DataFrame by the method nestAsCol two times. Let's call the first resulting DataFrame of the method nestAsCol one_column_src_vertices_df. It will contain one column with the name srcVertexName. And let's call the second resulting DataFrame of the method nestAsCol one_column_dst_vertices_df. It will contain one column with the name dstVertexName. Second, one_column_edges_df will be joined with one_column_dst_vertices_df on the condition that one_column_edges_df.edgeName.dst is equal to one_column_dst_vertices_df.dstVertexName.id. Let's call the product of this join tmpDF. Third, tmpDF will be joined with one_column_src_vertices_df on the condition that tmpDF.edgeName.src is equal to one_column_src_vertices_df.srcVertexName.id. Let's call the product of this join tmpDF2. Finally, the Cartesian join of tmpDF2 and currentResult will be done. Phew, that was a hard job, and successfully done.

In the third case, the column named srcVertexName is not in currentResult, and the column named dstVertexName is in currentResult. First, the g.vertices DataFrame will be transformed into a one-column DataFrame by the method nestAsCol, as usual. This DataFrame will have the name one_column_vertices_df and will contain one column named srcVertexName. Second, one_column_edges_df will be joined with one_column_vertices_df on the condition that one_column_edges_df.edgeName.src is equal to one_column_vertices_df.srcVertexName.id. Let's call the product of this join tmpDF. Third, the join of currentResult and tmpDF will be done on the condition that currentResult.dstVertexName.id is equal to tmpDF.edgeName.dst.

Last case. The column named srcVertexName is in currentResult and the column named dstVertexName is not in currentResult. It is handled in the same way as the third one.
To transform the third case into the last one, you need to swap srcVertexName and dstVertexName everywhere, and src and dst in the joining conditions.

The last two patterns, AnonymousEdge and Negation, can be reduced to the previously considered cases. Let's start with AnonymousEdge. It will be transformed to a NamedEdge with the new edge name __tmp and will be processed further as a NamedEdge, but after the last join the column __tmp is thrown away from currentResult.

Let's proceed with the Negation of a NamedEdge or AnonymousEdge. Negation will be done in two steps. On the first step, the Negation pattern will be transformed to a NamedEdge or AnonymousEdge, depending on what was inside it. Then the NamedEdge or AnonymousEdge pattern will be processed on currentResult in the way described before in this video. A new DataFrame will be received. Let us call it tmpDF. It is important to mention that tmpDF is a subset of the rows of the original currentResult. So, on the second and final step, all the rows that are in both DataFrames, currentResult and tmpDF, will be deleted from currentResult.

The DataFrame received as a result of processing the pattern becomes the new currentResult, and the iterations over the collection of patterns from step one continue. Let's sum up. What have we learned in this video? Now you know in detail how the motif finding algorithm of GraphFrames works.
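The AnonymousEdge and Negation handling described in this video can be sketched in the same spirit. Again, this is a pure-Python illustration with lists of dicts, not GraphFrames code. All function names and the tiny graph are hypothetical, and the sketch assumes that the source and destination vertex columns are already in currentResult (the first of the four NamedEdge cases).

```python
from itertools import product

def nest_as_col(rows, name):
    """Stand-in for nestAsCol: wrap each row into a single column called `name`."""
    return [{name: dict(row)} for row in rows]

def process_named_edge(current_result, edges, edge_name, src_name, dst_name):
    """NamedEdge when both vertex columns are already present: one inner join."""
    one_column_edges_df = nest_as_col(edges, edge_name)
    return [
        {**cur, **e}
        for cur, e in product(current_result, one_column_edges_df)
        if e[edge_name]["src"] == cur[src_name]["id"]
        and e[edge_name]["dst"] == cur[dst_name]["id"]
    ]

def process_anonymous_edge(current_result, edges, src_name, dst_name):
    """AnonymousEdge: process as a NamedEdge called __tmp, then drop that column."""
    joined = process_named_edge(current_result, edges, "__tmp", src_name, dst_name)
    return [{k: v for k, v in row.items() if k != "__tmp"} for row in joined]

def process_negation(current_result, edges, src_name, dst_name):
    """Negation of an AnonymousEdge: remove the rows that the edge would match."""
    tmp_df = process_anonymous_edge(current_result, edges, src_name, dst_name)
    return [row for row in current_result if row not in tmp_df]

# Made-up graph: three vertices and two edges, 1 -> 2 and 2 -> 3.
vertices = [{"id": 1}, {"id": 2}, {"id": 3}]
edges = [{"src": 1, "dst": 2}, {"src": 2, "dst": 3}]

# Assume columns "a" and "b" are already in currentResult (all nine pairs).
current_result = [
    {**a, **b}
    for a, b in product(nest_as_col(vertices, "a"), nest_as_col(vertices, "b"))
]

connected = process_anonymous_edge(current_result, edges, "a", "b")
not_connected = process_negation(current_result, edges, "a", "b")
pairs = sorted((row["a"]["id"], row["b"]["id"]) for row in not_connected)
print(pairs)
```

Because the rows matched by the edge are a subset of currentResult, removing them leaves exactly the pairs with no edge from a to b.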