Hi, every time I start my video with some scintillating remarks. It has become a tradition already. If rules are made to be broken, then traditions are for sure. I'm not in my best mood today, so no jokes for you guys, sorry. Let's just start. In this video you will learn how to find all the connected components of a graph using the GraphFrames API.

In graph theory, a connected component of an undirected graph is a subgraph in which any two vertices are connected to each other by paths and which is not connected to any additional vertices in the supergraph. For example, the graph shown in the illustration has three connected components. A vertex with no incident edges is itself a connected component. A graph that is itself connected has exactly one connected component, consisting of the whole graph.

Testing whether a graph is connected is an essential preparation step for every graph algorithm. Subtle, hard-to-detect bugs often happen when your algorithm is run on only one component of a disconnected graph. Since the connected components of a graph represent, roughly speaking, the pieces of the graph, finding the number and sizes of the connected components lets you do descriptive research on the graph data you have. For instance, it helps you understand how many unrelated vertices the graph has. Also, computing the connected components of a graph lies at the core of many data-mining algorithms. For example, consider the problem of identifying clusters in a set of items. You can represent each item by a vertex and add an edge between each pair of items that are deemed similar. The connected components of this graph correspond to different classes of items.

Let's learn how to find the connected components of a particular graph using the GraphFrames API. But first, let's take a look at the graph whose connected components you're going to look for. To create it in terms of GraphFrames, you need to run the following code.
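The slide's code is not part of this transcript, so here is a minimal sketch of what building such a graph might look like in PySpark. The toy data is made up for illustration (it has three components, like the earlier example, but it is not the slide's actual graph), and `components` and `build_graph` are hypothetical helper names. The sketch assumes the usual GraphFrames convention: a vertex DataFrame with an `id` column and an edge DataFrame with `src` and `dst` columns. The plain-Python `components` function only illustrates the definition with a breadth-first search; it is not how GraphFrames computes the result.

```python
from collections import defaultdict, deque

# Hypothetical toy data: three connected components, {a, b, c}, {d, e} and {f}.
VERTICES = [("a",), ("b",), ("c",), ("d",), ("e",), ("f",)]
EDGES = [("a", "b"), ("b", "c"), ("d", "e")]


def components(vertices, edges):
    """Reference check of the definition, using breadth-first search."""
    neighbours = defaultdict(set)
    for src, dst in edges:
        neighbours[src].add(dst)
        neighbours[dst].add(src)  # treat every edge as undirected

    seen, result = set(), []
    for (start,) in vertices:
        if start in seen:
            continue
        seen.add(start)
        queue, comp = deque([start]), set()
        while queue:
            u = queue.popleft()
            comp.add(u)
            for w in neighbours[u] - seen:
                seen.add(w)
                queue.append(w)
        result.append(comp)
    return result


def build_graph(spark):
    """Build a GraphFrame from the toy data (needs pyspark and graphframes)."""
    from graphframes import GraphFrame  # imported lazily, only when called

    v = spark.createDataFrame(VERTICES, ["id"])
    e = spark.createDataFrame(EDGES, ["src", "dst"])
    return GraphFrame(v, e)
```

Calling `components(VERTICES, EDGES)` groups the six vertices into three sets, and the isolated vertex `f` forms a component on its own, as the definition says.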
And then you just call the connectedComponents method, as on the slide. Now for each vertex you know the ID of the connected component it belongs to. Let's go through the parameters of the connectedComponents method to understand what options you have.

First, you can choose which algorithm to use. The supported algorithms are graphframes and graphx. The graphframes algorithm is based on the article Connected Components in MapReduce and Beyond by Raimondas Kiveris et al., and we will discuss it in detail in the next video. I can practically hear that sigh of relief after these words. Yes, you will definitely see me again and I don't mean [INAUDIBLE] The other variant, graphx, converts the graph to GraphX format and then uses the connected components implementation from the GraphX library.

The second parameter you can adjust is the checkpoint interval, in terms of number of iterations; the default is two. Checkpointing regularly helps recover from failures, cleans shuffle files, shortens the lineage of the computation graph, and reduces the complexity of plan optimization. As of Spark 2.0, the complexity of plan optimization would grow exponentially without checkpointing. Hence, disabling checkpointing or setting a longer-than-default checkpoint interval is not recommended. Checkpoint data is saved under org.apache.spark.SparkContext.getCheckpointDir with the prefix connected-components. If the checkpoint directory isn't set, this throws a java.io.IOException. Set a nonpositive value to disable checkpointing. This parameter is only used when the algorithm is set to graphframes.

The last parameter you can adjust in the connectedComponents method is the broadcast threshold used in propagating component assignments; the default is 1 million. If a node's degree is greater than the threshold at some iteration, its component assignment will be collected and then broadcast back to propagate the assignment to its neighbors. Otherwise, the assignment propagation is done by a normal Spark join. This parameter is only used when the algorithm is set to graphframes.
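To make the three parameters concrete, here is a minimal sketch of how such a call might look from Python. It assumes the keyword names `algorithm`, `checkpointInterval` and `broadcastThreshold` of the GraphFrames Python API and the defaults quoted above; check the documentation of the version you have installed. `CC_PARAMS`, `find_components` and the checkpoint path are hypothetical names chosen for this example.

```python
# Defaults as quoted in the video; your GraphFrames version may differ.
CC_PARAMS = {
    "algorithm": "graphframes",     # the alternative is "graphx"
    "checkpointInterval": 2,        # in iterations; nonpositive disables it
    "broadcastThreshold": 1000000,  # node degree above which to broadcast
}


def find_components(graph, spark, checkpoint_dir="/tmp/cc-checkpoints"):
    """Run connected components on a GraphFrame.

    `graph` is assumed to be a GraphFrame and `spark` a SparkSession.
    The checkpoint directory (a made-up path here) has to be set first,
    because the graphframes algorithm checkpoints its intermediate results.
    """
    spark.sparkContext.setCheckpointDir(checkpoint_dir)
    return graph.connectedComponents(**CC_PARAMS)
```

The result is a DataFrame of vertices with an extra `component` column, so something like `result.groupBy("component").count()` should give you the number and sizes of the components.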
Oops, the time is up. Let's summarize what we learned in this video. First, what a connected component of a graph is. Second, how to find all the connected components of a graph using the GraphFrames API.