Hi. Now, as I've promised before and I always keep my promises.

Let's talk about the algorithm based on the article Connected

components in Map Reduce and beyond by Raimondas Kiveris et al.

Computing connected components of a graph

lies in the core of many data mining algorithms.

This problem is well studied,

yet many of the algorithms with good theoretical guarantees perform poorly in

practice especially when faced with graphs with hundreds of billions of pages.

Raimondas Kiveris et al designed the improved algorithm based

on the traditional MapReduce architecture for large scale data analysis.

In this work they represent

new alternating algorithm which has strong theoretical guarantees,

bounded communication cost and load balanced computation.

Extensive empirical emulation of this algorithm was performed

and the authors compared to the best known state of the art approaches.

With all the algorithm implemented on the same hardware,

the alternated algorithm outperformed

the fastest time algorithm reported in the previous work by Rastogi et al,

Finding Connected Components in MapReduce in Logarithmic Rounds.

The new alternated algorithm computes

the connected competence by iteratively

transforming the input graph over the multiple map reduce rounds.

It was proved that unlike past approaches,

this algorithm transforms the graph in a way that the number of edges never increases.

The alternating algorithm is local with every node in the graph performing

some rewriting decision based solely on the structure of its neighborhood.

This locality makes for an easy implementation in the distributed setting.

Now, we will introduce some notation.

Let G be an undirected graph on N nodes and M edges.

For a node v you denode by gamma of v,

the neighbors of v. You will use gamma plus or v to denode the neighborhood of v itself.

Furthermore with every node of v you will also say with the real number label l sub

v. The alternated algorithm makes use of the small star and large star operations.

Both operations are given the node u and the subset of its neighborhood N.

Let N of u be the neighbor node with the minimal label.

The operations replace for every v belonging to n

the edge uvu is an edge v, n of u.

Both the small star and large star operation can easily be implemented and distributed

Minear and it gives map reduced version of these two operations in algorithm one and two.

Wait guys.

I haven't told you where we will go together yet.

Let's go for sail the court.

Map receives a pair of nodes as

an input and solves them in the scanner order by associated L v lu.

It returns the search of pair of nodes as an output.

For example, if it gets pair u,

v and L sub v is smaller or equal than L sub u,

then pair remains the same u,

v and vice versa.

If L sub v is greater than L sub u,

then the output will be v, u.

At the shuffle phase you purchase the pair produced by a map.

Important to understand after it you don't get

the full set of neighbors gamma of u associated with each vertex u.

Since the shuffle purchase sorted pairs in case l sub u greater

than l sub v. Shuffle will purchase only u where pairs and never v u.

This is why we will put v to subset of neighbors of u but

never u to subsets of neighbors of v. Further,

for each node u you will recall a use function which gets as

an input node u and subset of its neighbors n from the shuffle phase.

Then you calculate m. The node with the minimal value l associated with it

reason is subset n. Then for each node v of subset n and for u itself,

you will emit a pair of node v or u and node m. Now,

in this slide you can see an illustration of small star at node eight.

The minimum neighbor of eight is one.

So small star connects one to all the node no larger than eight.

Let's iterate always pseudo code of this small graph together step by step.

On the map phase,

it will reverse all the edges in the way that the source is greater than the destination.

On the shuffle phase,

the source of the edge will serve as a group in case.

So all the detonation vertices with the same source vertex will group together.

Further, we will consider how A reduced phase works only for the source vertex eight.

The vertex with the minimal index among

the received subset of neighbors for vertex eight is one.

So finally pairs 1, 1, 5, 1,

7,1 and 8,1 will be immediate.

Now, let's browse through the large star operation pseudo code.

On the map phase each edge will be emitted together with its first copy.

During the shuffle phase again for each case source vertex let's call it U.

All the nation vertices will be collected in the same name the gamma of U.

On the reduced phase of large star operation you

will iterate over elements of said gamma of U,

this is an included vertex U.

Let's call that v. And emit a pair (v;m) and in case label of v is greater than

labels of u. M now is a minimal element among set gamma of u is edit vorex u.

Now, in this slide you can see an illustration of large star at node eight.

The minimum Lemma of eight is again one.

At figure v, large star connects one to all the laws.

These levels directly larger than eight.

For the same small graph,

as on the small star operation.

Let's iterate all the pseudo code together step-by-step.

On map phase we will duplicate all edges,

on shuffle phase source of the edge will serve as a group in case.

So all this destination vertices is the same source vertex you group together.

Further, we'll consider how reduced phase works only for source vertex eight.

The vertex in its minimal index are

more received set of neighbors for vertex eight is one.

So finally only (9,1) will be immediate.

Even though it seems for figure a and figure b

that the two operations can disconnects the graph, in the article,

the full levels of proved showing that performing

the operations at all nodes in parallel preserve the connectivity of the graph.

Lemma one. Performing the large star operation on

all the nodes in parallel preserves the connectivity of the graph. Lemma two.

After one round of large star the number

of edges in the resulting graph is no larger than in the original graph.

Lemma three.

The small star operation preserves connectivity and doesn't increase the number of edges.

I'll give a link to the full text of the article in the description of this lesson.

So if you are interested in details of the proof

you have an opportunity to satisfy your curiosity.

As it's visible from pseudo code in the slide you can see right now.

The alternated algorithm combines invocations of

large star and small star operations until convergence.

What in the current context would mean that a fix point is reached,

at which point the overall graph is reduced to a unit of disconnected stars.

One, for each connected component.

It was shown in the article's lemma eight,

that the alternating algorithm converse.

Let's sum up. In this video was presented

new parallel connected component's algorithm

that outperforms the current state of the art approaches.