0:08

So tut's method tells you how to lay out the nodes of a planar graph in a plane so

Â that the edges don't cross.

Â If you don't have a planar graph, if you have an arbitrary graph, you need to use

Â a more specific methods to lay out a graph so that it could be easily visualized.

Â So as we saw in the last set of slides tut's method for laying out a planar graph

Â places each node at a position that's the average of it's neighboring nodes.

Â And so each of the nodes is applying a force to the current nodes position and

Â so that's an example of a force directed layout.

Â Then there's quite a few different force directed layouts,

Â a popular one is called Gem and that's what was used to layout this example.

Â This is a interaction graph between yeast proteins,

Â there is about 1500 yeast proteins and about 2000 interactions and so the yeast

Â proteins are indicated by nodes and the interactions are indicated by edges.

Â 1:06

So in the GEM algorithm for forced directed layout,

Â we're simulating the force of a spring system.

Â So each edge between two nodes corresponds to a spring, and

Â that spring has a rust length if the nodes are farther than that rest length and

Â they'll be attracted to each other.

Â If they're closer than that rest length then they'll be repelling each other.

Â And also, if you have high degree nodes, if you have a node with a lot of

Â neighbors, then Jim will make that rest length of the springs

Â larger to try to push away those neighbors since there are so

Â many of those neighbors sharing an edge with that high degree node.

Â 1:43

There's also a repulsion

Â between nodes even if there are not connected by a spring so that

Â nodes don't end up on top of each other even if they're not connected by an edge.

Â And that repulsion force is basically equal to one over the distance

Â between the node, between any two nodes.

Â And then finally all the nodes experience a force of gravity towards the center

Â of the graft to keep things from wandering too far away.

Â And a small random perturbation force just to make sure

Â that nodes take unique positions.

Â 2:17

As we talk about the layout of graphs, by placing nodes,

Â it's good to think of the centrality of nodes.

Â And centralities are ways of analyzing

Â you know where a node should be positioned in a layout.

Â Should it be more central or should it be on the periphery here?

Â So these nodes that are blue in color are central nodes.

Â These nodes that are red in color are non-central nodes.

Â There's many different centrality metrics.

Â Ways of measuring how central a node should be in a layout.

Â Node degree is a simple method where you just count the number of edges

Â that a node has.

Â And so nodes with high degree should be placed towards the center and

Â nodes with low degree might appear on the boundary.

Â 3:08

Page rank is a kind of centrality measure which Google uses to find popular

Â webpages by basically counting the number of incoming links from other webpages.

Â Another is based on the isolation metric.

Â You can think of the distance between nodes.

Â For example this green node here

Â to this red node has three edges between that, so it's distance would be three.

Â It's the length of the shortest number or

Â edges to get from one node to another node is the distance.

Â The isolation matrix basically adds up

Â the distance from a given node to all the other nodes in the graph.

Â 3:47

And if I add those up then I'll get a smaller result for

Â nodes towards the center of this graph.

Â And a larger result to nodes in the periphery of this graph,

Â in this particular layout.

Â And so you can think of closeness centrality as basically the inverse of

Â that isolation metric.

Â So that nodes with high closeness centrality would have low isolation

Â metric.

Â And you can also think of graph centrality, which is like the isolation

Â metric except I'm not taking the distance to all the nodes and adding them up.

Â I'm taking the distance between the node and the farthest node away from it, and

Â one over that gives me what's called the graph centrality.

Â And finally, there's between the centrality.

Â Between the centrality basically looks at every pair of nodes,

Â and finds the shortest path between that pair of nodes.

Â 4:39

If I take the shortest paths between any node and any other node, all

Â of those shortest paths, how many of those shortest paths go through a given node?

Â That tells me the betweenness centrality of that node.

Â And so, if I take every shortest path between every node and every other node.

Â 4:57

Those shortest paths will likely go through these central nodes more than they

Â will go through these peripheral nodes.

Â And so that's what we have plotted here is nodes plotted based

Â on their betweenness centrality.

Â How many of the shortest paths between any two nodes go through each of these nodes.

Â It's high for these nodes that are blue in color and

Â low for these nodes that are red in color.

Â 5:31

So, we can use use that between the centrality or

Â any of these centrality metrics to simplify a graph.

Â And so, for example, if I compute the between the centrality of edges

Â of my network of yeast proteins.

Â Edges that have low between the centrality,

Â have few shortest paths going through them.

Â And edges with high between the centrality have a lot of edges going through them.

Â And so a lot of shortest paths going through them.

Â And so if I remove edges that have few shortest paths going through them

Â you might think the impact would be less than if I have a lot of shortest paths

Â going through that edge.

Â So I've plotted the between the centrality of edges here, and I basically

Â removed the lowest between the centrality, the ones that are very dark blue and

Â left the highest between the centrality edges, the ones that are in red here.

Â And I've not removed any edge if it creates a disconnected graph.

Â So the result is a graph that

Â has as many edges as it has nodes, about 1500 edges and 1500 nodes.

Â So I guess something like a tree that's minimally connected,

Â but have retained the high between the centrality edges.

Â And removed the lowest between the centrality edges.

Â And so what's left is kind of the communications backbone,

Â the most often used edges when I'm finding the shortest path between any two nodes.

Â And that simplifies the layout.

Â Now I've got fewer edges in order to compute

Â spring distances when I use the same gem layout, force-directed layout for

Â this graph on the right than I'm using for this graph on the left.

Â The nodes spread out more because I've got fewer springs.

Â And the nodes are freer to move around to unique places.

Â And you can visually see the relationship between nodes better in this layout

Â than in this layout.

Â You're also looking at fewer edges,

Â so you could always add back in the edges as necessary,

Â to add back in those low between the centrality edges to see a more complete

Â view of this graph, but the nodes would still be positioned better than they are.

Â When you try to compute the layout using all of those edges.

Â 8:07

And we can start to group edges together

Â if we have some way of measuring the similarity of edges.

Â We can create basically wire bundles called edge bundles that simplify

Â the visualization of this graph.

Â In this case, these are emails that were part of the litigation for

Â a company called Enron between 132 employees back in 2001.

Â And here's a couple different layouts.

Â This is a force directed layout of these node positions and

Â then instead of having the edges, just go straight line.

Â Connecting each node we've tried to bundle the edges together so

Â you can kind of see main communication pathways in these representations.

Â 8:55

How do you find two edges to be similar or

Â not similar to each other so that you can bundle them?

Â Well this, for that task it's useful to form communities

Â to try to cluster nodes together in communities that are sharing

Â similar edge behaviors, similar edge connections.

Â 9:19

And so in order to find communities we can remove edges in order of decreasing

Â centrality, or decreasing between the centrality, which is what we used.

Â So instead of removing the low centrality edges like we did for

Â node layout before, now we're removing high centrality edges.

Â So now we're moving the edges that are part of the main communication pathways.

Â What that does is that disconnects a graph and it isolates nodes.

Â It isolates nodes in clumps and

Â those clumps form clusters that form communities.

Â So you can think of a community of nodes communicating through

Â another community of node, across a high between the centrality edge, and so

Â by removing our edges from the highest between the centrality to the lowest

Â centrality edges, we're creating a hierarchy of these communities.

Â As we remove those edges, we can create nodes that represent where those edges

Â 10:36

And then by merging those clumps together we can place vertices in the interior

Â to denote higher level communities.

Â These are called community notes, they're not part of the original graph.

Â They're just used to represent communities,

Â to represent mergings of clumps of nodes.

Â 10:55

And so here's some examples of Edge Bundle Communities.

Â In this case we took every football game between colleges in the United States.

Â University of Illinois is one of these colleges.

Â And by just taking, any time any college played another college,

Â we would connect that with an edge and by looking at the final graph, removing

Â the high between the centrality edges, they started to isolate communities and

Â we were able to figure out a layout that represented these low-level communities.

Â We basically rediscovered the conferences.

Â And so, just by looking at all the games played between, games of

Â football played between one university and another in the US we were able to discover

Â the conferences that organized the games between colleges in the US.

Â 12:09

So here's an example of the actual communities from Enron, and

Â here's the communities we discovered.

Â These don't lie in the same position as the rest of these.

Â For example here's where the president of Enron is and

Â he's listed here in the CEO section here.

Â But we found the same actual communities were discovered

Â by basically removing high between the centrology edges from the graph of

Â all emails from one employee to another ended up revealing these communities.

Â 12:43

And as we'll see later it's useful to be able to filter out and

Â look at details interactively in these edge bundle examples or

Â in any large data set, so we can click for example on the CEO of Enron, Kenneth Lay.

Â And see all the email that was sent in this data set to all the other employees,

Â to all the other communities.

Â For example I can click on Illinois, and

Â I can see all the other Big 10 conference games that it had, but

Â I can also see the games it had to other places outside of its conference.

Â 13:15

So there's a variety of methods that we've just seen to visualize graphs.

Â Not necessarily planar graphs, but any graph.

Â Social networks and other graphs.

Â And they're based largely on analyses that are enabled by that centrality,

Â the definition of some kind of centrality of the nodes.

Â [MUSIC]

Â