[SOUND] [MUSIC] Hi again, and welcome to Part 2 of Visualizing Gene Expression Data using Interactive Clustergrams Built with D3.js. In this part, I'll be discussing network visualizations and clustergrams.
Network visualization is a common task in many fields, and networks are often visualized as nodes and links, which is what we can see here. D3 has a force-directed graph layout for this type of visualization. In this visualization you are seeing characters from the book Les Miserables, connected based on their co-occurrence in book chapters. This data was manually curated and is available from the Stanford GraphBase, and it's a dataset that I will be referring to a few times during this part of the lecture.
If we follow this link, we can see a live example of a visualization of this network. One of the most used components of D3 is this force-directed graph layout. You can actually move the graph components, and the components are linked to one another through simulated springs. The force at any given time is calculated using a physics-inspired algorithm: there is repulsion between all nodes, and the nodes are attracted to each other based on some degree of gravity and also based on the spring connectivity between them.
What this allows you to see is how these characters, these nodes, group together and form different clusters. The clusters are colored differently, which helps highlight what's going on. You can see how some of these characters all co-occur with this one character, while these, by comparison, all co-occur with each other. So you get some idea of the interconnectivity and the structure of this character network. And if you hover over any node, you can see the character that the node represents.
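To make the force calculation described above a little more concrete, here is a minimal Python sketch of one step of a force-directed layout. It is not D3's actual code, and D3's own implementation differs in its details. The function name `force_step` and all of the parameter names and default values (`repulsion`, `spring`, `rest_len`, `gravity`, `dt`) are assumptions made up for this illustration.

```python
import math

def force_step(pos, links, repulsion=1.0, spring=0.1,
               rest_len=1.0, gravity=0.05, dt=0.1):
    """Return new positions after one step of a toy force-directed layout.

    pos   -- dict mapping node name to an (x, y) tuple
    links -- list of (name_a, name_b) pairs joined by a simulated spring
    All constants are illustrative assumptions, not D3 defaults.
    """
    names = list(pos)
    force = {n: [0.0, 0.0] for n in names}

    # Repulsion acts between every pair of nodes and falls off with distance.
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            dist = math.hypot(dx, dy) or 1e-9  # avoid division by zero
            f = repulsion / dist ** 2
            fx, fy = f * dx / dist, f * dy / dist
            force[a][0] += fx
            force[a][1] += fy
            force[b][0] -= fx
            force[b][1] -= fy

    # Each link is a spring that pulls its two ends toward rest_len apart.
    for a, b in links:
        dx = pos[b][0] - pos[a][0]
        dy = pos[b][1] - pos[a][1]
        dist = math.hypot(dx, dy) or 1e-9
        f = spring * (dist - rest_len)
        fx, fy = f * dx / dist, f * dy / dist
        force[a][0] += fx
        force[a][1] += fy
        force[b][0] -= fx
        force[b][1] -= fy

    # "Gravity" weakly pulls every node toward the origin.
    for n in names:
        force[n][0] -= gravity * pos[n][0]
        force[n][1] -= gravity * pos[n][1]

    # Move each node a small step along its net force.
    return {n: (pos[n][0] + dt * force[n][0],
                pos[n][1] + dt * force[n][1]) for n in names}
```

Calling `force_step` repeatedly lets linked nodes settle near each other while unlinked nodes drift apart, which is the clustering effect seen in the live example.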
The thickness of each link represents the frequency of co-occurrence in different chapters. Down here you can see how to use this layout. It's not that many lines of code, and you can see here the data structure for representing this network, which is a data structure that we will see a lot more of. It's a JSON structure, which is a way of storing data.
This network visualization is very popular, but probably its main problem is that very large networks often turn into a large hairball, which is very difficult to comprehend or to explore interactively. As an example, here is a large hairball network from this paper. I believe the nodes are genes, and you can see that there's some clustering going on. They've colored the nodes, or to some degree the links, to show a type of clustering, but it's very difficult to get any real idea of what's happening. And if this is a static picture, you pretty much can't gather anything very meaningful from it. This is the problem that exists with these types of networks.
Alternatively, networks can be visualized as what's referred to as an adjacency matrix, or a similarity matrix. In this case we're viewing the same character network from Les Miserables, and this time the characters are shown as the rows and columns of a matrix. This approach has the advantage of being able to visualize larger networks without forming a hairball. That is because you never have any overlapping links: each link is no longer a line but a tile, or square, in the matrix.
If we go to the interactive example, you can see some interesting features that are sort of D3 specific. Here we can see the data in matrix form. One of the things this example is trying to make very clear is that, with this type of network visualization, it's very important to order your nodes properly.
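As a rough illustration of how a node-link JSON structure maps onto an adjacency matrix, and of why node order matters, here is a small Python sketch. The field names (`nodes`, `links`, `source`, `target`, `value`) and the four placeholder characters are assumptions for this example, not the actual Les Miserables data, and the helper names `adjacency_matrix` and `order_by_frequency` are hypothetical.

```python
import json

# Hypothetical node-link JSON: links refer to nodes by index, and
# "value" stands in for the number of chapters two characters share.
graph = json.loads("""
{
  "nodes": [{"name": "A"}, {"name": "B"}, {"name": "C"}, {"name": "D"}],
  "links": [
    {"source": 0, "target": 1, "value": 1},
    {"source": 1, "target": 2, "value": 3},
    {"source": 2, "target": 3, "value": 2}
  ]
}
""")

def adjacency_matrix(graph, order=None):
    """Build a symmetric matrix; `order` lists node indices row by row."""
    n = len(graph["nodes"])
    order = list(range(n)) if order is None else order
    rank = {node: row for row, node in enumerate(order)}
    matrix = [[0] * n for _ in range(n)]
    for link in graph["links"]:
        i, j = rank[link["source"]], rank[link["target"]]
        matrix[i][j] += link["value"]  # each link becomes a tile...
        matrix[j][i] += link["value"]  # ...mirrored across the diagonal
    return matrix

def order_by_frequency(graph):
    """Order node indices by total link weight, most connected first."""
    totals = [0] * len(graph["nodes"])
    for link in graph["links"]:
        totals[link["source"]] += link["value"]
        totals[link["target"]] += link["value"]
    return sorted(range(len(totals)), key=lambda i: -totals[i])

by_index = adjacency_matrix(graph)
by_frequency = adjacency_matrix(graph, order_by_frequency(graph))
```

Both matrices hold the same links. Only the row and column order changes, which is the same effect as switching the ordering in the interactive example.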
With the nodes ordered by name, which is the initial order, you see very little structure. You can see which characters are linked to one another, and if you hover over a square, it highlights the two characters that are connected, and the opacity gives you an idea of the degree of connectivity: the darker the square, the more chapters the characters co-occur in. But you can also reorder this matrix based on frequency, so you can see the characters that very frequently co-occur with each other and the characters that don't. And if we order it by cluster, then we can see the clusters that form, and these clusters are colored in different colors.
This type of visualization has the advantage of spreading its information out over the visual space and preventing the line crossings that we saw in the equivalent node-link visualization. There you can't really tell what is going on because there are a lot of lines crossing, but here you don't have that problem. The visualization is, in a sense, larger and more spread out. So these are two ways of viewing the same data.
A similar visualization, called a clustergram or heat map, is designed to show clusters in your data. It's similar to the similarity matrix, or adjacency matrix, except that now the rows and columns are no longer the same. In this example visualization from this link here, it's a heat map drawn with the programming language R. What we're viewing are car models, like the Honda Civic and Toyota Corolla, and we're comparing these models based on their attributes: in this case gears, miles per gallon, cylinders, that kind of thing. So you can see which cars are similar to one another and which attributes make them similar. You can also see which attributes are similar to one another based on which cars they occur in.
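One common way to decide which rows of a heat map like this are similar is hierarchical clustering, which repeatedly merges the two closest rows or groups of rows. The Python sketch below is only an illustration under stated assumptions: it uses Euclidean distance with complete linkage, the row labels and attribute values are made-up placeholders rather than real car data, and the function names are hypothetical. It is not a claim about how the R example was produced.

```python
import math
from itertools import combinations

def euclidean(u, v):
    """Straight-line distance between two attribute vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def agglomerate(rows):
    """Return the sequence of merges as (cluster_a, cluster_b, distance).

    rows -- dict mapping a row label to its list of attribute values.
    Clusters are tuples of labels. The distance between two clusters is
    the largest distance between any two of their members (complete
    linkage, an assumption for this sketch).
    """
    def linkage(pair):
        c1, c2 = pair
        return max(euclidean(rows[a], rows[b]) for a in c1 for b in c2)

    clusters = [(label,) for label in rows]
    merges = []
    while len(clusters) > 1:
        # Find and merge the two closest clusters.
        c1, c2 = min(combinations(clusters, 2), key=linkage)
        merges.append((c1, c2, linkage((c1, c2))))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges

# Placeholder rows with two made-up attributes each.
toy_rows = {
    "car_a": [0.0, 0.0],
    "car_b": [0.0, 1.0],
    "car_c": [5.0, 5.0],
    "car_d": [5.0, 7.0],
}
merges = agglomerate(toy_rows)
```

The order of the merges is what a hierarchical tree over the rows encodes: early merges join the most similar rows, and later merges join larger groups. In practice, attributes measured on very different scales are usually standardized before distances are computed.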
And this dendrogram view lets you see a hierarchical tree of the relationships among them. You can see how these cars fall into large clusters here, and then there are smaller sub-clusters within them throughout the dendrogram. In this visualization, the rows and columns are generally not the same. If they are the same, then you have a symmetrical matrix, which is an adjacency matrix. I'll explain the relationship between the previous visualizations and this clustergram visualization next.
Shown here are the two visualizations, a node-link network and a similarity matrix, and both are showing the character co-occurrence. A clustergram can really be thought of as an extended view of either of these similarity networks. Each of them shows you the network structure of the characters, so we can see clusters of characters occurring here, just like we see a cluster here. But what these visualizations don't show you is the evidence, the actual components that are similar among these characters. The actual book chapters that they co-occur in are hidden. They are only represented minimally: in this case by the width of the link, and in this case by the opacity of the square.
A clustergram, on the other hand, allows you to have an extended view of a network in certain cases. Here we have our characters shown as columns and our chapters shown as rows. So now we can see which characters are similar to one another, and we can also see which chapters they co-occur in. With a clustergram we're effectively viewing two networks simultaneously. We're viewing the character network, where we can see which characters are similar to one another, and we're also viewing the chapter network, where we can see which chapters are similar to which other chapters based on character occurrence.
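The idea that a clustergram holds two networks at once can be sketched in a few lines of Python. Starting from a chapter-by-character table of 0s and 1s, counting shared chapters gives the character network, and counting shared characters gives the chapter network. The table below is a made-up placeholder, not data from the book, and the helper names `cooccurrence` and `transpose` are hypothetical.

```python
def cooccurrence(matrix):
    """For a rows-by-columns 0/1 table, count the rows each column pair shares."""
    n_cols = len(matrix[0])
    return [[sum(row[i] * row[j] for row in matrix) for j in range(n_cols)]
            for i in range(n_cols)]

def transpose(matrix):
    """Swap rows and columns."""
    return [list(col) for col in zip(*matrix)]

# Placeholder clustergram data: rows are chapters, columns are characters,
# and 1 means the character appears in that chapter.
chapters_by_characters = [
    [1, 1, 0],  # chapter 1
    [1, 1, 1],  # chapter 2
    [0, 0, 1],  # chapter 3
]

# Character network: entry (i, j) is the number of chapters shared by
# characters i and j, the quantity drawn as link width or tile opacity.
character_network = cooccurrence(chapters_by_characters)

# Chapter network: entry (i, j) is the number of characters shared by
# chapters i and j.
chapter_network = cooccurrence(transpose(chapters_by_characters))
```

Both derived matrices are symmetric, while the table they come from is not. That is the sense in which the node-link view and the adjacency matrix are condensed summaries of the clustergram.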
So a clustergram is more information dense in many cases. Clustergrams are often used to visualize gene expression data, where they can be used to find clusters of genes (the rows, in this case) or of samples. Here we're viewing differentially expressed genes as rows and samples as columns.
That's it for this lecture. In the next lecture I'll be discussing how and why we are building a clustergram visualization in D3, and some of the unique advantages associated with this. Thank you for your attention. [MUSIC]