[SOUND] [MUSIC] Hi again, and welcome to part 3 of Visualizing Gene Expression Data using Interactive Clustergrams Built with D3.js. In the previous lectures, I broadly explained some concepts in data visualization, specifically discussed the problem of visualizing networks and showed you a couple of ways in which you can do that, and then introduced you to the JavaScript visualization library D3.js. In this lecture, I'll explain how we built an interactive clustergram using D3.js, show some of its applications, and give you a brief overview of the components that go into building the visualization itself. First of all, the reason we wanted to build a clustergram with D3 is that D3 allows you to make a very flexible and interactive visualization, which is something you cannot do with a static image. Importantly, there are no software requirements for the user to view or interact with the visualization. You don't have to download any software, and you don't have to learn a particular piece of software. All you need is an up-to-date browser, preferably Chrome, Firefox, or Safari. And lastly, we can generate visualizations dynamically, which means that we can take user-defined data and generate visualizations based on that, or we can generate visualizations based on data changing in some database somewhere. So D3 basically allows you to have a very dynamic visualization, much more dynamic than what you can generate with a lot of other methods. I started off, again, with the example that I've shown you before, the Les Misérables co-occurrence matrix. This is a really great example from Mike Bostock, showing how you can create an adjacency matrix that is dynamic and interactive. You can reorder it based on different attributes, but for our purposes, we need to extend this to be able to visualize significantly more data.
So we need to be able to zoom in and out, and we also want to be able to view asymmetrical data, so not just a similarity matrix but a clustergram as well. I'll go through some of the modifications that were made to this example. To the right here is a screenshot of the final visualization that we're developing and updating. It's hosted on the blocks website, so if you follow this link it will take you to a live example of the visualization that you can actually interact with. I'll open it in a new window. You can zoom, and double-click to reset the zoom. You can also search rows, and it'll zoom in on the match. You can search for something else, and it will center that for you. You can click on a row or column and reorder the matrix based on the values in that row or column. This can be useful if you're interested in a particular sample, let's say, and you just want to reorder everything based on it and find the highest and lowest values in that sample. There are some other components that I won't go into, but you can read about them in the documentation, and I'll show you where that is. One other thing I'll show here is a dendrogram color bar: when you order the view by cluster, this color bar lets you see the groups of your rows and columns, sort of like a dendrogram. You can also add interactivity, so you could click on a bar here and potentially get information on those rows. So the features that were added, as I've shown you, were zooming and panning, searching, a dendrogram-like color bar, more reordering options, and some other options that you can read about on the GitHub repo that we have. If you follow this link, the GitHub repo takes you to the visualization. There's a readme file that explains the features of the visualization and tells you how to interact with the API to generate one.
The repo also includes a Python script to generate the JSON necessary for the visualization from a simple tab-separated file. For instance, the visualization that we're showing here started off as a simple Excel-like file of data, and the Python script generated the JSON for the visualization. So it should be pretty clear from this how to start from a simple Excel-type sheet and generate one of these visualizations. From here, with GitHub, you can clone the code, you can make a pull request if you're interested in contributing to the repo, and you can basically see how everything was made and learn from it. Next, I'll give an overview of how these components were put together. To start building this visualization, you first append your SVG. SVG is something I haven't gone over previously, but D3.js is really designed to generate visualizations using SVG, which stands for Scalable Vector Graphics, and your SVG element here can be thought of as your canvas. You first initialize an SVG that your visualization will live in, and then you add your components to this SVG. You have to decide first how large your visualization will be. Here, you select a div that the SVG will be appended to, which is just a section of the webpage, append your SVG, give it an ID so you can identify it later on, define the width and the height, and add zoom functionality to it, which I'll explain a little more later. And here I'm just repositioning it within the webpage. So this is the first part of generating an SVG visualization: you generate the SVG, and then you start putting components into it. The first component that goes in is a background rectangle.
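The SVG setup described above might be sketched roughly as follows. This is not the code from the repo: the function name `createSvg`, the selector, the id `main_svg`, and the sizes are all hypothetical, and the zoom behavior and repositioning are left out. The `d3` object is passed in as an argument so the sketch makes no assumption about how D3 is loaded; the only D3 calls used are `select`, `append`, and `attr`.

```javascript
// Minimal sketch, assuming `d3` is the D3 library object supplied by the caller.
// All names and values here are illustrative, not taken from the lecture's code.
function createSvg(d3, containerSelector, width, height) {
  return d3
    .select(containerSelector) // the div that will hold the visualization
    .append("svg")             // the "canvas" that every later component is drawn on
    .attr("id", "main_svg")    // hypothetical id, so the SVG can be selected again later
    .attr("width", width)
    .attr("height", height);   // returns the selection containing the new SVG
}
```

In a page where D3 is loaded, a call such as `createSvg(d3, "#clust_container", 800, 600)` would return a selection that later steps can append rectangles, groups, and labels to.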
To add the background rectangle, you select the SVG you want to append the rectangle to, append the rectangle, give it the class background just for reference, and define the width and height. Then you set the color of the background; in this case, my background is a light gray. The overall idea for how D3 works is that if you draw something on your SVG first, and then draw a second component in the same position, the second one will simply be drawn over the first. You can think of it as painting, where you want to start with the background and then paint components on top of it, because these things will just be put one on top of the other. It's not that easy to change the ordering afterwards, so you need to have a plan for how you're going to construct your visualization. Then the rows and cells of the visualization, or the tiles of the matrix, are drawn in the following way. We have a matrix of data, shown here as the data matrix, and this is where you do, again, what I've shown you a little bit before, the data join. In this case, you're joining the rows of the matrix and generating groups, so each row of your matrix corresponds to a row of the visualization, and the data of your matrix is bound to the elements in that row. So the first step is the data join, where you join the matrix. On each of these rows, you run a row function, which generates the actual tiles of your visualization, in this case the red tiles. Here you're passing in your row data as an argument and selecting all the cells that you'll draw. You then do a data join again, this time with the row data, so now each of these red tiles has the data that you've brought in.
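The two-level join described above can be illustrated with a small stand-in written in plain JavaScript, so it runs without D3 or a browser. It only mimics the bookkeeping that D3's data join does: one group per matrix row, and one tile per cell, each carrying its own datum. The cell shape `{ pos_x, value }`, the function names, and the choice to skip zero-valued cells are assumptions made for this sketch, not details confirmed by the lecture.

```javascript
// Plain-JavaScript stand-in for the nested data join (no D3 required).
// Assumed, hypothetical cell shape: { pos_x: column index, value: number }.

// Inner step, the "row function": one tile per cell, bound to that cell's datum.
function rowFunction(rowData) {
  return rowData
    .filter((cell) => cell.value !== 0) // assumption: empty cells are not drawn
    .map((cell) => ({ tag: "rect", className: "tile", datum: cell }));
}

// Outer step: one group per matrix row, bound to the whole row of data.
function joinRows(matrix) {
  return matrix.map((rowData, rowIndex) => ({
    tag: "g",
    className: "row",
    rowIndex: rowIndex,
    datum: rowData,
    children: rowFunction(rowData),
  }));
}
```

Because every tile keeps a reference to its datum, later steps can compute attributes from that datum without looking the matrix up again.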
This data is then used to set the x position of the tile. The y position has already been set, because, as it shows up here, the rows are positioned in the y direction based on the input data. So the tiles effectively know where they're going to be positioned based on this pos_x, and this scaling function puts them where they need to be. The opacity is determined by the value of this datum d, which is whatever the value was in your original matrix. The color switches between red and blue by default, so if your value is positive you get a red tile, and if it's negative you get a blue tile. Then you add two mouseover functions, so that when you mouse over a tile it highlights the row and column labels. If you interact with the visualization, you'll notice that when you hover over any tile it highlights the labels, which aren't shown yet at this point. One of the final components, broadly, is adding the row and column labels. Here it gets a little complicated, and I'm just showing you the rows. I'm bringing in my row nodes, generating a group for each one, and binding the row node data to those groups, mostly for organizational purposes. Each of these groups is positioned in the y direction based on the index here. Then I'm adding a mouseover function to each of those label groups, so you can imagine that each of these rows in the matrix will have a row label. Finally, the text is pulled in in this section here, where it's d.name. So the name returns the text, the font size is controlled here, and the y position is controlled here. This gives you an overview of the large steps that are necessary to produce this visualization. Adding the zoom gets a little complicated, so I'll give you an example of, broadly, what's happening.
For this visualization, you cannot simply zoom in on the entire visualization, because if you were to zoom into the entire SVG, your labels would move outwards and you wouldn't be able to see them any longer. What you actually need to do is allow zooming and panning of the matrix in both directions, but only allow panning of your column labels and your row labels in the x and y directions, respectively. That gives you this kind of interactivity, where here I'm zooming in the x direction. The labels always remain visible, and when you get past the square state, you also zoom in the y direction, here. So you need to break your zoom function into three components, because your zoom acts differently on the three large sections of your visualization. Here's where I'm defining the zoom rules that act on the matrix, the column labels, and the row labels. I won't go into detail here, but you can look at the code on GitHub if you're interested. One final component is that we need to be able to resize the text when you zoom in, which matters if you have a really large matrix. Even here you'll notice that the row text gets larger when you zoom in, and if you zoom back out, it gets smaller. It's not that noticeable here, but I'll show an example of a much larger matrix. The rows here are very narrow, because you have around 500 rows and about 100 columns. When you start zooming in, the text gets larger. What I've done is apply a rule that bases the size of the text on the height of the rectangles and also prevents the text, at least the row labels, from getting cut off. Here is a short snippet of the code, where it's basically checking whether the label is larger than the area allowed for it. If it is, it resizes the text. If it's not, it allows the text to zoom.
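The label-resizing rule just described could look something like the following sketch. It is not the snippet from the repo: the function name and all four parameters are hypothetical, and it assumes that the rendered width of a label grows linearly with its font size. The label is allowed to grow with the zoom until it would exceed the area reserved for it, at which point the font size is scaled back so the label just fits.

```javascript
// Minimal sketch of the "let the text zoom unless it would be cut off" rule.
// Assumption: label width scales linearly with font size.
// baseFontSize:     font size at zoom scale 1
// zoomScale:        current zoom factor (1 = not zoomed)
// labelWidthAtBase: rendered width of the label at zoom scale 1
// maxLabelWidth:    width of the area reserved for the label
function zoomedFontSize(baseFontSize, zoomScale, labelWidthAtBase, maxLabelWidth) {
  const zoomedWidth = labelWidthAtBase * zoomScale;
  if (zoomedWidth > maxLabelWidth) {
    // The label would overflow: shrink it so its width equals the allowed width.
    return baseFontSize * zoomScale * (maxLabelWidth / zoomedWidth);
  }
  // The label still fits: let it zoom along with the tiles.
  return baseFontSize * zoomScale;
}
```

For example, with a 10-pixel base font, a label 40 pixels wide, and 100 pixels of label space, zooming by a factor of 2 gives a 20-pixel font, while zooming by a factor of 4 is capped at 25 pixels, the largest size at which the label still fits.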
This larger example also shows that you can use this tool to visualize pretty large datasets. You can go up to around 200, 300, maybe 400 thousand components, and the sparser the matrix, the faster the visualization will be. There are some other details; for example, you can see that the column labels are rotated. You can look at the code to see how that's done. It's a little bit complicated, and there are some other components, but this gives you a broad idea of the steps that went into generating this. Then there's the reordering capability. This was already a function of the original example, so I basically kept it working in the same way. The rows, the columns, and the row and column labels are all positioned based on scaling functions in D3. So to reorder, all you do is redefine the domain of your scaling function and then reapply it. A code snippet is shown here, and the effect is that you can reorder the large matrix pretty easily. The nice thing about D3 is that you have your data bound to these elements, so you don't have to rebind the data. This is the entirety of my reordering function, and you can see that nowhere in it am I bringing in new data. The data's already bound, and I'm simply using it. I'm only redefining the scaling function, which then uses the data to reorder these rows and labels. Okay, so finally I'll show you an example of an app that we're building called clustergrammer. It's available on this website here. Clustergrammer can be used to generate a clustergram from a user-defined matrix in a tab-separated format. So, follow this link. Your file has to be in this format, where your first row is your column labels and your first column is your row labels. Basically it's an Excel-like format. You first choose a file. I'll choose this file here, which was basically copied and pasted from an Excel file, and then you upload it.
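To make the input format concrete, here is a minimal sketch of how a file laid out this way could be read: the first row holds the column labels, the first column holds the row labels, and the remaining cells are numbers. This is neither the app's code nor the Python script from the repo; the function name `parseMatrixTsv` and the shape of the returned object are assumptions for illustration only.

```javascript
// Minimal sketch: parse a tab-separated matrix with labels in the first row and column.
// The returned shape { rowLabels, colLabels, matrix } is hypothetical.
function parseMatrixTsv(text) {
  // Split into lines and drop blank ones (for example, a trailing newline).
  const lines = text.split(/\r?\n/).filter((line) => line.trim() !== "");
  // First row: the top-left corner cell, then the column labels.
  const colLabels = lines[0].split("\t").slice(1);
  const rowLabels = [];
  const matrix = [];
  for (const line of lines.slice(1)) {
    const fields = line.split("\t");
    rowLabels.push(fields[0]);                // first column: the row label
    matrix.push(fields.slice(1).map(Number)); // remaining columns: numeric values
  }
  return { rowLabels, colLabels, matrix };
}
```

A file with two samples and two genes would come back as two column labels, two row labels, and a 2 by 2 array of numbers, which is the information a clustergram needs as its starting point.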
After the upload, it should give you a visualization generated from your tab-separated file. It has the same capabilities that you saw in the earlier example, where you can search and reorder, and more capabilities will be added. Finally, you get a permanent link to your visualization, so if you copy this link, you can come back to the visualization from elsewhere. It's permanently stored, so you can share your visualization with others. And, yeah, I believe that's all. Thank you for your attention, and I hope you enjoy the rest of the course. [MUSIC]