In this lecture, we'll talk about graph neural networks. Here's the outline of this lecture. We will first talk about the fundamentals of deep learning on graphs and then cover a number of different graph neural networks, including graph convolutional networks, message passing neural networks, and graph attention networks. Then we'll finish with a number of healthcare applications of graph neural networks.

Let's start with graph neural network fundamentals. In this part, we'll introduce node embeddings, different tasks on graphs, the challenges of learning on graphs, and some key ideas behind graph neural networks.

Node embedding. Graph neural networks are an important set of methods that apply neural networks to graph structures. The output of a graph neural network is a set of node embeddings. The idea is that we want to learn a d-dimensional vector representing each individual node in the graph, such that similar nodes will have similar embeddings; they will be close to each other. In its general form, a graph neural network takes a graph as input, which has nodes and edges and some features on the nodes and edges, and outputs a set of embedding vectors, one for each node in the original graph.

The key question is: how do we learn an embedding function that works best? What are the key components behind graph neural networks? One is the embedding function: given the graph, it takes a specific node as input and outputs a vector embedding that represents that node. That is essentially what the embedding function does: take a node and map it to a vector. In this process, we want to preserve a similarity function. The idea is that we can measure similarity between nodes as a score between 0 and 1, with 1 being most similar and 0 being most dissimilar.
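As a concrete sketch of this idea, here is a minimal, hypothetical example (the numbers and function names are illustrative, not from the lecture): rows of a matrix Z stand in for learned d-dimensional node embeddings, and the inner product of two rows, squashed through a sigmoid, gives a similarity score between 0 and 1.

```python
import numpy as np

# Hypothetical learned embeddings: Z[i] is the d-dimensional vector of node i.
rng = np.random.default_rng(0)
num_nodes, d = 5, 8
Z = rng.normal(size=(num_nodes, d))

def similarity(u, v, Z):
    """Sigmoid of the inner product between the embeddings of nodes u and v,
    so the result reads as a similarity score in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-(Z[u] @ Z[v])))

score = similarity(0, 1, Z)  # a value strictly between 0 and 1
```

Note that this score is symmetric in u and v, since the inner product is; the learning problem discussed next is to choose Z so that these scores approximate the target similarity.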
If we know this similarity function, which takes two nodes and outputs this score, can we come up with a function that maps the embeddings to approximate that similarity score? For example, the inner product between two embedding vectors could be one simple similarity function between two nodes. Back to the node embedding step of graph neural networks: we want to learn this embedding, taking a node and mapping it to a vector, and this embedding function is actually a deep neural network, but one that leverages the structure of the input graph.

Before we talk about different algorithms for learning those embedding vectors, we first want to understand the different use cases or applications of node embeddings. Here are a number of tasks on graphs that those embeddings can help with.

The most obvious one is node classification. For example, given a patient network where each node is a patient, edges indicate how similar those patients are: if they share some common diseases or take the same set of medications, we create an edge. We want to classify each node as either healthy or ill. That's the node classification problem. You can imagine that if you have an embedding for each node, those embedding vectors can become the features for this binary classification of healthy versus ill. That's the most direct application of node embedding: node classification.

Another one is link prediction. Here, given a network, maybe a social network in this case, you know some edges between people, and you want to predict whether two users without any connection should be connected. That's the link prediction problem. You can imagine taking the vectors of these two nodes and computing some similarity score, for example the inner product between the two vectors; that becomes your prediction for whether to add this edge.

Community detection.
Here we want to partition the graph into clusters such that within clusters there are a lot of edges and across clusters there are fewer edges, so that the different clusters become communities. You can imagine that if you have the node embeddings, one simple way to do community detection is to cluster those node embeddings.

Those are the different tasks on individual nodes, pairs of nodes, or sets of nodes. Sometimes we also have tasks on the entire graph. For example, graph property prediction. Here, imagine each graph is a molecule, where the nodes are atoms and the edges are bonds; then you may want to predict, for the entire molecular structure, whether it is safe or toxic. That's the graph prediction problem. Even more challenging, you may want to create a new graph structure with certain properties, or you may want to improve some existing graph structure by making small modifications so that a graph property is improved. That's the graph generation problem. As you can see, there are many different tasks on graphs that can benefit from graph embeddings or node embeddings. In this lecture, we'll talk about different algorithms for computing those embeddings using neural networks.

So what are the challenges of graph neural networks, and how difficult is it to apply neural networks to graphs? It's actually much more challenging than for the standard types of data that have had a lot of success with neural networks, such as text or images. Why are graphs more challenging? First of all, a graph has arbitrary size and complexity. You can have fairly small graphs and very large graphs, and that makes graphs structurally very complex. Second, there is no fixed node ordering. That sounds like a very particular problem with graphs.
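The community-detection idea above, clustering the node embeddings directly, can be sketched as follows. This is a hypothetical example with fabricated embeddings (two well-separated groups) and a tiny hand-rolled k-means, not the lecture's method:

```python
import numpy as np

# Fabricated embeddings with two obvious groups, standing in for learned ones.
rng = np.random.default_rng(1)
Z = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(4, 2)),  # community A
    rng.normal(loc=5.0, scale=0.1, size=(4, 2)),  # community B
])

def kmeans(Z, k, iters=20, seed=0):
    """Plain k-means on the embedding rows of Z; returns a cluster label per node."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(iters):
        # Assign each embedding to its nearest center.
        labels = np.argmin(((Z[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        # Recompute each center as the mean of its cluster (keep it if empty).
        centers = np.array([
            Z[labels == c].mean(axis=0) if (labels == c).any() else centers[c]
            for c in range(k)
        ])
    return labels

labels = kmeans(Z, k=2)  # nodes 0-3 land in one community, nodes 4-7 in the other
```

With well-separated embeddings like these, the cluster labels recover the two communities; on real graphs one would of course cluster the learned embeddings instead.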
In fact, the lack of node ordering is quite challenging: without it, you cannot even tell whether two graphs are the same. After shuffling the node IDs, figuring out whether two graphs have the same structure is the graph isomorphism problem, for which no polynomial-time algorithm is known. So without a global node ordering, it's hard to develop algorithms that work for arbitrary orderings of the graph.

Graphs are oftentimes dynamic. For example, imagine you have a patient graph. New patients could be added to the graph, or existing patients can acquire new diseases, so edges should be added. It's very dynamic, and that's very challenging to deal with. Finally, we could have heterogeneous features on the nodes or on the edges; they could be categorical or numerical, or even some other modality of data, like text descriptions. All of this makes graphs very challenging, especially for developing neural network methods.

At the bottom, we show an analogy for why graphs are challenging. Language or text can be imagined as a line graph, because the words line up in sequence; if you connect neighboring words, you get a line graph. An image is like a two-dimensional graph on a grid, where different pixels connect to each other on this regular grid. But real graphs are much more complex: you can have very highly connected, high-degree nodes, and you can have disconnected nodes with no neighbors. This is very messy, and that's why it's much harder to deal with graphs.

So one idea that we will talk about in this lecture is the convolution operation on graphs. How do we generalize convolution on images to graphs? Convolutional neural networks have had a lot of success on images by applying this kind of convolution operation over the entire image.
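The node-ordering challenge above is exactly why graph operations are built to be permutation-equivariant: relabeling the nodes should just permute the output the same way, so the result does not depend on any fixed ordering. A small sketch (assumed notation, not from the lecture) checking this for plain neighbor-sum aggregation:

```python
import numpy as np

# A random undirected graph and random node features.
rng = np.random.default_rng(2)
n, d = 6, 3
A = rng.integers(0, 2, size=(n, n))
A = np.triu(A, 1)
A = A + A.T                      # symmetric adjacency, zero diagonal
H = rng.normal(size=(n, d))      # one feature row per node

def aggregate(A, H):
    """Sum each node's neighbor features: row i of A @ H."""
    return A @ H

perm = rng.permutation(n)
P = np.eye(n)[perm]              # permutation matrix for a node relabeling

# Relabel the graph and then aggregate ...
lhs = aggregate(P @ A @ P.T, P @ H)
# ... equals aggregating first and then relabeling the output.
rhs = P @ aggregate(A, H)
print(np.allclose(lhs, rhs))  # True
```

The check holds because P (A P.T P) H = P A H; any aggregation with this property gives the same answer no matter how the node IDs are shuffled.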
If you think about image convolution, it essentially takes regular square patches, slides them over the image, and computes some kind of aggregation over each patch, generating a feature map. Do this in multiple layers and you get a very powerful model, the CNN. Can we do something like that for graphs? That's graph convolution. You can imagine we can also have small patches, though no longer on a regular grid, because in graphs different nodes have different connections and different degrees. Maybe here we should look at the neighbors of each node and somehow aggregate information from those neighbors. That's the key concept behind graph convolution: aggregation from a node's individual neighbors. We'll talk about that in detail.
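The neighbor-aggregation idea can be sketched as a single graph-convolution layer. This is a minimal illustrative implementation under assumed notation (mean aggregation over the neighborhood including the node itself, then a learned linear map and a ReLU), not the specific variant the lecture will define:

```python
import numpy as np

# A small fixed undirected graph (5 nodes) and random features/weights.
A = np.array([
    [0, 1, 0, 0, 1],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [1, 0, 0, 1, 0],
], dtype=float)
rng = np.random.default_rng(3)
n, d_in, d_out = 5, 4, 2
H = rng.normal(size=(n, d_in))      # input node features
W = rng.normal(size=(d_in, d_out))  # learnable weight matrix

def gcn_layer(A, H, W):
    """One graph-convolution layer: average over each node's neighborhood
    (with a self-loop), then apply a linear map and a ReLU."""
    A_hat = A + np.eye(len(A))              # add self-loops
    deg = A_hat.sum(axis=1, keepdims=True)  # neighborhood sizes
    H_agg = (A_hat / deg) @ H               # mean over neighbors + self
    return np.maximum(H_agg @ W, 0.0)       # linear transform + ReLU

H_next = gcn_layer(A, H, W)
print(H_next.shape)  # (5, 2)
```

Stacking several such layers lets information propagate over multi-hop neighborhoods, which is the graph analogue of stacking convolutional layers in a CNN.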