Hi everybody, this is Julien from AWS. In this video, I would like to discuss graph neural networks. This video is a companion to a blog post I published on Medium, and you'll find the URL in the video description. In the blog post, I start by loading graph data into Amazon Neptune, our graph database, and then I train a simple graph neural network using the Deep Graph Library, an open-source library in Python. Here, I would like to run the sample notebook and give you a few pointers on how graph neural networks actually learn. So let me switch to the notebook.
Here it is. First, let's make sure we have up-to-date libraries. I'm using the Deep Graph Library with the PyTorch backend; these are the versions installed here. The next step is to load the list of edges that defines the graph. As mentioned, I uploaded some data to Amazon Neptune, queried it, and exported it, which is probably what you would do in a real-life scenario. I saved the list of edges to a pickle file, which I'm loading here. The graph has 34 nodes, and we can print all the edges to see the pairs of node IDs that define them. With this, we can build a graph and add the nodes and edges: 34 nodes and 156 edges. Using the NetworkX library, we can visualize the graph. The edges are undirected, meaning they are bidirectional. We see two heavily connected nodes, node 0 and node 33. The purpose of this sample is to learn how to split the graph into two groups, one around node 0 and one around node 33. You can read the blog post for the full story on why we do this.
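If you'd like to follow along in code, here is a rough sketch of that step. The pickle filename is just a placeholder, and the exact graph-construction call varies a little between DGL versions, so treat this as an illustration rather than the exact notebook code:

```python
import pickle
import dgl
import networkx as nx
import matplotlib.pyplot as plt

# Load the edge list exported from Neptune; 'edges.pkl' is a placeholder filename.
with open('edges.pkl', 'rb') as f:
    edge_list = pickle.load(f)   # list of (source, destination) node ID pairs

src, dst = zip(*edge_list)

# Build a 34-node DGL graph. Adding each edge in both directions makes the graph
# bidirectional; skip that step if the exported list already contains both directions.
g = dgl.graph((list(src) + list(dst), list(dst) + list(src)), num_nodes=34)
print(g.number_of_nodes(), g.number_of_edges())   # expect 34 and 156

# Visualize the graph with NetworkX.
nx_g = g.to_networkx().to_undirected()
nx.draw(nx_g, with_labels=True, node_color='lightblue')
plt.show()
```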
Now, let's start with the high-level architecture. We're using a network architecture called GCN, Graph Convolutional Network. The architecture is pretty simple: a first GCN layer activated by ReLU, a second GCN layer, and then a softmax. The first GCN layer takes the input features for the nodes, applies its learned transformation, and outputs hidden features with a lower dimensionality. Here, since we have 34 nodes, each node has 34 input features, and the hidden features have five dimensions. The second layer takes those five features as input, applies its own learned transformation, and outputs two features. These two features correspond to the two classes we want to learn: we want to split the graph into two groups around node 0 and node 33, so we need two classes. Finally, we apply the softmax function, which makes the values in each two-element vector add up to one, so they look like class probabilities.
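As a sketch, the two-layer model looks roughly like this in PyTorch. The layer sizes (34, 5, 2) are the ones just described; the GCNLayer it relies on is sketched a bit further down, after the message-passing discussion, and softmax itself is applied when we compute the loss and the predictions:

```python
import torch
import torch.nn as nn

class GCN(nn.Module):
    def __init__(self, in_feats, hidden_size, num_classes):
        super().__init__()
        # GCNLayer is the custom message-passing layer sketched later on
        self.gcn1 = GCNLayer(in_feats, hidden_size)
        self.gcn2 = GCNLayer(hidden_size, num_classes)

    def forward(self, g, inputs):
        h = self.gcn1(g, inputs)   # 34 input features -> 5 hidden features
        h = torch.relu(h)
        h = self.gcn2(g, h)        # 5 hidden features -> 2 class scores
        return h

# 34 one-hot input features, 5 hidden features, 2 classes
net = GCN(34, 5, 2)
```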
Now, let's look at what the GCN layer actually does. The forward function, which is the forward propagation function, takes the graph itself and the input features as arguments. The first thing it does is assign those input features to the nodes. This is a DGL shortcut that gives each node a feature vector taken from the input matrix, where each row corresponds to a node and each column to a feature. For each node, we take the matching row and store it in a node feature called 'h'. The assignment follows node IDs, so node 0 gets the first row, node 1 gets the second row, and so on.
Then, the layer asks all nodes to send a message across all their outgoing edges. Since the edges are bidirectional, that effectively means every edge. Once the messages have been sent, each node on the receiving end of an edge reduces the messages it received, applying an operation to the information they carry. Finally, we take the updated features for each node and apply a linear transformation to reduce their dimensionality. This is how we go from 34 features to 5, and then to 2, to get to the two classes.
What do nodes actually send in those messages? In the GCN architecture, nodes send their features across all their edges. Once all nodes have done this, each node looks at all the messages it received, containing the features of its adjacent nodes, and updates its own features to the sum of those. For example, if we have three nodes connected like this, node one will send its features to nodes two and three, and node three will send its features to node two. Node two will then update its features to the sum of the features it received from nodes one and three.
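Here is a tiny runnable illustration of that sum-of-neighbor-features idea on a three-node toy graph. I'm using DGL's built-in message and reduce functions for brevity (the notebook may define its own), and the feature values are made up:

```python
import torch
import dgl
import dgl.function as fn

# Nodes 0, 1, 2 play the roles of nodes one, two, and three in the explanation.
# Edges: node one -> nodes two and three, node three -> node two.
g3 = dgl.graph(([0, 0, 2], [1, 2, 1]), num_nodes=3)
g3.ndata['h'] = torch.tensor([[1.0, 0.0],
                              [0.0, 1.0],
                              [2.0, 2.0]])

# Every node sends its 'h' feature along its outgoing edges,
# and every receiving node sums the messages it gets.
g3.update_all(fn.copy_u('h', 'm'), fn.sum('m', 'h'))
print(g3.ndata['h'])
# "Node two" (ID 1) now holds [1., 0.] + [2., 2.] = [3., 2.];
# nodes with no incoming edges end up with zeros.
```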
In practice, it's a bit more complicated, but as a first explanation, this is enough. After the reduction, we apply a linear transformation to shrink the features to a lower dimension: from 34 to 5 in the first layer, and from 5 to 2 in the second. At this point, we apply softmax, and the two features become probabilities indicating how likely each node is to belong to class 0 or class 1.
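Putting those pieces together, a GCN layer along those lines can be sketched like this. It follows the message/reduce pattern from above and skips the degree normalization that a full GCN formulation would apply (that's part of the "a bit more complicated" I mentioned):

```python
import torch.nn as nn
import dgl.function as fn

class GCNLayer(nn.Module):
    def __init__(self, in_feats, out_feats):
        super().__init__()
        # Linear transformation that shrinks the features (34 -> 5, then 5 -> 2)
        self.linear = nn.Linear(in_feats, out_feats)

    def forward(self, g, inputs):
        with g.local_scope():
            g.ndata['h'] = inputs                 # one feature row per node
            g.update_all(fn.copy_u('h', 'm'),     # send 'h' along every edge
                         fn.sum('m', 'h'))        # receivers sum incoming messages
            h = g.ndata['h']
        return self.linear(h)                     # reduce dimensionality
```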
Now, let's look at the input features and the training process. The input features are very simple here: the nodes don't have any properties (in real life they would), so we use one-hot encoded node IDs. The input feature matrix has one row per node, and each row is the one-hot encoding of that node's ID. The matrix is therefore 34x34, since we have 34 nodes. We label the only two nodes we know about: node 0 as class 0 and node 33 as class 1. We then train the model to figure out the class of every other node.
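In code, the inputs and labels are just a few lines (the variable names here are mine):

```python
import torch

# One-hot encoded node IDs: a 34x34 identity matrix, one row per node
inputs = torch.eye(34)

# The only two nodes we label: node 0 belongs to class 0, node 33 to class 1
labeled_nodes = torch.tensor([0, 33])
labels = torch.tensor([0, 1])
```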
The training process is standard PyTorch: create an optimizer and run a number of epochs. For each epoch, we take the graph and the input features, run them through the two GCN layers and the softmax, and apply a loss function. We use cross-entropy because this is a classification problem: we compare the predicted probabilities for the labeled nodes to their actual labels, backpropagate, and update the weights. This is a semi-supervised learning scenario because we only label two nodes and rely on the learned parameters to compute the class of every other node.
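A minimal version of that training loop could look like this, reusing the model, graph, inputs, and labels from the sketches above; the learning rate and the number of epochs are just illustrative values:

```python
import torch
import torch.nn.functional as F

optimizer = torch.optim.Adam(net.parameters(), lr=0.01)
all_preds = []

for epoch in range(30):
    logits = net(g, inputs)              # forward pass through both GCN layers
    preds = F.softmax(logits, dim=1)     # turn the two class scores into probabilities
    all_preds.append(preds.detach())

    # Cross-entropy loss on the two labeled nodes only (semi-supervised);
    # cross_entropy takes raw logits and applies softmax internally.
    loss = F.cross_entropy(logits[labeled_nodes], labels)

    optimizer.zero_grad()
    loss.backward()                      # backpropagation
    optimizer.step()                     # update the weights
    print(f'Epoch {epoch}, loss {loss.item():.4f}')
```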
After training, we can look at the predictions for the last epoch, take the top class for each node, and print them out. We see that nodes close to node 0 end up in class 0, and nodes close to node 33 end up in class 1. For example, node 13, even though it is directly connected to node 33, is strongly classified as class 0 because its other connections (nodes 0, 1, and 2) sit squarely in the class 0 group.
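Getting the per-node classes from the last epoch is essentially an argmax over the stored predictions, for example:

```python
# Predictions from the last epoch: one (p_class0, p_class1) row per node
last_epoch = all_preds[-1]

# Most likely class for each node
predicted_class = torch.argmax(last_epoch, dim=1)
for node_id, cls in enumerate(predicted_class.tolist()):
    print(f'node {node_id}: class {cls}')
```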
That's a quick intro to graph neural networks. I hope it made sense. If you want more context, please read the blog post. If you have questions or comments, please leave them. I love questions. I'll see you next time. Bye-bye.