Huanfa Chen - huanfa.chen@ucl.ac.uk
13/12/2025




Example graph:
\[V = \{A, B, C, D, E, F, G\}\] \[E = \{(A,B), (B,C), (C,E), ...\}\]



CNN Key Properties:
Problem: not all data lies in Euclidean space!

Facebook Link Prediction for Suggesting Friends using Social Networks
Two Styles:
Task: Predict future friendships




Adjacency matrix \[ A = \begin{pmatrix} 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 \end{pmatrix} \]
Feature matrix (Randomly initialised. Can be age, hobby, etc) \[ H^0 = \left( \begin{array}{ccc} 1 & 0 & 1 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 1 & 1 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \\ 1 & 1 & 0 \end{array} \right) \]
where \(\mathcal{N}(v)\) is the set of immediate neighbours of \(v\), excluding \(v\) itself.
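Using the adjacency and feature matrices above, one propagation step can be computed directly. A minimal PyTorch sketch (node order A to H assumed):

```python
import torch

# Adjacency matrix A and feature matrix H0 from the slides above
A = torch.tensor([
    [0, 1, 0, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1, 0, 1, 1],
    [0, 0, 0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 0, 1, 1, 0],
], dtype=torch.float)

H0 = torch.tensor([
    [1, 0, 1],
    [1, 1, 0],
    [0, 1, 1],
    [1, 1, 1],
    [0, 0, 1],
    [0, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
], dtype=torch.float)

# One propagation step: each row of A @ H0 is the sum of that node's
# neighbours' feature vectors (the node itself is excluded, because
# the diagonal of A is zero).
H1 = A @ H0
print(H1[0])  # node A: sum of the features of its neighbours B and D -> [2., 2., 1.]
```

This makes the limitation visible: without self-loops, a node's own features are dropped after one step.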


Layer-wise propagation:
\[H^{i} = f(H^{i-1}, A)\]
\[f(H^{i-1}, A) = AH^{i-1}\]
where \(A\) is the adjacency matrix and \(H^{i-1}\) is the node feature matrix at layer \(i-1\).
Layer-wise propagation:
\[H^{i} = f(H^{i-1}, A)\]
\[f(H^{i-1}, A) = \sigma\left(AH^{i-1}W^{i-1}\right)\]
where \(\sigma\) is a non-linear activation (e.g. ReLU) and \(W^{i-1}\) is the trainable weight matrix of layer \(i-1\).


\[f(H^{(l)}, A) = \sigma\left( \hat{D}^{-\frac{1}{2}}\hat{A}\hat{D}^{-\frac{1}{2}}H^{(l)}W^{(l)}\right)\]
where \(\hat{A} = A + I\) is the adjacency matrix with self-loops and \(\hat{D}\) is the degree matrix of \(\hat{A}\).
import torch
import torch.nn as nn

class GCNConv(nn.Module):
    def __init__(self, A, in_channels, out_channels):
        super(GCNConv, self).__init__()
        self.A_hat = A + torch.eye(A.size(0))       # A_hat = A + I (self-loops)
        self.D = torch.diag(torch.sum(self.A_hat, 1))  # degree matrix of A_hat (not of A)
        self.D = self.D.inverse().sqrt()            # D_hat^{-1/2}
        self.A_hat = torch.mm(torch.mm(self.D, self.A_hat), self.D)  # D_hat^{-1/2} A_hat D_hat^{-1/2}
        self.W = nn.Parameter(torch.rand(in_channels, out_channels))

    def forward(self, X):
        # sigma(D_hat^{-1/2} A_hat D_hat^{-1/2} X W) with sigma = ReLU
        out = torch.relu(torch.mm(torch.mm(self.A_hat, X), self.W))
        return out
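The symmetric normalisation used in the layer above can be sanity-checked numerically. A sketch on an assumed 3-node path graph (not from the slides):

```python
import torch

# Toy graph: path A - B - C
A = torch.tensor([[0., 1., 0.],
                  [1., 0., 1.],
                  [0., 1., 0.]])
A_hat = A + torch.eye(3)                  # add self-loops
deg = A_hat.sum(1)                        # degrees of A_hat: [2, 3, 2]
D_inv_sqrt = torch.diag(deg.pow(-0.5))    # D_hat^{-1/2}
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt  # D_hat^{-1/2} A_hat D_hat^{-1/2}

# A_norm stays symmetric, and each entry 1/sqrt(d_i * d_j) down-weights
# contributions from high-degree neighbours, keeping repeated
# propagation numerically stable.
print(A_norm)
```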
Context (1970-1972):

X = torch.eye(A.size(0))  # identity matrix as initial node features
# A network with 10 features in the hidden layer and 2 in the output layer
# (John A's group vs Mr. Hi's group)
# Initialise network
model = Net(A, X.size(0), 10, 2)
# Loss and optimizer; unlabelled nodes are marked -1 and ignored by the loss
criterion = torch.nn.CrossEntropyLoss(ignore_index=-1)
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
# Training loop
for epoch in range(200):
    optimizer.zero_grad()
    loss = criterion(model(X), target)
    loss.backward()
    optimizer.step()
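Since `Net` is defined elsewhere in the slides, here is a self-contained sketch of the same semi-supervised setup on a hypothetical 4-node graph (`TinyGCN` and all sizes are illustrative, not the slide's model):

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Toy 4-node graph and its normalised adjacency
A = torch.tensor([[0., 1., 1., 0.],
                  [1., 0., 1., 0.],
                  [1., 1., 0., 1.],
                  [0., 0., 1., 0.]])
A_hat = A + torch.eye(4)
d_inv_sqrt = torch.diag(A_hat.sum(1).pow(-0.5))
A_norm = d_inv_sqrt @ A_hat @ d_inv_sqrt

class TinyGCN(nn.Module):
    def __init__(self, a_norm, n_in, n_hid, n_out):
        super().__init__()
        self.a = a_norm
        self.w1 = nn.Parameter(torch.randn(n_in, n_hid) * 0.1)
        self.w2 = nn.Parameter(torch.randn(n_hid, n_out) * 0.1)

    def forward(self, x):
        h = torch.relu(self.a @ x @ self.w1)
        return self.a @ h @ self.w2  # raw logits for CrossEntropyLoss

X = torch.eye(4)                      # identity features, as in the slides
target = torch.tensor([0, -1, -1, 1]) # only two labelled nodes; -1 = unlabelled
model = TinyGCN(A_norm, 4, 8, 2)
criterion = nn.CrossEntropyLoss(ignore_index=-1)  # masks out unlabelled nodes
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

losses = []
for epoch in range(100):
    optimizer.zero_grad()
    loss = criterion(model(X), target)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```

The `ignore_index=-1` trick is what makes this semi-supervised: the loss is computed only on the labelled nodes, yet the propagation spreads those labels' influence across the whole graph.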

Node embeddings learned during training
PyTorch Geometric (PyG):
Features:
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class GCN(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # num_features and num_classes come from the dataset
        self.conv1 = GCNConv(num_features, 16)
        self.conv2 = GCNConv(16, num_classes)

    def forward(self, x, edge_index):
        x = self.conv1(x, edge_index)
        x = F.relu(x)
        x = self.conv2(x, edge_index)
        return F.log_softmax(x, dim=1)
Transductive vs inductive learning on graph. Source: https://www.mdpi.com/2220-9964/10/2/97



\[\text{AGG}_{\text{neigh}} = \text{mean}([0, 0], [1, 0], [1, 1]) = [0.67, 0.33]\]
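The mean aggregation above can be checked directly (a minimal sketch):

```python
import torch

# The three neighbour feature vectors from the example above
neigh = torch.tensor([[0., 0.],
                      [1., 0.],
                      [1., 1.]])
agg = neigh.mean(dim=0)  # GraphSAGE-style mean over the neighbourhood
print(agg)               # tensor([0.6667, 0.3333])
```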

\(h_i^{(l+1)} = \sigma \left( \sum_{j \in \mathcal{N}_i} \alpha_{ij}^{(l)} \mathbf{W}^{(l)} h_j^{(l)} \right)\)
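The attention coefficients \(\alpha_{ij}\) in the update above are typically obtained by normalising a learned score over the neighbourhood, in the style of the original GAT paper. A toy sketch (all sizes and tensors are illustrative):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

F_in, F_out = 3, 4
W = torch.randn(F_in, F_out)   # shared linear transform
a = torch.randn(2 * F_out)     # attention vector

h_i = torch.randn(F_in)        # centre node features
h_neigh = torch.randn(3, F_in) # its 3 neighbours

Whi = h_i @ W                  # (F_out,)
Whj = h_neigh @ W              # (3, F_out)

# e_ij = LeakyReLU(a^T [W h_i || W h_j]) for each neighbour j
e = F.leaky_relu(torch.cat([Whi.expand(3, -1), Whj], dim=1) @ a,
                 negative_slope=0.2)
alpha = torch.softmax(e, dim=0)  # normalise scores over the neighbourhood

# Weighted aggregation: sum_j alpha_ij * W h_j (before applying sigma)
h_new = (alpha.unsqueeze(1) * Whj).sum(0)
```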



\[h_i^{(l+1)} = \sigma \left( \frac{1}{K} \sum_{k=1}^{K} \sum_{j \in \mathcal{N}_i} \alpha_{ij}^{k} \mathbf{W}^{k} h_j^{(l)} \right)\]
Concatenate intermediate heads; average at final layer
https://www.dgl.ai/blog/2019/02/17/gat.html
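The shape consequences of concatenating versus averaging heads can be illustrated with random tensors (K = 4 heads, F' = 8, and N = 5 nodes are assumed sizes):

```python
import torch

torch.manual_seed(0)

K, F_out, N = 4, 8, 5
head_outputs = [torch.randn(N, F_out) for _ in range(K)]  # one output per head

# Intermediate layers: concatenate the heads -> (N, K * F')
h_concat = torch.cat(head_outputs, dim=1)

# Final layer: average the heads -> (N, F'), matching the averaged update above
h_avg = torch.stack(head_outputs).mean(dim=0)
print(h_concat.shape, h_avg.shape)
```

Concatenation preserves each head's distinct view in the hidden representation, while averaging at the output keeps the final dimensionality equal to the number of classes.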
Q: How powerful are GCNs really?
Key Papers:
Libraries:
© CASA | ucl.ac.uk/bartlett/casa