Add new positional encoding¶
One of the main advantages of this library is the ability to easily incorporate novel positional encodings at the node, edge and graph level. The positional encodings are computed and fed into their respective encoders. The hidden embeddings from all pe encoders are then pooled (according to whether they are node, edge, or graph level) and fed into the GNN layers as features. This design allows any combination of positional encodings to be used by modifying the configuration file. For more details on the data processing part, please visit the design page of the doc.
Here is the workflow for computing and processing positional encoding in the library:
- edit the related parts in the yaml configuration file
- compute the raw positional encoding from the graph in graphium/features/positional_encoding.py (from the graph positional encoder)
- feed the raw positional encoding into the respective (specialized) encoders in graphium/nn/encoders. For example, a simple MLP positional encoder can be found there.
- output the hidden embeddings of pe from the encoders in their respective output keys: feat (node feature), edge_feat (edge feature), graph_feat (graph feature) and potentially other keys if needed, such as nodepair_feat
- pool the hidden embeddings with the same keys together: for example, all outputs with the feat key will be pooled together
- construct the PyG Batch, a batch of graphs, each containing the output keys seen above, ready for use in the GNN layers
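The pooling step of this workflow can be pictured with a small NumPy sketch. The encoder names (`deg_pos`, `other_pe`), the helper `pool_by_key`, and the random arrays are illustrative assumptions, not the library's implementation. The only ideas taken from the text are that encoder outputs are grouped by output key and that all encoders share the same output dimension.

```python
import numpy as np


def pool_by_key(encoder_outputs, pool="sum"):
    """Pool hidden embeddings that share an output key (hypothetical helper)."""
    # Group the embeddings of every encoder by their output key
    grouped = {}
    for outputs in encoder_outputs.values():
        for key, emb in outputs.items():
            grouped.setdefault(key, []).append(emb)

    pooled = {}
    for key, embs in grouped.items():
        stacked = np.stack(embs, axis=0)  # (num_encoders, num_items, out_dim)
        if pool == "sum":
            pooled[key] = stacked.sum(axis=0)
        elif pool == "mean":
            pooled[key] = stacked.mean(axis=0)
        else:
            raise ValueError(f"Unsupported pooling: {pool}")
    return pooled


num_nodes, num_edges, out_dim = 4, 3, 8
rng = np.random.default_rng(0)

# Pretend outputs of two pe encoders (made-up data for illustration)
encoder_outputs = {
    "deg_pos": {"feat": rng.normal(size=(num_nodes, out_dim))},
    "other_pe": {
        "feat": rng.normal(size=(num_nodes, out_dim)),
        "edge_feat": rng.normal(size=(num_edges, out_dim)),
    },
}

pooled = pool_by_key(encoder_outputs, pool="sum")
print({key: emb.shape for key, emb in pooled.items()})
```

Stacking only works because every encoder writes embeddings of the same `out_dim`, which is why the configuration fixes a single output dimension for all pe encoders.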
Since this library is built using PyG, we recommend looking at their Docs and Tutorials for more info.
We start by editing the configuration file.
Edit the yaml Configuration File¶
Computing Raw PE¶
We will use the degree of each node as a positional encoding in this tutorial.
First, start with an existing yaml configuration file; you can find them in expts/configs.
We first look at where in the yaml file the raw positional encodings are computed. deg_pos is added as an example below. You can also add relevant arguments for computing the positional encoding here, such as normalize in the example.
pos_encoding_as_features:
  pos_types:
    deg_pos: #example, degree centrality
      pos_type: degree
      normalize: False
Specifying Encoders for the PE¶
Now we want to specify the arguments for the encoders associated with the pe:
pe_encoders:
  out_dim: 64
  pool: "sum" #choice of pooling across multiple pe encoders
  last_norm: None #"batch_norm", "layer_norm"
  encoders:
    deg_pos: #same name from the previous cell
      encoder_type: "mlp" #or you can specify your own specialized encoder
      input_keys: ["degree"] #same as the pos_type configured before
      output_keys: ["feat"] #node feature
      hidden_dim: 64
      num_layers: 1
      dropout: 0.1
      normalization: "none" #"batch_norm" or "layer_norm"
      first_normalization: "layer_norm" #"batch_norm" or "layer_norm"
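Following the comments in the configuration, the encoder entry reuses the name given under `pos_types`, and its `input_keys` match the configured `pos_type`. A quick consistency check can catch typos in these names early. The sketch below is only an illustration: the configuration is written as a plain Python dict mirroring the two yaml fragments, and `check_pe_config` is a hypothetical helper, not part of the library.

```python
# Plain-dict mirror of the two yaml fragments (trimmed to the relevant keys)
config = {
    "pos_encoding_as_features": {
        "pos_types": {
            "deg_pos": {"pos_type": "degree", "normalize": False},
        }
    },
    "pe_encoders": {
        "out_dim": 64,
        "pool": "sum",
        "encoders": {
            "deg_pos": {
                "encoder_type": "mlp",
                "input_keys": ["degree"],
                "output_keys": ["feat"],
            }
        },
    },
}


def check_pe_config(cfg):
    """Return a list of human-readable problems (hypothetical helper)."""
    pos_types = cfg["pos_encoding_as_features"]["pos_types"]
    encoders = cfg["pe_encoders"]["encoders"]
    # Keys produced by the raw pe computation, assumed to equal each pos_type
    available_keys = {args["pos_type"] for args in pos_types.values()}

    problems = []
    for name, enc in encoders.items():
        if name not in pos_types:
            problems.append(f"encoder '{name}' has no matching entry in pos_types")
        for key in enc["input_keys"]:
            if key not in available_keys:
                problems.append(f"encoder '{name}' reads unknown input key '{key}'")
    return problems


problems = check_pe_config(config)
print(problems or "config looks consistent")
```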
Compute the Positional Encoding¶
Next, we want to compute the raw degree of each node from the molecule graph.
add function to compute the pe¶
Go to graphium/features and add a new file deg.py containing the function that computes the pe.
from typing import Union

import numpy as np
from scipy import sparse
from scipy.sparse import spmatrix


def compute_deg(adj: Union[np.ndarray, spmatrix], normalize: bool) -> np.ndarray:
    """
    Compute the node degree positional encoding

    Parameters:
        adj: Adjacency matrix
        normalize: indicate if the degree across all nodes are normalized to [0,1] or not
    Returns:
        2D array with shape (num_nodes, 1) specifying (outgoing) degree for each node
    """
    # convert adj to a scipy sparse matrix if it is a dense array
    if isinstance(adj, np.ndarray):
        adj = sparse.csr_matrix(adj)

    # https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csr_matrix.sum.html
    # axis=1 sums each row, which gives the outgoing degree when row i lists the edges leaving node i;
    # np.asarray turns the returned np.matrix into a regular array of shape (num_nodes, 1)
    degs = np.asarray(adj.sum(axis=1), dtype=float).reshape(-1, 1)

    # normalize the degree sequence to [0,1], skipping graphs without edges to avoid dividing by zero
    max_deg = degs.max()
    if normalize and max_deg > 0:
        degs = degs / max_deg
    return degs
Test with toy matrix¶
Here we test whether our code computes the degree of each node correctly.
adj = np.identity(5) #make an identity matrix
normalize = True
degs = compute_deg(adj, normalize=normalize)
degs
array([[1.],
       [1.],
       [1.],
       [1.],
       [1.]])
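The identity matrix cannot distinguish row sums from column sums, so a directed toy graph makes a stricter check of what the `compute_deg` docstring calls the (outgoing) degree. The sketch below uses plain NumPy and assumes the convention that `adj[i, j] = 1` means an edge from node `i` to node `j`; that convention is an assumption, not something this tutorial states.

```python
import numpy as np

# Directed toy graph: 0 -> 1, 0 -> 2, 1 -> 2
adj = np.array([
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
])

out_deg = adj.sum(axis=1, keepdims=True)   # row sums, shape (3, 1)
in_deg = adj.sum(axis=0, keepdims=True).T  # column sums, transposed to shape (3, 1)

print(out_deg.ravel())  # [2 1 0]
print(in_deg.ravel())   # [0 1 2]
```

For undirected graphs the adjacency matrix is symmetric and both sums agree, so the choice of axis only matters once edges are directed.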
add to positional_encoding.py¶
To compute the new pe along with all existing pe, we need to add the function we wrote to graphium/features/positional_encoding.py. Modify the graph_positional_encoder function by adding the logic for pos_type == "degree".
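The added branch might look like the following sketch. It is deliberately self-contained: `graph_positional_encoder_sketch` and its arguments are simplified stand-ins, not the real signature of `graph_positional_encoder`, and the degree helper is inlined so the snippet runs on its own. Returning a dict keyed by `"degree"` is also an assumption, chosen to match the `input_keys` used in the configuration. In the library you would import `compute_deg` from the new deg.py instead.

```python
import numpy as np


def compute_deg_sketch(adj, normalize):
    """Inlined stand-in for the compute_deg function from deg.py."""
    degs = np.asarray(adj, dtype=float).sum(axis=1, keepdims=True)  # (num_nodes, 1)
    max_deg = degs.max()
    if normalize and max_deg > 0:
        degs = degs / max_deg
    return degs


def graph_positional_encoder_sketch(adj, pos_type, pos_kwargs):
    """Simplified dispatcher (hypothetical signature): pick the pe function by pos_type."""
    pos_type = pos_type.lower()
    if pos_type == "degree":
        # Extra yaml arguments such as `normalize` arrive through pos_kwargs
        return {"degree": compute_deg_sketch(adj, **pos_kwargs)}
    raise ValueError(f"Unknown pos_type: {pos_type}")


# Triangle graph: every node has degree 2, so the normalized degree is 1
adj = np.ones((3, 3)) - np.eye(3)
pe = graph_positional_encoder_sketch(adj, "degree", {"normalize": True})
print(pe["degree"].shape)
```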
Add Existing Encoder¶
In order to pool over all the positional encodings, we need to add an encoder that processes the raw computed positional encoding and ensures that the output dimension of all pe encoders is the same. When designing the encoder, you can either use an existing encoder or write a specialized one.
Here we can simply specify the MLPEncoder in the yaml file (encoder_type: "mlp"), and the library will automatically feed the raw positional encoding to an MLP encoder built from the input arguments. Note that in this example, the encoder takes in the pe stored at the input key degree and then outputs to the output key feat:
encoders:
  deg_pos:
    encoder_type: "mlp"
    input_keys: ["degree"]
    output_keys: ["feat"] # node feature
    hidden_dim: 64
    num_layers: 1
    dropout: 0.1
    normalization: "none" #"batch_norm" or "layer_norm"
    first_normalization: "layer_norm" #"batch_norm" or "layer_norm"
Add Specialized Encoder¶
You can also add a specialized encoder, such as laplacian_pe for the Laplacian eigenvectors and eigenvalues. Here, we can add a new deg_pos_encoder.py in graphium/nn/encoders. As an example and template, please see the MLPEncoder.
Note that all new encoders must inherit from the BaseEncoder class and implement the following abstract methods:
- forward: the forward function of the encoder, how to process the input
- parse_input_keys: how to parse the input keys
- parse_output_keys: how to parse the output keys
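To make the three required methods concrete, here is a toy skeleton. Everything in it is a stand-in: `BaseEncoderSketch` only imitates the idea of an abstract base class with these three methods, the constructor arguments are guesses, and a NumPy matrix product replaces the real neural network layers. Consult BaseEncoder and MLPEncoder in graphium/nn/encoders for the actual signatures.

```python
from abc import ABC, abstractmethod

import numpy as np


class BaseEncoderSketch(ABC):
    """Stand-in abstract base class (NOT graphium's BaseEncoder)."""

    def __init__(self, input_keys, output_keys, in_dim, out_dim):
        self.in_dim = in_dim
        self.out_dim = out_dim
        # Keys are validated once, when the encoder is built
        self.input_keys = self.parse_input_keys(input_keys)
        self.output_keys = self.parse_output_keys(output_keys)

    @abstractmethod
    def forward(self, batch):
        ...

    @abstractmethod
    def parse_input_keys(self, input_keys):
        ...

    @abstractmethod
    def parse_output_keys(self, output_keys):
        ...


class DegPosEncoderSketch(BaseEncoderSketch):
    """Toy degree encoder: one linear map from the raw pe to out_dim features."""

    def __init__(self, input_keys, output_keys, in_dim, out_dim, seed=0):
        super().__init__(input_keys, output_keys, in_dim, out_dim)
        rng = np.random.default_rng(seed)
        self.weight = rng.normal(size=(in_dim, out_dim))

    def parse_input_keys(self, input_keys):
        if len(input_keys) != 1:
            raise ValueError("this toy encoder expects exactly one input key")
        return list(input_keys)

    def parse_output_keys(self, output_keys):
        if len(output_keys) != len(self.input_keys):
            raise ValueError("expected one output key per input key")
        return list(output_keys)

    def forward(self, batch):
        raw = batch[self.input_keys[0]]  # (num_nodes, in_dim)
        return {self.output_keys[0]: raw @ self.weight}  # (num_nodes, out_dim)


encoder = DegPosEncoderSketch(["degree"], ["feat"], in_dim=1, out_dim=64)
out = encoder.forward({"degree": np.array([[1.0], [2.0], [1.0]])})
print(out["feat"].shape)
```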
Add the Keys to Spaces¶
In order to find the correct encoder directly from the yaml file, we need to specify which key corresponds to which class.
- add our new deg_pos_encoder to graphium/utils/spaces.py in the PE_ENCODERS_DICT
- add our new deg_pos_encoder to graphium/nn/architectures/encoder_manager.py in the PE_ENCODERS_DICT
- add the import of our encoder to graphium/nn/encoders/__init__.py
Now we can modify the yaml file to use our new encoder:
encoders:
  deg_pos:
    encoder_type: "deg_pos_encoder"
    input_keys: ["degree"]
    output_keys: ["feat"] # node feature
    hidden_dim: 64
    #any other keys that might be used for initialization
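The lookup that these registrations enable can be sketched as a plain dictionary from encoder_type strings to classes. The two placeholder classes and the `build_encoder` helper below are hypothetical; only the name `PE_ENCODERS_DICT` and the encoder_type values come from this tutorial.

```python
class MLPEncoderPlaceholder:
    """Placeholder class standing in for the existing MLP encoder."""

    def __init__(self, input_keys, output_keys, **kwargs):
        self.input_keys = input_keys
        self.output_keys = output_keys
        self.kwargs = kwargs


class DegPosEncoderPlaceholder(MLPEncoderPlaceholder):
    """Placeholder class standing in for the new specialized encoder."""


# Maps the encoder_type string from the yaml file to a class
PE_ENCODERS_DICT = {
    "mlp": MLPEncoderPlaceholder,
    "deg_pos_encoder": DegPosEncoderPlaceholder,
}


def build_encoder(encoder_cfg):
    """Instantiate the class registered under encoder_type (hypothetical helper)."""
    cfg = dict(encoder_cfg)  # copy so the caller's config is left untouched
    encoder_type = cfg.pop("encoder_type")
    if encoder_type not in PE_ENCODERS_DICT:
        raise ValueError(
            f"Unknown encoder_type '{encoder_type}', expected one of {sorted(PE_ENCODERS_DICT)}"
        )
    # Remaining keys are passed on as initialization arguments
    return PE_ENCODERS_DICT[encoder_type](**cfg)


encoder = build_encoder({
    "encoder_type": "deg_pos_encoder",
    "input_keys": ["degree"],
    "output_keys": ["feat"],
    "hidden_dim": 64,
})
print(type(encoder).__name__)
```

A missing registration surfaces as a lookup failure on the encoder_type string, which is why a new encoder has to be added to the dictionary before the yaml file can refer to it.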