Add new positional encoding¶

One of the main advantages of this library is the ability to easily incorporate novel positional encodings at the node, edge and graph level. The positional encodings are computed and fed into their respective encoders; the hidden embeddings from all pe encoders are then pooled (according to whether they are node, edge, or graph level) and fed into the GNN layers as features. This design allows any combination of positional encodings to be used by modifying the configuration file. For more details on the data processing part, please visit the design page of the doc.

Here is the workflow for computing and processing positional encoding in the library:

Edit the related parts of the yaml configuration file
Compute the raw positional encoding from the graph in graphium/features/positional_encoding.py (from the graph positional encoder)
Feed the raw positional encoding into the respective (specialized) encoders in graphium/nn/encoders, where, for example, a simple MLP positional encoder can be found
Output the hidden embeddings of the pe from the encoders under their respective output keys: feat (node feature), edge_feat (edge feature), graph_feat (graph feature), and potentially other keys if needed, such as nodepair_feat
Pool the hidden embeddings that share the same key: for example, all outputs with the feat key are pooled together
Construct the PyG Batch, a batch of graphs that each contain the output keys seen above, ready for use in the GNN layers
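
The pooling step in the workflow above can be sketched in a few lines. This is a minimal illustration with NumPy, not the library's implementation: the function name `pool_pe_embeddings`, the "mean" option, and the array shapes are assumptions made for the example; only the output keys and the "sum" pooling choice come from this page.

```python
from collections import defaultdict
from typing import Dict, List

import numpy as np


def pool_pe_embeddings(
    encoder_outputs: List[Dict[str, np.ndarray]], pool: str = "sum"
) -> Dict[str, np.ndarray]:
    """Hypothetical helper: pool the outputs of several PE encoders by output key."""
    # Group the embeddings of all encoders by their output key
    grouped = defaultdict(list)
    for output in encoder_outputs:
        for key, embedding in output.items():
            grouped[key].append(embedding)

    pooled = {}
    for key, embeddings in grouped.items():
        # Stacking only works if every embedding under this key has the same shape
        stacked = np.stack(embeddings, axis=0)  # (num_encoders, num_items, out_dim)
        if pool == "sum":
            pooled[key] = stacked.sum(axis=0)
        elif pool == "mean":  # illustrative extra option, an assumption of this sketch
            pooled[key] = stacked.mean(axis=0)
        else:
            raise ValueError(f"Unsupported pooling: {pool}")
    return pooled


# Two node-level encoders and one edge-level encoder for a graph with 4 nodes and 3 edges
num_nodes, num_edges, out_dim = 4, 3, 8
outputs = [
    {"feat": np.ones((num_nodes, out_dim))},
    {"feat": 2 * np.ones((num_nodes, out_dim))},
    {"edge_feat": np.ones((num_edges, out_dim))},
]
pooled = pool_pe_embeddings(outputs, pool="sum")
print({key: value.shape for key, value in pooled.items()})
```

The sketch also shows why every encoder that writes to the same key needs the same output dimension: embeddings of different widths could not be stacked and summed.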

Since this library is built using PyG, we recommend looking at their Docs and Tutorials for more info.

We start by editing the configuration file.

Edit the yaml Configuration File¶

Computing Raw PE¶

We will use the degree of each node as a positional encoding in this tutorial. Start with an existing yaml configuration file; you can find them in expts/configs.

We first look at where in the yaml file the raw positional encodings are computed. deg_pos is added as an example below. You can also add relevant arguments for computing the positional encoding here, such as normalize in the example.

pos_encoding_as_features:
    pos_types:
      deg_pos: #example, degree centrality
        pos_type: degree
        normalize: False

Specifying Encoders for the PE¶

Now we specify the arguments for the encoder associated with the pe:

pe_encoders:
    out_dim: 64
    pool: "sum" #choice of pooling across multiple pe encoders
    last_norm: None #"batch_norm", "layer_norm"
    encoders: 
      deg_pos: #same name from the previous cell
        encoder_type: "mlp" #or you can specify your own specialized encoder
        input_keys: ["degree"] #same as the pos_type configured before
        output_keys: ["feat"] #node feature
        hidden_dim: 64
        num_layers: 1
        dropout: 0.1
        normalization: "none"   #"batch_norm" or "layer_norm"
        first_normalization: "layer_norm"   #"batch_norm" or "layer_norm"
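
The comments in the two fragments above describe a naming convention: the encoder entry reuses the name from pos_types, and its input_keys contain the configured pos_type. A small consistency check can catch typos early. The sketch below works on plain dictionaries that mirror the YAML; `find_config_mismatches` is a hypothetical helper written for this tutorial and is not part of the library.

```python
from typing import Any, Dict, List

# Plain-dict copies of the two YAML fragments above (only the fields needed here)
pos_encoding_as_features: Dict[str, Any] = {
    "pos_types": {
        "deg_pos": {"pos_type": "degree", "normalize": False},
    }
}
pe_encoders: Dict[str, Any] = {
    "out_dim": 64,
    "pool": "sum",
    "encoders": {
        "deg_pos": {
            "encoder_type": "mlp",
            "input_keys": ["degree"],
            "output_keys": ["feat"],
        },
    },
}


def find_config_mismatches(pos_cfg: Dict[str, Any], enc_cfg: Dict[str, Any]) -> List[str]:
    """Hypothetical check: report encoders that do not line up with pos_types."""
    problems = []
    pos_types = pos_cfg["pos_types"]
    for name, encoder in enc_cfg["encoders"].items():
        if name not in pos_types:
            problems.append(f"encoder '{name}' has no matching entry in pos_types")
            continue
        expected_key = pos_types[name]["pos_type"]
        if expected_key not in encoder["input_keys"]:
            problems.append(
                f"encoder '{name}' does not list '{expected_key}' in input_keys"
            )
    return problems


print(find_config_mismatches(pos_encoding_as_features, pe_encoders))  # []
```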

Compute the Positional Encoding¶

Next, we want to compute the raw degree of each node from the molecule graph.

Add Function to Compute the PE¶

In graphium/features, create a new file deg.py that holds the function computing the pe.

from typing import Union

import numpy as np
from scipy import sparse
from scipy.sparse import spmatrix

def compute_deg(adj: Union[np.ndarray, spmatrix], normalize: bool) -> np.ndarray:
    """
    Compute the node degree positional encoding

    Parameters:
        adj: Adjacency matrix
        normalize: whether to normalize the degrees of all nodes to [0, 1]
    Returns:
        2D array with shape (num_nodes, 1) specifying the (outgoing) degree of each node
    """

    # convert adj to a scipy sparse matrix if it is not one already
    if isinstance(adj, np.ndarray):
        adj = sparse.csr_matrix(adj)

    # https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csr_matrix.sum.html
    # axis=1 sums the entries of each row, which gives the outgoing degree of each node;
    # the result is converted to a plain array of shape (num_nodes, 1)
    degs = np.asarray(adj.sum(axis=1), dtype=float).reshape(-1, 1)

    if normalize:  # normalize the degree sequence to [0, 1]
        max_deg = degs.max()
        if max_deg > 0:  # a graph without edges would otherwise divide by zero
            degs = degs / max_deg
    return degs

Test with toy matrix¶

Here we test whether our code computes the degree of each node correctly.

adj = np.identity(5)  # identity matrix: every node has a single self-loop, so every degree is 1
normalize = True

degs = compute_deg(adj, normalize=normalize)

degs

Out[2]:

array([[1.],
       [1.],
       [1.],
       [1.],
       [1.]])
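
The identity matrix gives every node the same degree, so it cannot reveal a mix-up between nodes or a normalization mistake. As an additional sanity check, the expected values for a small star graph can be worked out by hand and compared against plain NumPy. The sketch below does not call `compute_deg`; the graph and the variable names are chosen for illustration only.

```python
import numpy as np

# Star graph on 4 nodes: node 0 is connected to nodes 1, 2 and 3
adj = np.zeros((4, 4))
for leaf in (1, 2, 3):
    adj[0, leaf] = 1
    adj[leaf, 0] = 1

# The adjacency matrix is symmetric, so row sums and column sums agree
expected_degs = adj.sum(axis=1)
assert np.array_equal(expected_degs, adj.sum(axis=0))
print(expected_degs)  # [3. 1. 1. 1.]

# Dividing by the maximum degree maps the degrees to [0, 1]
expected_normalized = expected_degs / expected_degs.max()
print(expected_normalized)
```

Flattening the result of `compute_deg(adj, normalize=True)` with `np.asarray(...).ravel()` should give the same values as `expected_normalized`, which can be confirmed with `np.allclose`.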

Add to positional_encoding.py¶

To compute the new pe along with all existing pe, we need to add the function we wrote to graphium/features/positional_encoding.py. Modify the graph_positional_encoder function by adding a branch for pos_type == "degree".
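
The shape of that change can be sketched as a dispatch on pos_type. The real graph_positional_encoder has its own signature and handles many more encodings, so the function below is a simplified stand-in: `compute_raw_pe`, its arguments, and the inlined `degree_pe` helper are assumptions for illustration only.

```python
from typing import Any, Dict

import numpy as np


def degree_pe(adj: np.ndarray, normalize: bool = False) -> np.ndarray:
    """Stand-in for the degree function of this tutorial: row sums as a (num_nodes, 1) array."""
    degs = adj.sum(axis=1, keepdims=True).astype(float)
    if normalize and degs.max() > 0:
        degs = degs / degs.max()
    return degs


def compute_raw_pe(adj: np.ndarray, pos_kwargs: Dict[str, Any]) -> Dict[str, np.ndarray]:
    """Hypothetical dispatcher: choose the PE function from the `pos_type` entry."""
    kwargs = dict(pos_kwargs)  # copy so the caller's config is not modified
    pos_type = kwargs.pop("pos_type")

    if pos_type == "degree":
        # the remaining entries (here: normalize) are passed on as arguments
        pe = degree_pe(adj, **kwargs)
    else:
        raise ValueError(f"Unknown pos_type: {pos_type}")

    # store the raw PE under the key that the encoder lists in `input_keys`
    return {pos_type: pe}


# Triangle graph: every node has degree 2
adj = np.ones((3, 3)) - np.eye(3)
raw_pe = compute_raw_pe(adj, {"pos_type": "degree", "normalize": False})
print(raw_pe["degree"].shape)  # (3, 1)
```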

Add Existing Encoder¶

In order to pool over all the positional encodings, we need to add an encoder that processes the raw computed positional encoding and ensures that the output dimension of all pe encoders is the same. When designing the encoder, you can either use an existing encoder or write your own specialized encoder.

Here we can simply specify MLPEncoder in the yaml file, and the library will automatically feed the raw positional encoding to an MLP encoder based on the input arguments. Note that in this example, the encoder takes in the pe stored at the input key degree and writes its output to the output key feat.

encoders: 
  deg_pos: 
    encoder_type: "mlp" 
    input_keys: ["degree"] 
    output_keys: ["feat"] # node feature
    hidden_dim: 64
    num_layers: 1
    dropout: 0.1
    normalization: "none"   #"batch_norm" or "layer_norm"
    first_normalization: "layer_norm"   #"batch_norm" or "layer_norm"

Add Specialized Encoder¶

You can also add a specialized encoder, such as laplacian_pe for the Laplacian eigenvectors and eigenvalues. Here, we can add a new deg_pos_encoder.py in graphium/nn/encoders. As an example and template, please see the MLPEncoder.

Note that all new encoders must inherit from the BaseEncoder class and implement the following abstract methods:

forward: the forward function of the encoder, how to process the input
parse_input_keys: how to parse the input keys
parse_output_keys: how to parse the output keys
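
To make the three methods concrete, here is a framework-free skeleton. It deliberately does not import Graphium or PyTorch: `BaseEncoderSketch` is a stand-in abstract class written for this example, and the method signatures, the dictionary-based batch, and the single random linear map are all assumptions. Check the actual BaseEncoder and MLPEncoder source for the real signatures before adapting it.

```python
from abc import ABC, abstractmethod
from typing import Dict, List

import numpy as np


class BaseEncoderSketch(ABC):
    """Stand-in for the library's BaseEncoder (assumed interface, not the real class)."""

    def __init__(self, input_keys: List[str], output_keys: List[str], in_dim: int, out_dim: int):
        self.input_keys = self.parse_input_keys(input_keys)
        self.output_keys = self.parse_output_keys(output_keys)
        self.in_dim = in_dim
        self.out_dim = out_dim

    @abstractmethod
    def forward(self, batch: Dict[str, np.ndarray]) -> Dict[str, np.ndarray]:
        """Process the raw PE found in the batch."""

    @abstractmethod
    def parse_input_keys(self, input_keys: List[str]) -> List[str]:
        """Validate or transform the input keys."""

    @abstractmethod
    def parse_output_keys(self, output_keys: List[str]) -> List[str]:
        """Validate or transform the output keys."""


class DegPosEncoderSketch(BaseEncoderSketch):
    """Illustrative degree encoder: one linear map from the raw degree to out_dim features."""

    def __init__(self, input_keys, output_keys, in_dim: int = 1, out_dim: int = 64, seed: int = 0):
        super().__init__(input_keys, output_keys, in_dim, out_dim)
        rng = np.random.default_rng(seed)
        self.weight = rng.normal(size=(in_dim, out_dim))  # untrained placeholder weights

    def parse_input_keys(self, input_keys: List[str]) -> List[str]:
        if "degree" not in input_keys:
            raise ValueError("this encoder expects the raw 'degree' pe as input")
        return list(input_keys)

    def parse_output_keys(self, output_keys: List[str]) -> List[str]:
        if len(output_keys) != len(self.input_keys):
            raise ValueError("this sketch expects one output key per input key")
        return list(output_keys)

    def forward(self, batch: Dict[str, np.ndarray]) -> Dict[str, np.ndarray]:
        outputs = {}
        for in_key, out_key in zip(self.input_keys, self.output_keys):
            raw_pe = np.asarray(batch[in_key], dtype=float)  # (num_nodes, in_dim)
            outputs[out_key] = raw_pe @ self.weight  # (num_nodes, out_dim)
        return outputs


encoder = DegPosEncoderSketch(input_keys=["degree"], output_keys=["feat"])
out = encoder.forward({"degree": np.array([[1.0], [2.0], [1.0]])})
print(out["feat"].shape)  # (3, 64)
```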

Add the Keys to Spaces¶

In order to find the correct encoder directly from the yaml file, we need to specify which key corresponds to which class.

Add our new deg_pos_encoder to graphium/utils/spaces.py in the PE_ENCODERS_DICT
Add our new deg_pos_encoder to graphium/nn/architectures/encoder_manager.py in the PE_ENCODERS_DICT
Add the import of our encoder to graphium/nn/encoders/__init__.py

Now we can modify the yaml file to use our new encoder:

encoders: 
  deg_pos: 
    encoder_type: "deg_pos_encoder" 
    input_keys: ["degree"] 
    output_keys: ["feat"] # node feature
    hidden_dim: 64
    #any other keys that might be used for initialization
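
The registration steps above boil down to a dictionary from the encoder_type string to a class. The sketch below mimics that lookup with placeholder classes; the stub classes, their constructor arguments, and the `build_encoder` helper are hypothetical, and only the name PE_ENCODERS_DICT and the YAML keys come from this page.

```python
from typing import Any, Dict, List, Type


class EncoderStub:
    """Placeholder for an encoder class; it only records its configuration."""

    def __init__(self, input_keys: List[str], output_keys: List[str], **kwargs: Any):
        self.input_keys = input_keys
        self.output_keys = output_keys
        self.kwargs = kwargs


class MLPEncoderStub(EncoderStub):
    pass


class DegPosEncoderStub(EncoderStub):
    pass


# Stand-in for the registry: maps the `encoder_type` string to a class
PE_ENCODERS_DICT: Dict[str, Type[EncoderStub]] = {
    "mlp": MLPEncoderStub,
    "deg_pos_encoder": DegPosEncoderStub,
}


def build_encoder(encoder_cfg: Dict[str, Any]) -> EncoderStub:
    """Hypothetical factory: look up the class and pass the remaining keys to it."""
    kwargs = dict(encoder_cfg)  # copy so the caller's config is not modified
    encoder_type = kwargs.pop("encoder_type")
    if encoder_type not in PE_ENCODERS_DICT:
        raise ValueError(f"Unknown encoder_type: {encoder_type}")
    return PE_ENCODERS_DICT[encoder_type](**kwargs)


# Plain-dict copy of the deg_pos entry from the YAML above
deg_pos_cfg = {
    "encoder_type": "deg_pos_encoder",
    "input_keys": ["degree"],
    "output_keys": ["feat"],
    "hidden_dim": 64,
}
encoder = build_encoder(deg_pos_cfg)
print(type(encoder).__name__)  # DegPosEncoderStub
```

An encoder_type that is missing from the dictionary fails at lookup time, which is why the new class has to be registered before the yaml file can refer to it.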
