Add a New Positional Encoding
One of the main advantages of this library is the ability to easily incorporate novel positional encodings at the node, edge, and graph level. The positional encodings are computed and fed into their respective encoders; the hidden embeddings from all PE encoders are then pooled (according to whether they are node, edge, or graph level) and fed into the GNN layers as features. This design allows any combination of positional encodings to be used by modifying the configuration file. For more details on the data processing, please visit the design page of the docs.
Here is the workflow for computing and processing positional encodings in the library:

1. Edit the relevant parts of the YAML configuration file.
2. Compute the raw positional encodings.
3. Output the hidden embeddings of the PEs from the encoders under their respective output keys: for example `feat` (node feature) or `graph_feat` (graph feature), and potentially other keys if needed.
4. Pool the hidden embeddings that share the same key: for example, all outputs with the `feat` key will be pooled together.
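The pooling step above can be sketched as follows. This is a simplified illustration in plain NumPy with a hypothetical `pool_pe_embeddings` helper, not the library's actual implementation:

```python
import numpy as np

def pool_pe_embeddings(encoder_outputs, pool="sum"):
    """Pool hidden embeddings that share the same output key.

    encoder_outputs: list of (key, embedding) pairs produced by the PE encoders.
    Embeddings written to the same key (e.g. "feat") are combined together.
    """
    pooled = {}
    for key, emb in encoder_outputs:
        if key not in pooled:
            pooled[key] = emb.copy()
        elif pool == "sum":
            pooled[key] += emb
        else:
            raise ValueError(f"Unknown pooling: {pool}")
    return pooled

# Two encoders both write to the node-feature key "feat";
# their (4 nodes x 3 dims) embeddings are summed into one tensor.
out = pool_pe_embeddings(
    [("feat", np.ones((4, 3))), ("feat", 2 * np.ones((4, 3)))],
    pool="sum",
)
```

The key insight is that only embeddings sharing an output key are combined, which is why all PE encoders writing to the same key must produce the same output dimension.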
We start by editing the configuration file.
Edit the YAML Configuration File
Computing the Raw PE
We will use the degree of each node as a positional encoding in this tutorial.
First, start from an existing YAML configuration file; you can find examples in the library's repository. We first look at where in the YAML file the raw positional encodings are computed.
`deg_pos` is added as an example below. You can also add relevant arguments for computing the positional encoding here, such as `normalize` in the example.
```yaml
pos_encoding_as_features:
  pos_types:
    deg_pos: # example, degree centrality
      pos_type: degree
      normalize: False
```
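Conceptually, the `pos_type` string selects which function computes the raw PE. The sketch below illustrates the idea with a hypothetical dispatch table (`POS_TYPE_FUNCS` and `compute_degree_pe` are illustrative names, not the library's API):

```python
import numpy as np

def compute_degree_pe(adj, normalize=False):
    """Raw degree PE: one value per node, optionally scaled to [0, 1]."""
    degs = np.asarray(adj).sum(axis=0, keepdims=True).astype(float)
    if normalize:
        degs = degs / degs.max()
    return degs

# Hypothetical dispatch table mapping a `pos_type` string to its compute function.
POS_TYPE_FUNCS = {"degree": compute_degree_pe}

adj = np.ones((3, 3)) - np.eye(3)  # triangle graph: every node has degree 2
pe = POS_TYPE_FUNCS["degree"](adj, normalize=True)
```

Extra keys in the YAML block (such as `normalize`) are simply forwarded as keyword arguments to the compute function for that `pos_type`.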
```yaml
pe_encoders:
  out_dim: 64
  pool: "sum" # choice of pooling across multiple pe encoders
  last_norm: None # "batch_norm", "layer_norm"
  encoders:
    deg_pos: # same name as in the previous cell
      encoder_type: "mlp" # or you can specify your own specialized encoder
      input_keys: ["degree"] # same as the pos_type configured before
      output_keys: ["feat"] # node feature
      hidden_dim: 64
      num_layers: 1
      dropout: 0.1
      normalization: "none" # "batch_norm" or "layer_norm"
      first_normalization: "layer_norm" # "batch_norm" or "layer_norm"
```
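A quick way to sanity-check this section is to load it with PyYAML and inspect the fields your encoders will read. This is a generic check, not a library utility, and it assumes each encoder's `hidden_dim` should match the shared `out_dim` so the pooled embeddings are compatible:

```python
import yaml

# The pe_encoders section from above, loaded into a nested dict.
cfg = yaml.safe_load("""
pe_encoders:
  out_dim: 64
  pool: "sum"
  encoders:
    deg_pos:
      encoder_type: "mlp"
      input_keys: ["degree"]
      output_keys: ["feat"]
      hidden_dim: 64
""")

pe = cfg["pe_encoders"]
# PE encoders must end at a shared dimension so their pooled
# embeddings are dimensionally compatible.
mismatched = [name for name, enc in pe["encoders"].items()
              if enc["hidden_dim"] != pe["out_dim"]]
```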
```python
from typing import Union

import numpy as np
from scipy import sparse
from scipy.sparse import spmatrix


def compute_deg(adj: Union[np.ndarray, spmatrix], normalize: bool) -> np.ndarray:
    """
    Compute the node degree positional encoding.

    Parameters:
        adj: Adjacency matrix
        normalize: whether to normalize the degrees across all nodes to [0, 1]

    Returns:
        2D matrix with shape (1, num_nodes) specifying the degree of each node
    """
    # First, convert adj to a scipy sparse matrix if it is not one already
    if isinstance(adj, np.ndarray):
        adj = sparse.csr_matrix(adj)
    # https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csr_matrix.sum.html
    degs = adj.sum(axis=0)  # sum over each column; equals the out-degree for symmetric adjacency matrices
    if normalize:
        # Normalize the degree sequence to [0, 1]
        degs = degs / np.max(degs)
    return degs
```
```python
adj = np.identity(5)  # 5x5 identity matrix: each node has a single self-loop
normalize = True
degs = compute_deg(adj, normalize=normalize)
degs
```

```
matrix([[1., 1., 1., 1., 1.]])
```
Add an Existing Encoder
In order to pool over all the positional encodings, we need to add an encoder that processes the raw computed positional encoding, and we must ensure that the output dimension of all PE encoders is the same. When designing the encoder, you can either use an existing encoder or write your own specialized encoder.
Here we can simply specify the `MLPEncoder` in the YAML file, and the library will automatically feed the raw positional encoding to an MLP encoder based on the input arguments. Note that in this example, the encoder takes the PE stored at the input key `degree` and writes its result to the output key `feat`.
```yaml
encoders:
  deg_pos:
    encoder_type: "mlp"
    input_keys: ["degree"]
    output_keys: ["feat"] # node feature
    hidden_dim: 64
    num_layers: 1
    dropout: 0.1
    normalization: "none" # "batch_norm" or "layer_norm"
    first_normalization: "layer_norm" # "batch_norm" or "layer_norm"
```
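As a rough sketch of what such an MLP encoder does, here is a minimal NumPy illustration. `TinyMLPEncoder` is a hypothetical stand-in, not the library's `MLPEncoder` implementation:

```python
import numpy as np

class TinyMLPEncoder:
    """Maps a raw PE of dimension in_dim to a hidden embedding of dimension hidden_dim."""

    def __init__(self, in_dim, hidden_dim, num_layers=1, seed=0):
        rng = np.random.default_rng(seed)
        dims = [in_dim] + [hidden_dim] * num_layers
        self.weights = [rng.standard_normal((d_in, d_out)) * 0.1
                        for d_in, d_out in zip(dims[:-1], dims[1:])]

    def forward(self, x):
        for w in self.weights:
            x = np.maximum(x @ w, 0.0)  # linear layer followed by ReLU
        return x

# The degree PE has one scalar per node (shape: num_nodes x 1);
# the encoder lifts it into the 64-dim hidden space shared by all PE encoders.
enc = TinyMLPEncoder(in_dim=1, hidden_dim=64)
node_degrees = np.array([[1.0], [2.0], [1.0]])
hidden = enc.forward(node_degrees)
```

After this step, the hidden embedding lives in the same space as the outputs of the other PE encoders writing to `feat`, so they can be pooled together.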
Add a Specialized Encoder
You can also add a specialized encoder, such as `laplacian_pe` for the Laplacian eigenvectors and eigenvalues. Here, we can add a new encoder under `graphium/nn/encoders`. As an example and template, please see the existing encoders in that folder.
Note that all new encoders must inherit from the `BaseEncoder` class and implement its abstract methods.
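The inheritance pattern looks roughly like this. The class and method names below are illustrative placeholders; check the actual `BaseEncoder` class in `graphium/nn/encoders` for the real abstract methods:

```python
from abc import ABC, abstractmethod

class BaseEncoderSketch(ABC):
    """Hypothetical stand-in for the library's BaseEncoder."""

    def __init__(self, input_keys, output_keys, out_dim):
        self.input_keys = input_keys
        self.output_keys = output_keys
        self.out_dim = out_dim

    @abstractmethod
    def forward(self, batch):
        """Read raw PEs from input_keys and write embeddings to output_keys."""

class DegPosEncoder(BaseEncoderSketch):
    """A new specialized encoder: subclass the base and implement forward."""

    def forward(self, batch):
        # Trivial example: pass the degree PE through unchanged.
        return {key: batch["degree"] for key in self.output_keys}

enc = DegPosEncoder(input_keys=["degree"], output_keys=["feat"], out_dim=1)
out = enc.forward({"degree": [1, 2, 1]})
```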
Add the Keys to Spaces
In order for the library to find the correct encoders directly from the YAML file, we need to specify which key corresponds to which class.
- add our new positional-encoding type to the corresponding space
- add our new encoder class to the corresponding space
- add the import of our encoder to the corresponding module
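Conceptually, a "space" is just a dictionary mapping a string key from the YAML file to a class, so registering the encoder makes it discoverable by name. A simplified sketch, with hypothetical names rather than the library's actual dictionaries:

```python
class DegPosEncoder:
    """Placeholder standing in for our new encoder class."""
    pass

# Hypothetical encoder space: YAML `encoder_type` string -> encoder class.
PE_ENCODER_SPACE = {
    "mlp": object,                      # existing encoder (placeholder here)
    "deg_pos_encoder": DegPosEncoder,   # our new entry
}

def resolve_encoder(encoder_type):
    """Look up the encoder class named by the YAML `encoder_type` field."""
    return PE_ENCODER_SPACE[encoder_type]
```

With the key registered, writing `encoder_type: "deg_pos_encoder"` in the YAML file is enough for the library to instantiate the right class.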
Now we can modify the YAML file to use our new encoder:
```yaml
encoders:
  deg_pos:
    encoder_type: "deg_pos_encoder"
    input_keys: ["degree"]
    output_keys: ["feat"] # node feature
    hidden_dim: 64
    # any other keys that might be used for initialization
```