# Add new positional encoding

One of the main advantages of this library is the ability to easily incorporate novel positional encodings at the node, edge and graph level. The positional encodings are computed and fed into their respective encoders; the hidden embeddings from all PE encoders are then pooled (according to whether they are node, edge, or graph level) and fed into the GNN layers as features. This design allows any combination of positional encodings to be used by modifying the configuration file. For more details on the data processing part, please visit the design page of the doc.

Here is the workflow for computing and processing positional encoding in the library:

1. Edit the related parts in the yaml configuration file.
2. Compute the raw positional encoding from the graph in `graphium/features/positional_encoding.py` (from the `graph positional encoder`).
3. Feed the raw positional encoding into the respective (specialized) encoders in `graphium/nn/encoders`. For example, a simple `MLP positional encoder` can be found.
4. Output the hidden embeddings of the PE from the encoders in their respective output keys: `feat` (node feature), `edge_feat` (edge feature), `graph_feat` (graph feature) and potentially other keys if needed, such as `nodepair_feat`.
5. Pool the hidden embeddings with the same keys together: for example, all outputs with the `feat` key will be pooled together.
6. Construct the `PyG Batch`, a batch of graphs that each contain the output keys seen above, ready for use in the GNN layers.
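The pooling step in the workflow can be pictured with a small, library-independent sketch. Everything below is an assumption made for illustration: the function name `pool_by_key`, the toy embeddings, and the use of plain Python lists instead of tensors are not taken from the library. It only shows the idea that outputs sharing a key are combined, here with `"sum"` pooling.

```python
from collections import defaultdict

# Hypothetical outputs of three PE encoders for a graph with 2 nodes and out_dim = 3.
# Each encoder returns a dict mapping an output key to its hidden embeddings.
encoder_outputs = [
    {"feat": [[1.0, 0.0, 2.0], [0.5, 0.5, 0.5]]},  # e.g. a degree encoder
    {"feat": [[0.0, 1.0, 1.0], [1.5, 0.5, 0.5]]},  # e.g. a second node-level encoder
    {"edge_feat": [[2.0, 2.0, 2.0]]},              # an edge-level encoder
]


def pool_by_key(outputs):
    """Sum embeddings element-wise across encoders that share an output key."""
    grouped = defaultdict(list)
    for out in outputs:
        for key, emb in out.items():
            grouped[key].append(emb)

    pooled = {}
    for key, embs in grouped.items():
        # zip(*embs) walks the rows (nodes or edges) of all encoders in parallel
        pooled[key] = [
            [sum(vals) for vals in zip(*rows)]
            for rows in zip(*embs)
        ]
    return pooled


pooled = pool_by_key(encoder_outputs)
print(pooled["feat"])       # [[1.0, 1.0, 3.0], [2.0, 1.0, 1.0]]
print(pooled["edge_feat"])  # [[2.0, 2.0, 2.0]]
```

Summation only works when every encoder writing to the same key produces embeddings of the same width, which is why all PE encoders share a single output dimension.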

Since this library is built using PyG, we recommend looking at their Docs and Tutorials for more info.

We start by editing the configuration file first.

## Edit the yaml Configuration File

### Computing Raw PE

We will use the degree of each node as a positional encoding in this tutorial.
First, start with an existing yaml configuration file; you can find them in `expts/configs`.

We first look at where in the yaml file the raw positional encodings are computed. `deg_pos` is added as an example below. You can also add relevant arguments for computing the positional encoding here, such as `normalize` in the example.

```
pos_encoding_as_features:
  pos_types:
    deg_pos: #example, degree centrality
      pos_type: degree
      normalize: False
```

### Specifying Encoders for the PE

Now we want to specify the arguments for the encoders associated with the PE.

```
pe_encoders:
  out_dim: 64
  pool: "sum" #choice of pooling across multiple pe encoders
  last_norm: None #"batch_norm", "layer_norm"
  encoders:
    deg_pos: #same name from the previous cell
      encoder_type: "mlp" #or you can specify your own specialized encoder
      input_keys: ["degree"] #same as the pos_type configured before
      output_keys: ["feat"] #node feature
      hidden_dim: 64
      num_layers: 1
      dropout: 0.1
      normalization: "none" #"batch_norm" or "layer_norm"
      first_normalization: "layer_norm" #"batch_norm" or "layer_norm"
```
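The two configuration fragments have to agree with each other: the comment on `input_keys` says it is the same as the `pos_type` configured before. Below is a minimal sanity check for that simple one-key case. The helper `find_unmatched_input_keys` is hypothetical and not part of the library, and the configuration is written as plain Python dicts rather than loaded from yaml. Encoders that consume several raw keys may legitimately need a looser rule, so treat this as a sketch only.

```python
# Both fragments written as plain Python dicts; in practice they would be
# loaded from the yaml file.
pos_encoding_as_features = {
    "pos_types": {
        "deg_pos": {"pos_type": "degree", "normalize": False},
    }
}
pe_encoders = {
    "out_dim": 64,
    "pool": "sum",
    "encoders": {
        "deg_pos": {
            "encoder_type": "mlp",
            "input_keys": ["degree"],
            "output_keys": ["feat"],
        },
    },
}


def find_unmatched_input_keys(pos_cfg, enc_cfg):
    """Return (encoder_name, key) pairs whose key matches no configured pos_type."""
    available = {spec["pos_type"] for spec in pos_cfg["pos_types"].values()}
    return [
        (name, key)
        for name, spec in enc_cfg["encoders"].items()
        for key in spec["input_keys"]
        if key not in available
    ]


print(find_unmatched_input_keys(pos_encoding_as_features, pe_encoders))  # []
```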

## Compute the Positional Encoding

Next, we want to compute the raw degree of each node from the molecule graph.

### Add function to compute the PE

Go to `graphium/features` and add a new file `deg.py` containing the function that computes the PE.

```
from typing import Union

import numpy as np
from scipy import sparse
from scipy.sparse import spmatrix


def compute_deg(adj: Union[np.ndarray, spmatrix], normalize: bool) -> np.ndarray:
    """
    Compute the node degree positional encoding

    Parameters:
        adj: Adjacency matrix
        normalize: indicate if the degree across all nodes are normalized to [0,1] or not
    Returns:
        2D matrix with shape (1, num_nodes) holding the column sums of `adj`,
        which equal the node degrees when `adj` is symmetric
    """
    # convert adj to a scipy sparse matrix if it is a dense array
    if isinstance(adj, np.ndarray):
        adj = sparse.csr_matrix(adj)

    # https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csr_matrix.sum.html
    degs = adj.sum(axis=0)  # sum along axis 0, i.e. one sum per column
    if normalize:  # normalize the degree sequence to [0,1]
        degs = degs / np.max(degs)
    return degs
```

### Test with toy matrix

Here we test whether our code computes the degree of each node correctly.

```
adj = np.identity(5) #make an identity matrix
normalize = True
degs = compute_deg(adj, normalize=normalize)
degs
```

matrix([[1., 1., 1., 1., 1.]])
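The identity matrix gives every node the same degree, so it does not really exercise the normalization. A slightly less trivial check is a 3-node path graph, sketched below. To keep the snippet self-contained it repeats the same reduction as `compute_deg` (`sum(axis=0)`) inline on a dense NumPy array instead of importing the function, so the result is a plain `ndarray` rather than a `matrix`. The final reshape to a `(num_nodes, 1)` column is an illustrative assumption about how a per-node feature might be laid out.

```python
import numpy as np

# Adjacency matrix of the undirected path graph 0 - 1 - 2
adj = np.array([
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
])

# Same reduction as in compute_deg: one sum per column
degs = adj.sum(axis=0)
normalized = degs / degs.max()

print(degs.tolist())        # [1, 2, 1]
print(normalized.tolist())  # [0.5, 1.0, 0.5]

# Reshape to a (num_nodes, 1) column, one row per node
column = normalized.reshape(-1, 1)
print(column.shape)         # (3, 1)
```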

### Add to positional_encoding.py

To compute the new PE along with all existing PEs, we need to add the function we wrote to `graphium/features/positional_encoding.py`. Modify the `graph_positional_encoder` function by adding the `pos_type == "degree"` logic.
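The shape of that branch can be sketched as follows. This is not the real `graph_positional_encoder`: the dispatcher `compute_raw_pe` and its signature are hypothetical, and `compute_deg` is re-declared here in compact dense-array form only so the snippet runs on its own. The point is that the `pos_type` string from the yaml file selects the function, and the remaining yaml arguments (such as `normalize`) are forwarded to it.

```python
import numpy as np


def compute_deg(adj, normalize):
    # Compact stand-in for the function written in deg.py
    degs = np.asarray(adj).sum(axis=0)
    return degs / degs.max() if normalize else degs


def compute_raw_pe(adj, pos_type, **pos_kwargs):
    """Hypothetical dispatcher; NOT the real graph_positional_encoder signature."""
    if pos_type == "degree":
        # Forward yaml arguments such as `normalize` to the new function
        return compute_deg(adj, normalize=pos_kwargs.get("normalize", False))
    raise ValueError(f"Unknown pos_type: {pos_type}")


adj = np.array([[0, 1], [1, 0]])
pe = compute_raw_pe(adj, pos_type="degree", normalize=True)
print(pe.tolist())  # [1.0, 1.0]
```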

## Add Existing Encoder

In order to pool over all the positional encodings, we need to add an encoder to process the raw computed positional encoding and ensure that the output dimensions of all PE encoders are the same. When designing the encoder, you can either use an existing encoder or write a specialized encoder of your own.

Here we can simply specify the `MLPEncoder` in the yaml file (`encoder_type: "mlp"`), and the library will automatically feed the raw positional encoding to an MLP encoder based on the input arguments. Note that in this example, the encoder takes in the PE stored at the input key `degree` and then outputs to the output key `feat`.

```
encoders:
  deg_pos:
    encoder_type: "mlp"
    input_keys: ["degree"]
    output_keys: ["feat"] # node feature
    hidden_dim: 64
    num_layers: 1
    dropout: 0.1
    normalization: "none" #"batch_norm" or "layer_norm"
    first_normalization: "layer_norm" #"batch_norm" or "layer_norm"
```
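To make the role of the dimensions concrete, here is a NumPy sketch of what an MLP encoder does to the raw degree feature. It is not the library's `MLPEncoder`: the two-layer structure, the ReLU activation and the random weights are assumptions chosen for illustration, and dropout and normalization are left out. It only shows a `(num_nodes, 1)` input being mapped to `(num_nodes, out_dim)`, the common width that lets the outputs of different encoders be pooled.

```python
import numpy as np

rng = np.random.default_rng(0)

num_nodes, in_dim, hidden_dim, out_dim = 5, 1, 64, 64

# Raw degree PE: one scalar per node, stored as a (num_nodes, 1) column
degree = np.array([[1.0], [2.0], [2.0], [2.0], [1.0]])

# Randomly initialised weights stand in for trained parameters
w1 = rng.normal(size=(in_dim, hidden_dim))
b1 = np.zeros(hidden_dim)
w2 = rng.normal(size=(hidden_dim, out_dim))
b2 = np.zeros(out_dim)

hidden = np.maximum(degree @ w1 + b1, 0.0)  # linear layer followed by ReLU
feat = hidden @ w2 + b2                     # project to the shared out_dim

print(feat.shape)  # (5, 64)
```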

## Add Specialized Encoder

You can also add a specialized encoder, such as `laplacian_pe` for the Laplacian eigenvectors and eigenvalues. Here, we can add a new `deg_pos_encoder.py` in `graphium/nn/encoders`. As an example and template, please see the `MLPEncoder`.

Note that all new encoders must inherit from the `BaseEncoder` class and implement the following abstract methods:

- `forward`: the forward function of the encoder, how to process the input
- `parse_input_keys`: how to parse the input keys
- `parse_output_keys`: how to parse the output keys
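The three abstract methods can be illustrated with a self-contained skeleton. The classes below are stand-ins: `BaseEncoderSketch` only mimics the abstract interface named above, and its constructor, the dict-based `batch` argument and the toy "scale by a weight vector" forward pass are all assumptions, not the library's `BaseEncoder` API. For a real encoder, follow the `MLPEncoder` template.

```python
from abc import ABC, abstractmethod


class BaseEncoderSketch(ABC):
    """Stand-in for the library's BaseEncoder; the real constructor may differ."""

    def __init__(self, input_keys, output_keys):
        self.input_keys = self.parse_input_keys(input_keys)
        self.output_keys = self.parse_output_keys(output_keys)

    @abstractmethod
    def forward(self, batch): ...

    @abstractmethod
    def parse_input_keys(self, input_keys): ...

    @abstractmethod
    def parse_output_keys(self, output_keys): ...


class DegPosEncoderSketch(BaseEncoderSketch):
    """Toy encoder: scales each node's degree by a fixed weight vector."""

    def __init__(self, input_keys, output_keys, weights):
        super().__init__(input_keys, output_keys)
        self.weights = weights  # one weight per output dimension

    def parse_input_keys(self, input_keys):
        if list(input_keys) != ["degree"]:
            raise ValueError(f"expected ['degree'], got {input_keys}")
        return list(input_keys)

    def parse_output_keys(self, output_keys):
        allowed = {"feat", "edge_feat", "graph_feat", "nodepair_feat"}
        unknown = set(output_keys) - allowed
        if unknown:
            raise ValueError(f"unsupported output keys: {sorted(unknown)}")
        return list(output_keys)

    def forward(self, batch):
        degrees = batch[self.input_keys[0]]  # one scalar per node
        emb = [[d * w for w in self.weights] for d in degrees]
        return {key: emb for key in self.output_keys}


encoder = DegPosEncoderSketch(["degree"], ["feat"], weights=[1.0, 0.5])
out = encoder.forward({"degree": [1, 2, 1]})
print(out["feat"])  # [[1.0, 0.5], [2.0, 1.0], [1.0, 0.5]]
```

Validating the keys in `parse_input_keys` and `parse_output_keys` means a mismatch between the yaml file and the encoder surfaces at construction time rather than during the forward pass.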

## Add the Keys to Spaces

In order to directly find the correct encoders from the yaml file, we need to specify which key corresponds to which class.

- add our new `deg_pos_encoder` to `graphium/utils/spaces.py` in the `PE_ENCODERS_DICT`
- add our new `deg_pos_encoder` to `graphium/nn/architectures/encoder_manager.py` in the `PE_ENCODERS_DICT`
- add the import of our encoder to `graphium/nn/encoders/__init__.py`
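The idea behind such a dictionary is a plain string-to-class registry. The sketch below reuses the name `PE_ENCODERS_DICT` for readability, but its contents, the stub classes and the helper `resolve_encoder` are hypothetical and only show how an `encoder_type` string from the yaml file could be resolved to a class.

```python
class MLPEncoderStub:
    """Placeholder for the existing MLP encoder class."""


class DegPosEncoderStub:
    """Placeholder for the new specialized encoder class."""


# Sketch of a registry keyed by the `encoder_type` string used in the yaml file
PE_ENCODERS_DICT = {
    "mlp": MLPEncoderStub,
    "deg_pos_encoder": DegPosEncoderStub,
}


def resolve_encoder(encoder_cfg):
    """Look up the encoder class named by encoder_cfg['encoder_type']."""
    encoder_type = encoder_cfg["encoder_type"]
    try:
        return PE_ENCODERS_DICT[encoder_type]
    except KeyError:
        raise ValueError(
            f"Unknown encoder_type {encoder_type!r}; "
            f"registered: {sorted(PE_ENCODERS_DICT)}"
        ) from None


cfg = {"encoder_type": "deg_pos_encoder", "input_keys": ["degree"], "output_keys": ["feat"]}
print(resolve_encoder(cfg).__name__)  # DegPosEncoderStub
```

If the new key is missing from the registry, the lookup fails with a message listing the registered names, which is usually the first thing to check when a new `encoder_type` is not picked up.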

Now we can modify the yaml file to use our new encoder:

```
encoders:
  deg_pos:
    encoder_type: "deg_pos_encoder"
    input_keys: ["degree"]
    output_keys: ["feat"] # node feature
    hidden_dim: 64
    #any other keys that might be used for initialization
```
