graphium.nn.architectures

High level architectures in the library

Global Architectures


graphium.nn.architectures.global_architectures

FeedForwardGraph

Bases: FeedForwardNN

__init__(in_dim, out_dim, hidden_dims, layer_type, depth=None, activation='relu', last_activation='none', dropout=0.0, last_dropout=0.0, normalization='none', first_normalization='none', last_normalization='none', residual_type='none', residual_skip_steps=1, in_dim_edges=0, hidden_dims_edges=[], name='GNN', layer_kwargs=None, virtual_node='none', use_virtual_edges=False, last_layer_is_readout=False)

A flexible neural network architecture, with variable hidden dimensions, support for multiple layer types, and support for different residual connections.

This class is meant to work with different graph neural networks layers. Any layer must inherit from graphium.nn.base_graph_layer.BaseGraphStructure or graphium.nn.base_graph_layer.BaseGraphLayer.

Parameters:

Name Type Description Default
in_dim int

Input feature dimensions of the layer

required
out_dim int

Output feature dimensions of the layer

required
hidden_dims Union[List[int], int]

Either an integer specifying all the hidden dimensions, or a list of dimensions in the hidden layers. Be careful, the "simple" residual type only supports hidden dimensions of the same value.

required
layer_type Union[str, nn.Module]

The type of layers to use in the network. A class that inherits from graphium.nn.base_graph_layer.BaseGraphStructure, or one of the following strings:

  • "pyg:gin": GINConvPyg
  • "pyg:gine": GINEConvPyg
  • "pyg:gated-gcn": GatedGCNPyg
  • "pyg:pna-msgpass": PNAMessagePassingPyg

required
depth Optional[int]

If hidden_dims is an integer, depth is 1 + the number of hidden layers to use. If hidden_dims is a list, depth must be None.

None
activation Union[str, Callable]

activation function to use in the hidden layers.

'relu'
last_activation Union[str, Callable]

activation function to use in the last layer.

'none'
dropout float

The ratio of units to dropout. Must be between 0 and 1

0.0
last_dropout float

The ratio of units to dropout for the last layer. Must be between 0 and 1

0.0
normalization Union[str, Callable]

Normalization to use. Choices:

  • "none" or None: No normalization
  • "batch_norm": Batch normalization
  • "layer_norm": Layer normalization
  • Callable: Any callable function
'none'
first_normalization Union[str, Callable]

Normalization to use before the first layer. Same choices as normalization.

'none'
last_normalization Union[str, Callable]

Normalization to use in the last layer. Same choices as normalization.

'none'
residual_type str

Type of residual connection to use. Choices:

  • "none": No residual connection
  • "simple": Residual connection similar to the ResNet architecture. See class ResidualConnectionSimple
  • "weighted": Residual connection similar to the Resnet architecture, but with weights applied before the summation. See class ResidualConnectionWeighted
  • "concat": Residual connection where the residual is concatenated instead of being added.
  • "densenet": Residual connection where the residual of all previous layers are concatenated. This leads to a strong increase in the number of parameters if there are multiple hidden layers.
'none'
residual_skip_steps int

The number of steps to skip between each residual connection. If 1, all the layers are connected. If 2, half of the layers are connected.

1
in_dim_edges int

Input edge-feature dimensions of the network. Keep at 0 if not using edge features, or if the layer doesn't support edges.

0
hidden_dims_edges List[int]

Hidden dimensions for the edges. Most models don't support it, so it should only be used for those that do, e.g. GatedGCNLayer

[]
name str

Name attributed to the current network, for display and printing purposes.

'GNN'
layer_kwargs Optional[Dict]

The arguments to be used in the initialization of the layer provided by layer_type

None
virtual_node str

A string specifying the type of virtual node to use: None, "none", "mean", "sum", "max", or "logsum". See graphium.nn.pooling_pyg.VirtualNode.

The virtual node will not use any residual connection if residual_type is "none". Otherwise, it will use a simple ResNet like residual connection.

'none'
use_virtual_edges bool

A bool flag used to select if the virtual node should use the edges or not

False
last_layer_is_readout bool

Whether the last layer should be treated as a readout layer. Allows the use of mup.MuReadout from the muTransfer method https://github.com/microsoft/mup

False
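A minimal construction sketch (illustrative dimensions and options only; the import path follows the module documented on this page):

```python
# Illustrative sketch only: a small GNN with GINE layers and edge features.
# All dimensions and option values below are arbitrary examples.
from graphium.nn.architectures.global_architectures import FeedForwardGraph

gnn = FeedForwardGraph(
    in_dim=64,                 # node feature dimension
    out_dim=32,
    hidden_dims=[64, 64, 64],  # equal widths, as required by "simple" residuals
    layer_type="pyg:gine",     # one of the supported layer strings
    in_dim_edges=16,           # GINE uses edge features
    activation="relu",
    normalization="batch_norm",
    residual_type="simple",
)
print(gnn)  # __repr__ summarizes the network
```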
__repr__()

Controls how the class is printed

forward(g)

Apply the full graph neural network on the input graph and node features.

Parameters:

Name Type Description Default
g Batch

pyg Batch graph on which the convolution is done with the keys:

  • "feat": torch.Tensor[..., N, Din] Node feature tensor, before convolution. N is the number of nodes, Din is the input features

  • "edge_feat" (torch.Tensor[..., N, Ein]): Edge feature tensor, before convolution. N is the number of nodes, Ein is the input edge features

required

Returns:

Type Description
torch.Tensor

torch.Tensor[..., M, Dout] or torch.Tensor[..., N, Dout]: Node or graph feature tensor, after the network. N is the number of nodes, M is the number of graphs, and Dout is the output dimension self.out_dim. If self.pooling is [None], node features are returned and the first dimension is N; otherwise graph features are returned and the first dimension is M.
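A minimal usage sketch for forward, assuming gnn was built as in the example above and that the feature keys match those listed in the parameter description:

```python
# Illustrative sketch only: run the GNN on a tiny pyg Batch.
import torch
from torch_geometric.data import Data, Batch

data = Data(
    edge_index=torch.tensor([[0, 1, 2], [1, 2, 0]]),  # 3 nodes, 3 edges
    feat=torch.randn(3, 64),       # node features, key "feat"
    edge_feat=torch.randn(3, 16),  # edge features, key "edge_feat"
)
g = Batch.from_data_list([data])

out = gnn.forward(g)  # node-level output [N, out_dim] when no pooling is applied
```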

get_init_kwargs()

Get a dictionary that can be used to instantiate a new object with identical parameters.

make_mup_base_kwargs(divide_factor=2.0, factor_in_dim=False)

Create a 'base' model to be used by the mup or muTransfer scaling of the model. The base model is usually identical to the regular model, but with the layers width divided by a given factor (2 by default)

Parameters:

divide_factor: Factor by which to divide the width.
factor_in_dim: Whether to factor the input dimension for the nodes

Returns:

Name Type Description
kwargs Dict[str, Any]

Dictionary of parameters used to instantiate the base model, with dimensions divided by the factor

FeedForwardNN

Bases: nn.Module, MupMixin

__init__(in_dim, out_dim, hidden_dims, depth=None, activation='relu', last_activation='none', dropout=0.0, last_dropout=0.0, normalization='none', first_normalization='none', last_normalization='none', residual_type='none', residual_skip_steps=1, name='LNN', layer_type='fc', layer_kwargs=None, last_layer_is_readout=False)

A flexible neural network architecture, with variable hidden dimensions, support for multiple layer types, and support for different residual connections.

Parameters:

Name Type Description Default
in_dim int

Input feature dimensions of the layer

required
out_dim int

Output feature dimensions of the layer

required
hidden_dims Union[List[int], int]

Either an integer specifying all the hidden dimensions, or a list of dimensions in the hidden layers. Be careful, the "simple" residual type only supports hidden dimensions of the same value.

required
depth Optional[int]

If hidden_dims is an integer, depth is 1 + the number of hidden layers to use. If hidden_dims is a list, then depth must be None or equal to len(hidden_dims) + 1

None
activation Union[str, Callable]

activation function to use in the hidden layers.

'relu'
last_activation Union[str, Callable]

activation function to use in the last layer.

'none'
dropout float

The ratio of units to dropout. Must be between 0 and 1

0.0
last_dropout float

The ratio of units to dropout for the last layer. Must be between 0 and 1

0.0
normalization Union[str, Callable]

Normalization to use. Choices:

  • "none" or None: No normalization
  • "batch_norm": Batch normalization
  • "layer_norm": Layer normalization
  • Callable: Any callable function
'none'
first_normalization Union[str, Callable]

Normalization to use before the first layer. Same choices as normalization.

'none'
last_normalization Union[str, Callable]

Normalization to use in the last layer. Same choices as normalization.

'none'
residual_type str

Type of residual connection to use. Choices:

  • "none": No residual connection
  • "simple": Residual connection similar to the ResNet architecture. See class ResidualConnectionSimple
  • "weighted": Residual connection similar to the Resnet architecture, but with weights applied before the summation. See class ResidualConnectionWeighted
  • "concat": Residual connection where the residual is concatenated instead of being added.
  • "densenet": Residual connection where the residual of all previous layers are concatenated. This leads to a strong increase in the number of parameters if there are multiple hidden layers.
'none'
residual_skip_steps int

The number of steps to skip between each residual connection. If 1, all the layers are connected. If 2, half of the layers are connected.

1
name str

Name attributed to the current network, for display and printing purposes.

'LNN'
layer_type Union[str, nn.Module]

The type of layers to use in the network. Either "fc" as the FCLayer, or a class representing the nn.Module to use.

'fc'
layer_kwargs Optional[Dict]

The arguments to be used in the initialization of the layer provided by layer_type

None
last_layer_is_readout bool

Whether the last layer should be treated as a readout layer. Allows the use of mup.MuReadout from the muTransfer method https://github.com/microsoft/mup

False
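A minimal construction sketch (illustrative values only):

```python
# Illustrative sketch only: a plain MLP with two hidden layers of width 128.
from graphium.nn.architectures.global_architectures import FeedForwardNN

mlp = FeedForwardNN(
    in_dim=32,
    out_dim=8,
    hidden_dims=128,   # integer form: all hidden layers share this width ...
    depth=3,           # ... and depth = 1 + number of hidden layers
    activation="relu",
    dropout=0.1,
    normalization="layer_norm",
)
```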
__repr__()

Controls how the class is printed

forward(h)

Apply the neural network on the input features.

Parameters:

Name Type Description Default
h torch.Tensor

torch.Tensor[..., Din]: Input feature tensor, before the network. Din is the number of input features

required

Returns:

Type Description
torch.Tensor

torch.Tensor[..., Dout]: Output feature tensor, after the network. Dout is the number of output features
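A minimal usage sketch, assuming mlp was built as in the example above:

```python
# Illustrative sketch only: the MLP maps [..., Din] to [..., Dout].
import torch

h = torch.randn(10, 32)  # batch of 10 feature vectors, Din = 32
out = mlp(h)             # shape [10, 8], i.e. [..., out_dim]
```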

get_init_kwargs()

Get a dictionary that can be used to instantiate a new object with identical parameters.

make_mup_base_kwargs(divide_factor=2.0, factor_in_dim=False)

Create a 'base' model to be used by the mup or muTransfer scaling of the model. The base model is usually identical to the regular model, but with the layers width divided by a given factor (2 by default)

Parameters:

divide_factor: Factor by which to divide the width.
factor_in_dim: Whether to factor the input dimension
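A minimal muTransfer sketch, assuming mlp was built as in the example above and that the optional mup package is installed:

```python
# Illustrative sketch only: build the narrower "base" model used by muP.
base_kwargs = mlp.make_mup_base_kwargs(divide_factor=2.0, factor_in_dim=False)
base_mlp = FeedForwardNN(**base_kwargs)  # same architecture, half the width

# With the mup package, the base model is typically used to set the parameter
# shapes of the full model before training (sketch only):
# import mup
# mup.set_base_shapes(mlp, base_mlp)
```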

FullGraphMultiTaskNetwork

Bases: nn.Module, MupMixin

in_dim: int property

Returns the input dimension of the network

in_dim_edges: int property

Returns the input edge dimension of the network

out_dim: int property

Returns the output dimension of the network

out_dim_edges: int property

Returns the output dimension of the edges of the network.

__init__(gnn_kwargs, pre_nn_kwargs=None, pre_nn_edges_kwargs=None, pe_encoders_kwargs=None, task_heads_kwargs=None, graph_output_nn_kwargs=None, accelerator_kwargs=None, num_inference_to_average=1, last_layer_is_readout=False, name='FullGNN')

Class implementing a full graph neural network architecture, including the pre-processing and post-processing MLPs.

Parameters:

Name Type Description Default
gnn_kwargs Dict[str, Any]

key-word arguments to use for the initialization of the GNN network, using the class FeedForwardGraph. It must respect the following criteria:

  • gnn_kwargs["in_dim"] must be equal to pre_nn_kwargs["out_dim"]
  • gnn_kwargs["out_dim"] must be equal to graph_output_nn_kwargs["in_dim"]
required
pe_encoders_kwargs Optional[Dict[str, Any]]

key-word arguments to use for the initialization of all positional encoding encoders. See the class EncoderManager for more details.

None
pre_nn_kwargs Optional[Dict[str, Any]]

key-word arguments to use for the initialization of the pre-processing MLP network of the node features before the GNN, using the class FeedForwardNN. If None, there won't be a pre-processing MLP.

None
pre_nn_edges_kwargs Optional[Dict[str, Any]]

key-word arguments to use for the initialization of the pre-processing MLP network of the edge features before the GNN, using the class FeedForwardNN. If None, there won't be a pre-processing MLP.

None
task_heads_kwargs Optional[Dict[str, Any]]

A dictionary of dictionaries containing the arguments for the task heads. Each inner dictionary is used to initialize a task-specific MLP.

None
graph_output_nn_kwargs Optional[Dict[str, Any]]

A dictionary of dictionaries corresponding to the arguments for a FeedForwardNN. Each inner dictionary is used to initialize a shared MLP.

None
accelerator_kwargs Optional[Dict[str, Any]]

key-word arguments specific to the accelerator being used, e.g. pipeline split points

None
num_inference_to_average int

Number of inferences to average at val/test time. This is used to avoid the noise introduced by positional encodings with sign-flips. In case no such encoding is given, this parameter is ignored. NOTE: The inference time will be slowed down proportionally to this parameter.

1
last_layer_is_readout bool

Whether the last layer should be treated as a readout layer. Allows the use of mup.MuReadout from the muTransfer method https://github.com/microsoft/mup

False
name str

Name attributed to the current network, for display and printing purposes.

'FullGNN'
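A minimal construction sketch. The pre-processing MLP and GNN kwargs follow the FeedForwardNN and FeedForwardGraph signatures documented above; the exact nesting of task_heads_kwargs and graph_output_nn_kwargs is an assumption and may differ in practice:

```python
# Illustrative sketch only: the dimensions satisfy the criteria listed above.
from graphium.nn.architectures.global_architectures import FullGraphMultiTaskNetwork

pre_nn_kwargs = dict(in_dim=55, out_dim=64, hidden_dims=[64])
gnn_kwargs = dict(
    in_dim=64,            # must equal pre_nn_kwargs["out_dim"]
    out_dim=96,
    hidden_dims=[96, 96],
    layer_type="pyg:gin",
)

net = FullGraphMultiTaskNetwork(
    gnn_kwargs=gnn_kwargs,
    pre_nn_kwargs=pre_nn_kwargs,
    # task_heads_kwargs / graph_output_nn_kwargs are omitted here; they map
    # task names / task levels to FeedForwardNN-style kwargs (assumption).
)
```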
__repr__()

Controls how the class is printed

forward(g)

Apply the pre-processing neural network, the graph neural network, and the post-processing neural network on the graph features.

Parameters:

Name Type Description Default
g Batch

pyg Batch graph on which the convolution is done. Must contain the following elements:

  • Node key "feat": torch.Tensor[..., N, Din]. Input node feature tensor, before the network. N is the number of nodes, Din is the input features dimension self.pre_nn.in_dim

  • Edge key "edge_feat": torch.Tensor[..., N, Ein] Optional. The edge features to use. It will be ignored if the model doesn't supporte edge features or if self.in_dim_edges==0.

  • Other keys related to positional encodings "pos_enc_feats_sign_flip", "pos_enc_feats_no_flip".

required

Returns:

Type Description
Tensor

torch.Tensor[..., M, Dout] or torch.Tensor[..., N, Dout]: Node or graph feature tensor, after the network. N is the number of nodes, M is the number of graphs, and Dout is the output dimension self.graph_output_nn.out_dim. If self.gnn.pooling is [None], node features are returned and the first dimension is N; otherwise graph features are returned and the first dimension is M.

make_mup_base_kwargs(divide_factor=2.0)

Create a 'base' model to be used by the mup or muTransfer scaling of the model. The base model is usually identical to the regular model, but with the layers width divided by a given factor (2 by default)

Parameters:

divide_factor: Factor by which to divide the width.

Returns:

Type Description
Dict[str, Any]

Dictionary with the kwargs to create the base model.

set_max_num_nodes_edges_per_graph(max_nodes, max_edges)

Set the maximum number of nodes and edges for all gnn layers and encoder layers

Parameters:

Name Type Description Default
max_nodes Optional[int]

Maximum number of nodes in the dataset. This will be useful for certain architectures, but ignored by others.

required
max_edges Optional[int]

Maximum number of edges in the dataset. This will be useful for certain architectures, but ignored by others.

required

GraphOutputNN

Bases: nn.Module, MupMixin

concat_last_layers: Optional[Iterable[int]] property writable

Property to control the output of the self.forward. If set to a list of integers, the forward function will concatenate the outputs of the corresponding layers.

If set to None, the output of the last layer is returned.

NOTE: The indexes are inverted. 0 is the last layer, 1 is the second last, etc.
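A minimal sketch of switching the property, assuming graph_output_nn is an existing GraphOutputNN instance:

```python
# Illustrative sketch only: concatenate the outputs of the last two layers.
# Indexes are inverted, so 0 is the last layer and 1 is the second-to-last.
graph_output_nn.concat_last_layers = [0, 1]
# h = graph_output_nn(g)  # output width is typically the sum of the two layer widths

graph_output_nn.concat_last_layers = None  # back to the last layer only
```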

out_dim: int property

Returns the output dimension of the network

__init__(in_dim, in_dim_edges, task_level, graph_output_nn_kwargs)

Parameters:

Name Type Description Default
in_dim int

Input feature dimensions of the layer

required
in_dim_edges int

Input edge feature dimensions of the layer

required
task_level str

One of "graph", "node", "edge", or "nodepair", depending on whether the task is at the graph, node, edge, or node-pair level

required
graph_output_nn_kwargs Dict[str, Any]

key-word arguments to use for the initialization of the post-processing MLP network after the GNN, using the class FeedForwardNN.

required
compute_nodepairs(node_feats, batch, max_num_nodes=None, fill_value=float('nan'), batch_size=None, drop_nodes_last_graph=False)

Vectorized implementation of nodepair-level task:

Parameters:

Name Type Description Default
node_feats torch.Tensor

Node features

required
batch torch.Tensor

Batch vector

required
max_num_nodes int

The maximum number of nodes per graph

None
fill_value float

The value for invalid entries in the resulting dense output tensor. (default: NaN)

float('nan')
batch_size int

The batch size. (default: None)

None
drop_nodes_last_graph bool

Whether to drop the nodes of the last graphs that exceed the max_num_nodes_per_graph. Useful when the last graph is a padding.

False

Returns:

Name Type Description
result torch.Tensor

Concatenated node features of shape [B, max_num_nodes, 2*h], where B is the number of graphs, max_num_nodes is the chosen maximum number of nodes, and h is the feature dimension.
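A minimal usage sketch, with shapes following the parameter and return descriptions above; graph_output_nn is assumed to be an existing GraphOutputNN instance:

```python
# Illustrative sketch only: node-pair features for a batch of two graphs.
import torch

node_feats = torch.randn(7, 16)              # 7 nodes in total, h = 16
batch = torch.tensor([0, 0, 0, 1, 1, 1, 1])  # graph 0 has 3 nodes, graph 1 has 4

pairs = graph_output_nn.compute_nodepairs(
    node_feats=node_feats,
    batch=batch,
    max_num_nodes=4,
)
# Expected shape per the description above: (2, 4, 32) = (B, max_num_nodes, 2*h),
# with NaN for the padded entries (the default fill_value).
```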

drop_graph_output_nn_layers(num_layers_to_drop)

Remove the last layers of the model. Useful for Transfer Learning.

Parameters:

Name Type Description Default
num_layers_to_drop int

The number of layers to drop from the self.graph_output_nn network.

required
extend_graph_output_nn_layers(layers)

Add layers at the end of the model. Useful for Transfer Learning.

Parameters:

Name Type Description Default
layers nn.ModuleList

A ModuleList of all the layers to extend

required
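A minimal transfer-learning sketch combining the two methods above; graph_output_nn is assumed to be a pre-trained GraphOutputNN, and the plain nn.Linear layer is only a stand-in for whatever layer class the network actually expects (e.g. FCLayer):

```python
# Illustrative sketch only: swap the last post-processing layer for a new head.
import torch.nn as nn

graph_output_nn.drop_graph_output_nn_layers(num_layers_to_drop=1)

new_layers = nn.ModuleList([nn.Linear(64, 10)])  # hypothetical new output layer
graph_output_nn.extend_graph_output_nn_layers(new_layers)
```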
forward(g)

Parameters:

Name Type Description Default
g Batch

pyg Batch graph

required

Returns:

Name Type Description
h

Output features after applying graph_output_nn

make_mup_base_kwargs(divide_factor=2.0, factor_in_dim=False)

Create a 'base' model to be used by the mup or muTransfer scaling of the model. The base model is usually identical to the regular model, but with the layers width divided by a given factor (2 by default)

Parameters:

divide_factor: Factor by which to divide the width.
factor_in_dim: Whether to factor the input dimension

Returns:

Type Description
Dict[str, Any]

Dictionary with the kwargs to create the base model.

set_max_num_nodes_edges_per_graph(max_nodes, max_edges)

Set the maximum number of nodes and edges for all gnn layers and encoder layers

Parameters:

Name Type Description Default
max_nodes Optional[int]

Maximum number of nodes in the dataset. This will be useful for certain architectures, but ignored by others.

required
max_edges Optional[int]

Maximum number of edges in the dataset. This will be useful for certain architectures, but ignored by others.

required

TaskHeads

Bases: nn.Module, MupMixin

out_dim: Dict[str, int] property

Returns the output dimension of each task head

__init__(in_dim, in_dim_edges, task_heads_kwargs, graph_output_nn_kwargs, last_layer_is_readout=True)

Class that groups all multi-task output heads together to provide the task-specific outputs.

Parameters:

Name Type Description Default
in_dim int

Input feature dimensions of the layer

required
in_dim_edges int

Input edge feature dimensions of the layer

required
last_layer_is_readout bool

Whether the last layer should be treated as a readout layer. Allows the use of mup.MuReadout from the muTransfer method

True
task_heads_kwargs Dict[str, Any]

A dictionary of dictionaries corresponding to the arguments for a FeedForwardNN. Each inner dictionary is used to initialize a task-specific MLP.

required
graph_output_nn_kwargs Dict[str, Any]

key-word arguments to use for the initialization of the post-processing MLP network after the GNN, using the class FeedForwardNN.

required
__repr__()

Returns a string representation of the task heads

forward(g)

Forward function of the task heads

Parameters:

Name Type Description Default
g Batch

pyg Batch graph

required

Returns:

Name Type Description
task_head_outputs Dict[str, torch.Tensor]

Return a dictionary: Dict[task_name, Tensor]
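A minimal usage sketch, assuming task_heads is an existing TaskHeads instance, g is a pyg Batch, and the task names are whatever keys were given in task_heads_kwargs:

```python
# Illustrative sketch only: one output tensor per task.
outputs = task_heads(g)
for task_name, pred in outputs.items():
    print(task_name, pred.shape)  # each value is the prediction tensor for that task
```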

make_mup_base_kwargs(divide_factor=2.0, factor_in_dim=False)

Create a 'base' model to be used by the mup or muTransfer scaling of the model. The base model is usually identical to the regular model, but with the layers width divided by a given factor (2 by default)

Parameters:

divide_factor: Factor by which to divide the width.
factor_in_dim: Whether to factor the input dimension

Returns:

Name Type Description
kwargs Dict[str, Any]

Dictionary of arguments to be used to initialize the base model

set_max_num_nodes_edges_per_graph(max_nodes, max_edges)

Set the maximum number of nodes and edges for all gnn layers and encoder layers

Parameters:

Name Type Description Default
max_nodes Optional[int]

Maximum number of nodes in the dataset. This will be useful for certain architectures, but ignored by others.

required
max_edges Optional[int]

Maximum number of edges in the dataset. This will be useful for certain architectures, but ignored by others.

required

PyG Architectures


graphium.nn.architectures.pyg_architectures

FeedForwardPyg

Encoder Manager


graphium.nn.architectures.encoder_manager

EncoderManager

Bases: nn.Module

in_dims: Iterable[int] property

Returns the input dimensions for all pe-encoders

Returns:

Name Type Description
in_dims Iterable[int]

the input dimensions for all pe-encoders

input_keys: Iterable[str] property

Returns the input keys for all pe-encoders

Returns:

Name Type Description
input_keys Iterable[str]

the input keys for all pe-encoders

out_dim: int property

Returns the output dimension of the pooled embedding from all the pe encoders

Returns:

Name Type Description
out_dim int

the output dimension of the pooled embedding from all the pe encoders

__init__(pe_encoders_kwargs=None, max_num_nodes_per_graph=None, name='encoder_manager')

Class that runs multiple encoders in parallel and concatenates / pools their outputs.

Parameters:

Name Type Description Default
pe_encoders_kwargs Optional[Dict[str, Any]]

key-word arguments to use for the initialization of all positional encoding encoders. The available encoders are given by PE_ENCODERS_DICT: "la_encoder" (tested), "mlp_encoder" (not tested), "signnet_encoder" (not tested)

None
name str

Name attributed to the current network, for display and printing purposes.

'encoder_manager'
forward(g)

forward pass of the pe encoders and pooling

Parameters:

Name Type Description Default
g Batch

pyg Batch on which the convolution is done. Must contain the following elements:

  • Node key "feat": torch.Tensor[..., N, Din]. Input node feature tensor, before the network. N is the number of nodes, Din is the input features dimension self.pre_nn.in_dim

  • Edge key "edge_feat": torch.Tensor[..., N, Ein] Optional. The edge features to use. It will be ignored if the model doesn't supporte edge features or if self.in_dim_edges==0.

  • Other keys related to positional encodings "pos_enc_feats_sign_flip", "pos_enc_feats_no_flip".

required

Returns:

Name Type Description
g Batch

pyg Batch with the positional encodings added to the graph

forward_positional_encoding(g)

Forward pass for the positional encodings (PE), with each PE having its own encoder defined in self.pe_encoders. All the positional encodings with the same keys are pooled together using self.pe_pooling.

Parameters:

Name Type Description Default
g Batch

pyg Batch containing the node positional encodings

required

Returns:

Name Type Description
pe_node_pooled Dict[str, Tensor]

The positional / structural encodings after going through their encoders, pooled together according to their keys.

forward_simple_pooling(h, pooling, dim)

Apply sum, mean, or max pooling on a Tensor.

Parameters:

Name Type Description Default
h Tensor

the Tensor to pool

required
pooling str

string specifying the pooling method

required
dim int

the dimension to pool over

required

Returns:

Name Type Description
pooled Tensor

the pooled Tensor
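A minimal usage sketch, assuming encoder_manager is an existing EncoderManager instance:

```python
# Illustrative sketch only: pool a stack of encoder outputs along dim 0.
import torch

h = torch.randn(4, 8, 16)  # e.g. 4 encoder outputs stacked on the first dim
pooled = encoder_manager.forward_simple_pooling(h, pooling="mean", dim=0)
# pooled.shape == (8, 16); "sum" and "max" are the other supported options
```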

make_mup_base_kwargs(divide_factor=2.0)

Create a 'base' model to be used by the mup or muTransfer scaling of the model. The base model is usually identical to the regular model, but with the layers width divided by a given factor (2 by default)

Parameters:

divide_factor: Factor by which to divide the width.

Returns:

Name Type Description
pe_kw Dict[str, Any]

the model kwargs where the dimensions are divided by the factor