graphium.ipu¶
Code for adapting to run on IPU
IPU Dataloader¶
graphium.ipu.ipu_dataloader
¶
CombinedBatchingCollator
¶
Collator object that manages the combined batch size, defined as:
combined_batch_size = batch_size * device_iterations * replication_factor * gradient_accumulation
This is intended to be used in combination with the poptorch.DataLoader.
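The formula above can be checked with a few lines of plain Python. This is a minimal sketch: the helper name `combined_batch_size` and its default values are assumptions made for illustration and are not part of graphium.

```python
def combined_batch_size(batch_size, device_iterations=1,
                        replication_factor=1, gradient_accumulation=1):
    """Number of samples the collator must assemble per call (hypothetical helper)."""
    return batch_size * device_iterations * replication_factor * gradient_accumulation


# 16 graphs per mini-batch, 4 device iterations, 2 replicas, 8 accumulation steps
total = combined_batch_size(16, device_iterations=4,
                            replication_factor=2, gradient_accumulation=8)
print(total)  # 1024
```

With all factors left at 1, the combined batch size reduces to the plain mini-batch size.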
__call__(batch)
¶
Stack tensors, batch the PyG graphs, and pad each tensor to the same size.
Parameters:
Name  Type  Description  Default 

batch 
List[Dict[str, Union[Data, Dict[str, Tensor]]]]

The batch of data, including PyG graphs 
required 
Returns:
Name  Type  Description 

out_batch 
Dict[str, Union[Batch, Dict[str, Tensor], Any]]

A dictionary where the graphs are batched and the labels or other Tensors are stacked 
__init__(batch_size, max_num_nodes, max_num_edges, dataset_max_nodes_per_graph, dataset_max_edges_per_graph, collate_fn=None)
¶
Parameters:
Name  Type  Description  Default 

batch_size 
int

mini batch size used by the model 
required 
max_num_nodes 
int

Maximum number of nodes in the batched padded graph 
required 
max_num_edges 
int

Maximum number of edges in the batched padded graph 
required 
dataset_max_nodes_per_graph 
int

Maximum number of nodes per graph in the full dataset 
required 
dataset_max_edges_per_graph 
int

Maximum number of edges per graph in the full dataset 
required 
collate_fn 
Optional[Callable]

Function used to collate (or batch) the single data or graphs together 
None

IPUDataloaderOptions
dataclass
¶
This data class stores the arguments necessary to instantiate a model for the Predictor.
Parameters:
Name  Type  Description  Default 

model_class 
pytorch module used to create a model 
required  
model_kwargs 
Keyword arguments used to initialize the model from 
required 
Pad
¶
Bases: BaseTransform
Data transform that applies padding to enforce consistent tensor shapes.
__init__(max_num_nodes, dataset_max_nodes_per_graph, dataset_max_edges_per_graph, max_num_edges=None, node_value=0, edge_value=0)
¶
Parameters:
Name  Type  Description  Default 

max_num_nodes 
int

The maximum number of nodes for the total padded graph 
required 
dataset_max_nodes_per_graph 
the maximum number of nodes per graph in the dataset 
required  
dataset_max_edges_per_graph 
the maximum number of edges per graph in the dataset 
required  
max_num_edges 
Optional[int]

The maximum number of edges for the total padded graph 
None

node_value 
float

Value to add to the node padding 
0

edge_value 
float

Value to add to the edge padding 
0
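To illustrate what padding to a fixed shape means, here is a small sketch in plain Python. It pads a list of node-feature rows up to a fixed number of rows with a constant value. The function `pad_rows` is hypothetical and works on nested lists, whereas the real transform operates on PyG data objects; raising a ValueError on oversized input is also an assumption.

```python
def pad_rows(rows, target_len, pad_value=0.0):
    """Pad a list of equal-length feature rows up to `target_len` rows."""
    if len(rows) > target_len:
        # Assumption: oversized inputs are rejected rather than truncated
        raise ValueError(f"{len(rows)} rows exceed the limit of {target_len}")
    width = len(rows[0]) if rows else 0
    padding = [[pad_value] * width for _ in range(target_len - len(rows))]
    return rows + padding


node_features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
padded = pad_rows(node_features, target_len=5, pad_value=0.0)
print(len(padded), padded[-1])  # 5 [0.0, 0.0]
```

Every padded batch then has the same number of rows, which is what a statically compiled program needs.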

validate(data)
¶
Validates that the input graph does not exceed the constraints that:
 the number of nodes must be <= max_num_nodes
 the number of edges must be <= max_num_edges
Returns:
Type  Description 

Tuple containing the number of nodes and the number of edges 
create_ipu_dataloader(dataset, ipu_dataloader_options, ipu_options=None, batch_size=1, collate_fn=None, num_workers=0, **kwargs)
¶
Creates a poptorch.DataLoader for graph datasets. Applies the mini-batching method of concatenating multiple graphs into a single graph with multiple disconnected subgraphs. See: https://pytorch-geometric.readthedocs.io/en/2.0.2/notes/batching.html
Parameters:
dataset: The torch_geometric.data.Dataset instance from which to
load the graph examples for the IPU.
ipu_dataloader_options: The options to initialize the Dataloader for IPU
ipu_options: The poptorch.Options used by the
poptorch.DataLoader. Will use the default options if not provided.
batch_size: How many graph examples to load in each batch
(default: 1).
collate_fn: The function used to collate batches
**kwargs (optional): Additional arguments of :class:`poptorch.DataLoader`.
Returns:
Type  Description 

DataLoader

The dataloader 
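The mini-batching method mentioned above (several graphs merged into one graph made of disconnected subgraphs) can be sketched in plain Python. The dictionary layout and the name `batch_graphs` are assumptions for illustration only; the actual dataloader relies on PyG batching.

```python
def batch_graphs(graphs):
    """Merge graphs into one disconnected graph.

    Each graph is a dict with `num_nodes` (int) and `edges` (list of (src, dst)).
    Returns the merged edge list and a `batch` vector mapping each node to its graph.
    """
    edges, batch, offset = [], [], 0
    for graph_idx, graph in enumerate(graphs):
        # Shift node indices so they stay unique in the merged graph
        edges.extend((src + offset, dst + offset) for src, dst in graph["edges"])
        batch.extend([graph_idx] * graph["num_nodes"])
        offset += graph["num_nodes"]
    return edges, batch


g1 = {"num_nodes": 3, "edges": [(0, 1), (1, 2)]}
g2 = {"num_nodes": 2, "edges": [(0, 1)]}
edges, batch = batch_graphs([g1, g2])
print(edges)  # [(0, 1), (1, 2), (3, 4)]
print(batch)  # [0, 0, 0, 1, 1]
```

No edge connects nodes of different graphs, so message passing on the merged graph never mixes examples.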
IPU Losses¶
graphium.ipu.ipu_losses
¶
BCELossIPU
¶
Bases: BCELoss
A modified version of the torch.nn.BCELoss that can ignore NaNs by giving them a weight of 0. This allows it to work with compilation and IPUs since it doesn't modify the tensor's shape.
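The idea of weighting NaN entries by 0 can be sketched without any tensor library. The function below is a plain-Python illustration, not the graphium implementation; in particular, averaging over the valid entries only is an assumption.

```python
import math


def bce_ignore_nan(probs, targets):
    """Binary cross-entropy that gives NaN targets a weight of 0."""
    weights = [0.0 if math.isnan(t) else 1.0 for t in targets]
    # Replace NaN targets with a dummy value; their weight is 0 anyway
    safe_targets = [0.0 if math.isnan(t) else t for t in targets]
    losses = [
        -w * (t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
        for p, t, w in zip(probs, safe_targets, weights)
    ]
    # Assumption: average over the valid (non-NaN) entries only
    n_valid = sum(weights)
    return sum(losses) / n_valid if n_valid else 0.0


probs = [0.9, 0.2, 0.6]
targets = [1.0, 0.0, float("nan")]
loss = bce_ignore_nan(probs, targets)
print(round(loss, 6))  # 0.164252
```

The per-element loss list keeps the same length as the input, which mirrors the fixed-shape requirement described above.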
BCEWithLogitsLossIPU
¶
Bases: BCEWithLogitsLoss
A modified version of the torch.nn.BCEWithLogitsLoss that can ignore NaNs by giving them a weight of 0. This allows it to work with compilation and IPUs since it doesn't modify the tensor's shape.
HybridCELossIPU
¶
Bases: HybridCELoss
__init__(n_brackets, alpha=0.5)
¶
Parameters:
Name  Type  Description  Default 

n_brackets 
the number of brackets that will be used to group the regression targets. Expected to have the same size as the number of classes in the transformed regression task. 
required 
forward(input, target)
¶
Parameters:
Name  Type  Description  Default 

input 
Tensor

(batch_size x n_classes) tensor of logits predicted for each bracket. 
required 
target 
Tensor

(batch_size) or (batch_size, 1) tensor of target brackets in {0, 1, ..., self.n_brackets}. 
required 
L1LossIPU
¶
Bases: L1Loss
A modified version of the torch.nn.L1Loss that can ignore NaNs by giving them the same value for both input and target.
This allows it to work with compilation and IPUs since it doesn't modify the tensor's shape.
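Giving NaN positions the same value in both input and target makes their error exactly zero while leaving the element count unchanged. A plain-Python sketch of that trick follows; the name `l1_ignore_nan` and the division by the number of valid entries are assumptions.

```python
import math


def l1_ignore_nan(preds, targets):
    """Mean absolute error where NaN targets contribute zero error."""
    is_nan = [math.isnan(t) for t in targets]
    # Copy the prediction into NaN positions so that |pred - target| == 0 there
    filled = [p if nan else t for p, t, nan in zip(preds, targets, is_nan)]
    abs_errors = [abs(p - t) for p, t in zip(preds, filled)]
    # Assumption: divide by the number of valid entries, not the full length
    n_valid = len(targets) - sum(is_nan)
    return sum(abs_errors) / n_valid if n_valid else 0.0


l1 = l1_ignore_nan([1.0, 2.0, 3.0], [1.5, float("nan"), 2.0])
print(l1)  # 0.75
```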
MSELossIPU
¶
Bases: MSELoss
A modified version of the torch.nn.MSELoss that can ignore NaNs by giving them the same value for both input and target.
This allows it to work with compilation and IPUs since it doesn't modify the tensor's shape.
IPU Metrics¶
graphium.ipu.ipu_metrics
¶
NaNTensor
¶
Bases: Tensor
Class to create and manage a NaN tensor along with its properties.
The goal of the class is to override the regular tensor such that the basic operations (sum, mean, max, etc) ignore the NaNs in the input. It also supports NaNs in integer tensors (as the lowest integer possible).
get_nans: BoolTensor
property
¶
Gets the boolean Tensor containing the location of NaNs.
In the case of an integer tensor, this returns where the tensor is equal to its minimal value
In the case of a boolean tensor, this returns a Tensor filled with False
__lt__(other)
¶
Workaround that allows the code to work with r2_score, since it requires the size to be > 2. But since self.size now returns a Tensor instead of a value, we check that all elements are > 2.
__torch_function__(func, types, args=(), kwargs=None)
classmethod
¶
This __torch_function__ implementation wraps subclasses such that methods called on subclasses return a subclass instance instead of a torch.Tensor instance.
One corollary to this is that you need coverage for torch.Tensor methods if implementing __torch_function__ for subclasses.
Affects the call torch.sum() so that it behaves the same way as NaNTensor.sum().
We recommend always calling super().__torch_function__ as the base case when doing the above.
While not mandatory, we recommend making __torch_function__ a classmethod.
argsort(dim=1, descending=False)
¶
Return the indices that sort the tensor, while putting all the NaNs to the end of the sorting.
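A NaN-last argsort can be sketched on a plain Python list as follows. The function name is hypothetical, and the sketch handles a single dimension only.

```python
import math


def argsort_nan_last(values, descending=False):
    """Indices that sort `values`, with NaN entries placed at the end."""
    valid = [i for i, v in enumerate(values) if not math.isnan(v)]
    nans = [i for i, v in enumerate(values) if math.isnan(v)]
    valid.sort(key=lambda i: values[i], reverse=descending)
    return valid + nans


data = [3.0, float("nan"), 1.0, 2.0]
ascending = argsort_nan_last(data)
descending = argsort_nan_last(data, descending=True)
print(ascending)   # [2, 3, 0, 1]
print(descending)  # [0, 3, 2, 1]
```

The NaN index stays last in both directions, which a plain sort on raw values would not guarantee.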
max(*args, **kwargs)
¶
Returns the max value of a tensor without NaNs.
mean(*args, **kwargs)
¶
Overloads the traditional mean to ignore the NaNs
min(*args, **kwargs)
¶
Returns the min value of a tensor without NaNs.
numel()
¶
Returns the number of non-NaN elements.
size(dim)
¶
Instead of returning the size, return the number of non-NaN elements in a specific dimension. Useful for the r2_score metric.
sum(*args, **kwargs)
¶
Overloads the traditional sum to ignore the NaNs
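The NaN-ignoring reductions described for this class can be mimicked with a small stand-in built on Python lists. `NanAwareList` is a hypothetical class written for illustration; it is not a torch.Tensor subclass and handles one dimension only.

```python
import math


class NanAwareList:
    """Plain-Python stand-in whose reductions skip NaN entries."""

    def __init__(self, values):
        self.values = list(values)

    def _valid(self):
        return [v for v in self.values if not math.isnan(v)]

    def numel(self):
        # Number of non-NaN elements, not the raw length
        return len(self._valid())

    def sum(self):
        return sum(self._valid())

    def mean(self):
        return self.sum() / self.numel()

    def max(self):
        return max(self._valid())

    def min(self):
        return min(self._valid())


t = NanAwareList([1.0, float("nan"), 4.0, 2.5])
print(t.numel(), t.sum(), t.mean(), t.max(), t.min())  # 3 7.5 2.5 4.0 1.0
```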
accuracy_ipu(preds, target, average='micro', mdmc_average='global', threshold=0.5, top_k=None, subset_accuracy=False, num_classes=None, multiclass=None, ignore_index=None)
¶
A modified version of the torchmetrics.functional.accuracy that can ignore NaNs by giving them the same value for both preds and target.
This allows it to work with compilation and IPUs since it doesn't modify the tensor's shape.
Parameters:
Name  Type  Description  Default 

preds 
Tensor

Predictions from model (probabilities, logits or labels) 
required 
target 
Tensor

Ground truth labels 
required 
average 
Optional[str]

Defines the reduction that is applied. Should be one of the following:
.. note:: What is considered a sample in the multidimensional multiclass case
depends on the value of .. note:: If 
'micro'

mdmc_average 
Optional[str]

Defines how averaging is done for multidimensional multiclass inputs (on top of the

'global'

num_classes 
Optional[int]

Number of classes. Necessary for 
None

threshold 
float

Threshold for transforming probability or logit predictions to binary (0,1) predictions, in the case of binary or multilabel inputs. Default value of 0.5 corresponds to input being probabilities. 
0.5

top_k 
Optional[int]

Number of the highest probability or logit score predictions considered finding the correct label,
relevant only for (multidimensional) multiclass inputs. The
default value ( Should be left at default ( 
None

multiclass 
Optional[bool]

Used only in certain special cases, where you want to treat inputs as a different type
than what they appear to be. See the parameter's
:ref: 
None

ignore_index 
Optional[int]

Integer specifying a target class to ignore. If given, this class index does not contribute
to the returned score, regardless of reduction method. If an index is ignored, and 
None

subset_accuracy 
bool

Whether to compute subset accuracy for multilabel and multidimensional multiclass inputs (has no effect for other input types).

False

Raises:
Type  Description 

ValueError

If 
ValueError

If 
ValueError

If 
ValueError

If 
ValueError

If 
ValueError

If 
auroc_ipu(preds, target, num_classes=None, task=None, pos_label=None, average='macro', max_fpr=None, sample_weights=None)
¶
A modified version of the torchmetrics.functional.auroc that can ignore NaNs by giving them the same value for both preds and target.
This allows it to work with compilation and IPUs since it doesn't modify the tensor's shape.
average_precision_ipu(preds, target, num_classes=None, task=None, ignore_index=None, pos_label=None, average='macro', sample_weights=None)
¶
A modified version of the torchmetrics.functional.average_precision that can ignore NaNs by giving them the same value for both preds and target.
This allows it to work with compilation and IPUs since it doesn't modify the tensor's shape.
f1_score_ipu(preds, target, beta=1.0, average='micro', mdmc_average=None, ignore_index=None, num_classes=None, threshold=0.5, top_k=None, multiclass=None)
¶
A modified version of the torchmetrics.functional.classification.f_beta._fbeta_compute that can ignore NaNs by giving them the same value for both preds and target.
Used to calculate the f1_score on IPU with the beta parameter equal to 1.0.
This allows it to work with compilation and IPUs since it doesn't modify the tensor's shape.
Computes the f_beta metric from stat scores: true positives, false positives, true negatives, false negatives.
Parameters:
Name  Type  Description  Default 

tp 
True positives 
required  
fp 
False positives 
required  
tn 
True negatives 
required  
fn 
False negatives 
required  
beta 
float

The parameter 
1.0

ignore_index 
Optional[int]

Integer specifying a target class to ignore. If given, this class index does not contribute to the returned score, regardless of reduction method 
None

average 
Optional[str]

Defines the reduction that is applied 
'micro'

mdmc_average 
Optional[str]

Defines how averaging is done for multidimensional multiclass inputs (on top of the

None

fbeta_score_ipu(preds, target, beta=1.0, average='micro', mdmc_average=None, ignore_index=None, num_classes=None, threshold=0.5, top_k=None, multiclass=None)
¶
A modified version of the torchmetrics.functional.classification.f_beta._fbeta_compute that can ignore NaNs by giving them the same value for both preds and target.
This allows it to work with compilation and IPUs since it doesn't modify the tensor's shape.
Parameters:
Name  Type  Description  Default 

preds 
Tensor

Predictions from model (probabilities, logits or labels) 
required 
target 
Tensor

Ground truth labels 
required 
average 
Optional[str]

Defines the reduction that is applied. Should be one of the following:
.. note:: What is considered a sample in the multidimensional multiclass case
depends on the value of .. note:: If 
'micro'

mdmc_average 
Optional[str]

Defines how averaging is done for multidimensional multiclass inputs (on top of the

None

num_classes 
Optional[int]

Number of classes. Necessary for 
None

threshold 
float

Threshold for transforming probability or logit predictions to binary (0,1) predictions, in the case of binary or multilabel inputs. Default value of 0.5 corresponds to input being probabilities. 
0.5

top_k 
Optional[int]

Number of the highest probability or logit score predictions considered finding the correct label,
relevant only for (multidimensional) multiclass inputs. The
default value ( Should be left at default ( 
None

multiclass 
Optional[bool]

Used only in certain special cases, where you want to treat inputs as a different type
than what they appear to be. See the parameter's
:ref: 
None

ignore_index 
Optional[int]

Integer specifying a target class to ignore. If given, this class index does not contribute
to the returned score, regardless of reduction method. If an index is ignored, and 
None

subset_accuracy 
Whether to compute subset accuracy for multilabel and multidimensional multiclass inputs (has no effect for other input types).

required 
Raises:
Type  Description 

ValueError

If 
ValueError

If 
ValueError

If 
ValueError

If 
ValueError

If 
ValueError

If 
get_confusion_matrix(preds, target, average='micro', mdmc_average='global', threshold=0.5, top_k=None, subset_accuracy=False, num_classes=None, multiclass=None, ignore_index=None)
¶
Calculates the confusion matrix according to the specified average method.
Parameters:
Name  Type  Description  Default 

preds 
Tensor

Predictions from model (probabilities, logits or labels) 
required 
target 
Tensor

Ground truth labels 
required 
average 
Optional[str]

Defines the reduction that is applied. Should be one of the following:
.. note:: What is considered a sample in the multidimensional multiclass case
depends on the value of .. note:: If 
'micro'

mdmc_average 
Optional[str]

Defines how averaging is done for multidimensional multiclass inputs (on top of the

'global'

num_classes 
Optional[int]

Number of classes. Necessary for 
None

threshold 
float

Threshold for transforming probability or logit predictions to binary (0,1) predictions, in the case of binary or multilabel inputs. Default value of 0.5 corresponds to input being probabilities. 
0.5

top_k 
Optional[int]

Number of the highest probability or logit score predictions considered finding the correct label,
relevant only for (multidimensional) multiclass inputs. The
default value ( Should be left at default ( 
None

multiclass 
Optional[bool]

Used only in certain special cases, where you want to treat inputs as a different type
than what they appear to be. See the parameter's
:ref: 
None

ignore_index 
Optional[int]

Integer specifying a target class to ignore. If given, this class index does not contribute
to the returned score, regardless of reduction method. If an index is ignored, and 
None

mean_absolute_error_ipu(preds, target)
¶
Computes mean absolute error.
Handles NaNs without reshaping tensors in order to work on IPU.
Parameters:
Name  Type  Description  Default 

preds 
Tensor

estimated labels 
required 
target 
Tensor

ground truth labels 
required 
Returns:
Tensor with MAE
mean_squared_error_ipu(preds, target, squared)
¶
Computes mean squared error.
Handles NaNs without reshaping tensors in order to work on IPU.
Parameters:
Name  Type  Description  Default 

preds 
Tensor

estimated labels 
required 
target 
Tensor

ground truth labels 
required 
squared 
bool

returns RMSE value if set to False 
required 
Returns:
Tensor with MSE
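The effect of the `squared` flag, together with NaN handling, can be sketched in plain Python. The function below is an illustration under assumptions (zero error at NaN targets, division by the number of valid targets) and is not the graphium implementation.

```python
import math


def mse_ignore_nan(preds, targets, squared=True):
    """MSE over entries whose target is not NaN; RMSE when `squared` is False."""
    sq_errors = [
        0.0 if math.isnan(t) else (p - t) ** 2  # NaN targets add zero error
        for p, t in zip(preds, targets)
    ]
    n_valid = sum(not math.isnan(t) for t in targets)
    mse = sum(sq_errors) / n_valid
    return mse if squared else math.sqrt(mse)


preds = [1.0, 2.0, 5.0]
targets = [3.0, float("nan"), 5.0]
mse = mse_ignore_nan(preds, targets)
rmse = mse_ignore_nan(preds, targets, squared=False)
print(mse)   # 2.0
print(rmse)  # 1.4142135623730951
```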
pearson_ipu(preds, target)
¶
Computes pearson correlation coefficient.
Handles NaNs in the target without reshaping tensors in order to work on IPU.
Parameters:
Name  Type  Description  Default 

preds 
estimated scores 
required  
target 
ground truth scores 
required 
precision_ipu(preds, target, average='micro', mdmc_average=None, ignore_index=None, num_classes=None, threshold=0.5, top_k=None, multiclass=None)
¶
A modified version of the torchmetrics.functional.precision that can ignore NaNs by giving them the same value for both preds and target.
This allows it to work with compilation and IPUs since it doesn't modify the tensor's shape.
r2_score_ipu(preds, target, *args, **kwargs)
¶
Computes the r2 score, also known as the R2 Score / Coefficient of Determination:
.. math:: R^2 = 1 - \frac{SS_{res}}{SS_{tot}}
where :math:`SS_{res}=\sum_i (y_i - f(x_i))^2` is the sum of residual squares, and :math:`SS_{tot}=\sum_i (y_i - \bar{y})^2` is the total sum of squares. Can also calculate the adjusted r2 score given by
.. math:: R^2_{adj} = 1 - \frac{(1-R^2)(n-1)}{n-k-1}
where the parameter :math:`k` (the number of independent regressors) should be provided as the adjusted argument.
Handles NaNs without reshaping tensors in order to work on IPU.
Parameters:
Name  Type  Description  Default 

preds 
estimated labels 
required  
target 
ground truth labels 
required  
adjusted 
number of independent regressors for calculating adjusted r2 score. 
required  
multioutput 
Defines aggregation in the case of multiple output scores. Can be one of the following strings:

required 
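The two formulas for the r2 score can be worked through in plain Python. This is a sketch only: NaN targets are masked with a 0/1 weight so that all lists keep their length, and the name `r2_ignore_nan` is hypothetical.

```python
import math


def r2_ignore_nan(preds, targets, adjusted=0):
    """R^2 over non-NaN targets; `adjusted` is k, the number of regressors."""
    valid = [0.0 if math.isnan(t) else 1.0 for t in targets]
    safe_t = [0.0 if math.isnan(t) else t for t in targets]
    n = sum(valid)
    mean_t = sum(v * t for v, t in zip(valid, safe_t)) / n
    ss_res = sum(v * (t - p) ** 2 for v, t, p in zip(valid, safe_t, preds))
    ss_tot = sum(v * (t - mean_t) ** 2 for v, t in zip(valid, safe_t))
    r2 = 1.0 - ss_res / ss_tot
    if adjusted:
        # Adjusted R^2 = 1 - (1 - R^2)(n - 1) / (n - k - 1)
        r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - adjusted - 1)
    return r2


preds = [2.5, 0.0, 2.0, 8.0, 1.0]
targets = [3.0, -0.5, 2.0, 7.0, float("nan")]
r2 = r2_ignore_nan(preds, targets)
r2_adj = r2_ignore_nan(preds, targets, adjusted=1)
print(round(r2, 4), round(r2_adj, 4))  # 0.9486 0.9229
```

Here SS_res = 1.5 and SS_tot = 29.1875 over the four valid targets, so R^2 = 1 - 1.5 / 29.1875.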
recall_ipu(preds, target, average='micro', mdmc_average=None, ignore_index=None, num_classes=None, threshold=0.5, top_k=None, multiclass=None)
¶
A modified version of the torchmetrics.functional.recall that can ignore NaNs by giving them the same value for both preds and target.
This allows it to work with compilation and IPUs since it doesn't modify the tensor's shape.
spearman_ipu(preds, target)
¶
Computes spearman rank correlation coefficient.
Handles NaNs in the target without reshaping tensors in order to work on IPU.
Parameters:
Name  Type  Description  Default 

preds 
estimated scores 
required  
target 
ground truth scores 
required 
IPU Simple Lightning¶
graphium.ipu.ipu_simple_lightning
¶
IPU Utils¶
graphium.ipu.ipu_utils
¶
import_poptorch(raise_error=True)
¶
Imports poptorch and returns it. It is wrapped in a function to avoid breaking the code on non-IPU devices where poptorch is not installed.
Parameters:
Name  Type  Description  Default 

raise_error 
Whether to raise an error if poptorch is unavailable.
If 
True

Returns:
Type  Description 

Optional[ModuleType]

The poptorch module 
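The pattern of wrapping an optional import in a function can be sketched with the standard library. `optional_import` is a hypothetical generic helper; returning None when `raise_error` is False is an assumption inferred from the Optional[ModuleType] return type.

```python
import importlib


def optional_import(module_name, raise_error=True):
    """Import `module_name` lazily; return None if missing and `raise_error` is False."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        if raise_error:
            raise
        return None


json_module = optional_import("json")
missing = optional_import("surely_not_an_installed_module", raise_error=False)
print(json_module.__name__, missing)  # json None
```

Because the import happens only when the function is called, modules that merely reference the helper stay importable on machines without the optional dependency.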
ipu_options_list_to_file(ipu_opts)
¶
Create a temporary file from a list of ipu configs, such that it can be read by poptorch.Options.loadFromFile
Parameters:
Name  Type  Description  Default 

ipu_opts 
Optional[List[str]]

The list configurations for the IPU, written as a list of strings to make use of 
required 
Returns: tmp_file: The temporary file of ipu configs
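Writing a list of option strings to a temporary file can be sketched with the standard `tempfile` module. This is an illustration under assumptions: the helper name is hypothetical, it returns a path rather than a file object, and the two option strings are only examples of the one-option-per-line layout.

```python
import os
import tempfile


def options_list_to_file(lines):
    """Write one option per line to a temporary file and return its path."""
    tmp = tempfile.NamedTemporaryFile("w", suffix=".cfg", delete=False)
    with tmp:
        tmp.write("\n".join(lines) + "\n")
    return tmp.name


# Illustrative option strings; the accepted syntax is defined by poptorch
path = options_list_to_file(["deviceIterations(4)", "replicationFactor(2)"])
with open(path) as handle:
    written = handle.read().splitlines()
print(written)  # ['deviceIterations(4)', 'replicationFactor(2)']
os.remove(path)  # clean up, since delete=False keeps the file on disk
```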
is_running_on_ipu()
¶
Returns whether the current module is running on IPU.
Needs to be used in the forward or backward pass.
load_ipu_options(ipu_opts, seed=None, model_name=None, gradient_accumulation=None, precision=None, ipu_inference_opts=None)
¶
Load the IPU options from the config file.
Parameters:
Name  Type  Description  Default 

ipu_cfg 
The list of configurations for the IPU, written as a list of strings so that they can be written to a temporary config file and read back.
See the tutorial for IPU options here: https://github.com/graphcore/tutorials/tree/sdk-release-2.6/tutorials/pytorch/efficient_data_loading
See the full documentation for IPU options here: https://docs.graphcore.ai/projects/poptorch-user-guide/en/latest/reference.html?highlight=options#poptorch.Options
 minibatch size: The number of samples processed by one simple fwd/bwd pass.
 = # of samples in a minibatch
 device iterations: A device iteration corresponds to one iteration of the training loop executed on the IPU, starting with data-loading and ending with a weight update. In this simple case, when we set n deviceIterations, the host will prepare n mini-batches in an infeed queue so the IPU can efficiently perform n iterations.
 = # of minibatches to be processed at a time
 = # of training / backward pass in this call
 gradient accumulation factor: After each backward pass the gradients are accumulated together for K mini-batches. Set K in the argument.
 = # of minibatches to accumulate gradients from
 replication factor: Replication describes the process of running multiple instances of the same model simultaneously on different IPUs to achieve data parallelism. If the model requires N IPUs and the replication factor is M, N x M IPUs will be necessary.
 = # of times the model is copied to speed up computation; each replica of the model is sent a different subset of the dataset
 global batch size: In a single device iteration, many mini-batches may be processed and the resulting gradients accumulated. We call this total number of samples processed for one optimiser step the global batch size.
 = total number of samples processed for one optimiser step
 = (minibatch size x gradient accumulation factor) x number of replicas
required  
seed 
Optional[int]

random seed for the IPU 
None

model_name 
Optional[str]

Name of the model, to be used for ipu profiling 
None

ipu_inference_opts 
Optional[List[str]]

optional IPU configuration overrides for inference.
If this is provided, options in this file override those in 
None

Returns:
training_opts: IPU options for the training set.
inference_opts: IPU options for inference.
It differs from the `training_opts` by enforcing `gradientAccumulation` to 1
IPU Wrapper¶
graphium.ipu.ipu_wrapper
¶
PredictorModuleIPU
¶
Bases: PredictorModule
This class wraps around the PredictorModule
to make it work with IPU and the IPUPluginGraphium
.
convert_from_fp16(data)
¶
Converts tensors from FP16 to FP32. Useful to convert the IPU program output data
get_num_graphs(data)
¶
IPU specific method to compute the number of graphs in a Batch, that considers gradient accumulation, multiple IPUs and multiple device iterations. Essential to estimate throughput in graphs/s.
PyGArgsParser
¶
Bases: ICustomArgParser
This class is responsible for converting a PyG Batch from and to a tensor of tuples. This allows PyG Batch to be used as inputs to IPU programs. Copied from poppyg repo, in the future import from the repo directly.
reconstruct(original_structure, tensor_iterator)
¶
Create a new instance with the same class type as the original_structure. This new instance will be initialized with tensors from the provided iterator and uses the same sorted keys from the yieldTensors() implementation.
sortedTensorKeys(struct)
staticmethod
¶
Find all the keys that map to a tensor value in struct. The keys are returned in sorted order.
yieldTensors(struct)
¶
Yields every torch.Tensor in struct, in sorted key order.
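The flatten-and-rebuild protocol described by these three methods can be sketched on a plain dictionary. Everything here is an assumption for illustration: lists stand in for tensors, the functions are free-standing rather than methods, and the `is_tensor` predicate is a hypothetical parameter.

```python
def sorted_tensor_keys(struct, is_tensor):
    """Keys of `struct` whose values are tensors, in sorted order."""
    return sorted(key for key, value in struct.items() if is_tensor(value))


def yield_tensors(struct, is_tensor):
    """Yield every tensor value of `struct`, following the sorted key order."""
    for key in sorted_tensor_keys(struct, is_tensor):
        yield struct[key]


def reconstruct(original, tensor_iterator, is_tensor):
    """Rebuild a structure like `original`, refilling tensor slots from the iterator."""
    rebuilt = dict(original)
    for key in sorted_tensor_keys(original, is_tensor):
        rebuilt[key] = next(tensor_iterator)
    return rebuilt


def is_list(value):
    # Lists stand in for tensors; ints and strings are non-tensor metadata
    return isinstance(value, list)


batch = {"x": [1.0, 2.0], "edge_index": [0, 1], "num_graphs": 1, "name": "demo"}
flat = list(yield_tensors(batch, is_list))
print(flat)  # [[0, 1], [1.0, 2.0]]

doubled = iter([[2 * v for v in tensor] for tensor in flat])
rebuilt = reconstruct(batch, doubled, is_list)
print(rebuilt["x"], rebuilt["edge_index"], rebuilt["name"])  # [2.0, 4.0] [0, 2] demo
```

Using the same sorted key order on both sides is what lets the rebuilt structure put each tensor back in the right slot.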
To Dense Batch¶
graphium.ipu.to_dense_batch
¶
to_dense_batch(x, batch=None, fill_value=0.0, max_num_nodes_per_graph=None, batch_size=None, drop_nodes_last_graph=False)
¶
Given a sparse batch of node features :math:`\mathbf{X} \in \mathbb{R}^{(N_1 + \ldots + N_B) \times F}` (with :math:`N_i` indicating the number of nodes in graph :math:`i`), creates a dense node feature tensor :math:`\mathbf{X} \in \mathbb{R}^{B \times N_{\max} \times F}` (with :math:`N_{\max} = \max_i^B N_i`).
In addition, a mask of shape :math:`\mathbf{M} \in \{ 0, 1 \}^{B \times N_{\max}}` is returned, holding information about the existence of fake-nodes in the dense representation.
Parameters:
Name  Type  Description  Default 

x 
Tensor

Node feature matrix
:math: 
required 
batch 
Optional[Tensor]

Batch vector
:math: 
None

fill_value 
float

The value for invalid entries in the
resulting dense output tensor. (default: :obj: 
0.0

max_num_nodes_per_graph 
Optional[int]

The size of the output node dimension.
(default: :obj: 
None

batch_size 
Optional[int]

The batch size. (default: :obj: 
None

drop_nodes_last_graph 
Whether to drop the nodes of the last graphs that exceed
the 
False

:rtype: (:class:`Tensor`, :class:`BoolTensor`)
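The sparse-to-dense conversion can be sketched with nested Python lists. This is an illustration, not the library code: the name `to_dense` is hypothetical, and the convention that True in the mask marks a real node (False marks a fake-node) is an assumption.

```python
def to_dense(x, batch, fill_value=0.0):
    """Turn sparse node features into a dense [B][N_max][F] nested list plus a mask.

    `x` holds one feature row per node; `batch[i]` is the graph index of node i.
    """
    num_graphs = max(batch) + 1
    counts = [batch.count(g) for g in range(num_graphs)]
    n_max, n_feat = max(counts), len(x[0])
    dense = [[[fill_value] * n_feat for _ in range(n_max)] for _ in range(num_graphs)]
    mask = [[False] * n_max for _ in range(num_graphs)]
    position = [0] * num_graphs  # next free slot in each graph
    for row, g in zip(x, batch):
        dense[g][position[g]] = list(row)
        mask[g][position[g]] = True  # assumption: True marks a real node
        position[g] += 1
    return dense, mask


x = [[1.0], [2.0], [3.0], [4.0], [5.0]]
batch = [0, 0, 0, 1, 1]
dense, mask = to_dense(x, batch)
print(dense)  # [[[1.0], [2.0], [3.0]], [[4.0], [5.0], [0.0]]]
print(mask)   # [[True, True, True], [True, True, False]]
```

The second graph has only two nodes, so its third slot is filled with `fill_value` and flagged False in the mask.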
to_packed_dense_batch(x, pack_from_node_idx, pack_attn_mask, fill_value=0.0, max_num_nodes_per_pack=None)
¶
Given a sparse batch of node features :math:`\mathbf{X} \in \mathbb{R}^{(N_1 + \ldots + N_B) \times F}` (with :math:`N_i` indicating the number of nodes in graph :math:`i`), creates a dense node feature tensor :math:`\mathbf{X} \in \mathbb{R}^{B \times N_{\max} \times F}` (with :math:`N_{\max} = \max_i^B N_i`).
In addition, a mask of shape :math:`\mathbf{M} \in \{ 0, 1 \}^{B \times N_{\max}}` is returned, holding information about the existence of fake-nodes in the dense representation.
# TODO: Update docstring
Name  Type  Description  Default 

x 
Tensor

Node feature matrix
:math: 
required 
batch 
Batch vector
:math: 
required  
fill_value 
float

The value for invalid entries in the
resulting dense output tensor. (default: :obj: 
0.0

max_num_nodes_per_graph 
The size of the output node dimension.
(default: :obj: 
required  
batch_size 
The batch size. (default: :obj: 
required  
drop_nodes_last_graph 
Whether to drop the nodes of the last graphs that exceed
the 
required 
:rtype: (:class:`Tensor`, :class:`BoolTensor`)
to_sparse_batch(x, mask_idx)
¶
Reverse function of to_dense_batch
to_sparse_batch_from_packed(x, pack_from_node_idx)
¶
Reverse function of to_packed_dense_batch