graphium.ipu

Code for adapting Graphium to run on IPUs

IPU Dataloader


graphium.ipu.ipu_dataloader


Copyright (c) 2023 Valence Labs, Recursion Pharmaceuticals and Graphcore Limited.

Use of this software is subject to the terms and conditions outlined in the LICENSE file. Unauthorized modification, distribution, or use is prohibited. Provided 'as is' without warranties of any kind.

Valence Labs, Recursion Pharmaceuticals and Graphcore Limited are not liable for any damages arising from its use. Refer to the LICENSE file for the full terms and conditions.


CombinedBatchingCollator

Collator object that manages the combined batch size defined as:

combined_batch_size = batch_size * device_iterations
                     * replication_factor * gradient_accumulation

This is intended to be used in combination with the poptorch.DataLoader.
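
As a quick illustration of that formula, here is the arithmetic with hypothetical values (the variable names mirror the formula above):

batch_size = 16              # mini-batch size used by the model
device_iterations = 4        # iterations run on-device per host call
replication_factor = 2       # number of model replicas (data parallelism)
gradient_accumulation = 8    # mini-batches accumulated per weight update

combined_batch_size = (
    batch_size * device_iterations * replication_factor * gradient_accumulation
)
print(combined_batch_size)   # 16 * 4 * 2 * 8 = 1024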

__call__(batch)

Stack tensors, batch the pyg graphs, and pad each tensor to be the same size.

Parameters:

Name Type Description Default
batch List[Dict[str, Union[Data, Dict[str, Tensor]]]]

The batch of data, including pyg-graphs Data and labels Dict[str, Tensor] to be padded

required

Returns:

Name Type Description
out_batch Dict[str, Union[Batch, Dict[str, Tensor], Any]]

A dictionary where the graphs are batched and the labels or other Tensors are stacked

__init__(batch_size, max_num_nodes, max_num_edges, dataset_max_nodes_per_graph, dataset_max_edges_per_graph, collate_fn=None)

Parameters:

Name Type Description Default
batch_size int

mini batch size used by the model

required
max_num_nodes int

Maximum number of nodes in the batched padded graph

required
max_num_edges int

Maximum number of edges in the batched padded graph

required
dataset_max_nodes_per_graph int

Maximum number of nodes per graph in the full dataset

required
dataset_max_edges_per_graph int

Maximum number of edges per graph in the full dataset

required
collate_fn Optional[Callable]

Function used to collate (or batch) the single data or graphs together

None

IPUDataloaderOptions dataclass

This data class stores the arguments necessary to instantiate the dataloader for the IPU.

Pad

Bases: BaseTransform

Data transform that applies padding to enforce consistent tensor shapes.

__init__(max_num_nodes, dataset_max_nodes_per_graph, dataset_max_edges_per_graph, max_num_edges=None, node_value=0, edge_value=0)

Parameters:

Name Type Description Default
max_num_nodes int

The maximum number of nodes for the total padded graph

required
dataset_max_nodes_per_graph

the maximum number of nodes per graph in the dataset

required
dataset_max_edges_per_graph

the maximum number of edges per graph in the dataset

required
max_num_edges Optional[int]

The maximum number of edges for the total padded graph

None
node_value float

Value to add to the node padding

0
edge_value float

Value to add to the edge padding

0
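
The following minimal sketch illustrates the padding idea on a node-feature matrix; pad_node_features is a hypothetical helper written for illustration, not part of graphium:

import torch

def pad_node_features(x: torch.Tensor, max_num_nodes: int, node_value: float = 0.0):
    # Grow the node dimension to a fixed size so batched tensors keep a static shape.
    num_pad = max_num_nodes - x.shape[0]
    assert num_pad >= 0, "graph exceeds max_num_nodes"
    pad = torch.full((num_pad, x.shape[1]), node_value, dtype=x.dtype)
    return torch.cat([x, pad], dim=0)

x = torch.randn(5, 3)                  # 5 nodes, 3 features
print(pad_node_features(x, 8).shape)   # torch.Size([8, 3])
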
validate(data)

Validates that the input graph does not exceed the following constraints:

  • the number of nodes must be <= max_num_nodes
  • the number of edges must be <= max_num_edges

Returns:

Type Description

Tuple containing the number of nodes and the number of edges

create_ipu_dataloader(dataset, ipu_dataloader_options, ipu_options=None, batch_size=1, collate_fn=None, num_workers=0, **kwargs)

Creates a poptorch.DataLoader for graph datasets. Applies the mini-batching method of concatenating multiple graphs into a single graph with multiple disconnected subgraphs. See: https://pytorch-geometric.readthedocs.io/en/2.0.2/notes/batching.html

Parameters:

dataset: The torch_geometric.data.Dataset instance from which to
    load the graph examples for the IPU.
ipu_dataloader_options: The options to initialize the Dataloader for IPU
ipu_options: The poptorch.Options used by the
    poptorch.DataLoader. Will use the default options if not provided.
batch_size: How many graph examples to load in each batch
    (default: 1).
collate_fn: The function used to collate batches
**kwargs (optional): Additional arguments of :class:`poptorch.DataLoader`.

Returns:

Type Description
DataLoader

The dataloader
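
A hypothetical usage sketch is shown below; my_dataset and my_dataloader_options are placeholders for a torch_geometric.data.Dataset and an IPUDataloaderOptions instance, and the exact option fields may differ:

from graphium.ipu.ipu_dataloader import create_ipu_dataloader

loader = create_ipu_dataloader(
    dataset=my_dataset,                            # placeholder Dataset instance
    ipu_dataloader_options=my_dataloader_options,  # placeholder IPUDataloaderOptions
    ipu_options=None,                              # None falls back to default poptorch.Options
    batch_size=16,
    num_workers=4,
)
for batch in loader:
    ...  # each batch is a padded, collated graph batch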

IPU Losses


graphium.ipu.ipu_losses


Copyright (c) 2023 Valence Labs, Recursion Pharmaceuticals and Graphcore Limited.

Use of this software is subject to the terms and conditions outlined in the LICENSE file. Unauthorized modification, distribution, or use is prohibited. Provided 'as is' without warranties of any kind.

Valence Labs, Recursion Pharmaceuticals and Graphcore Limited are not liable for any damages arising from its use. Refer to the LICENSE file for the full terms and conditions.


BCELossIPU

Bases: BCELoss

A modified version of the torch.nn.BCELoss that can ignore NaNs by giving them a weight of 0. This allows it to work with compilation and IPUs since it doesn't modify the tensor's shape.
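
A minimal sketch of this masking idea in plain torch, assuming the NaN-to-weight-0 trick described above (illustrative, not the library's exact implementation):

import torch
import torch.nn.functional as F

def bce_ignore_nan(preds: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    nan_mask = torch.isnan(target)
    weight = (~nan_mask).float()  # NaN targets get weight 0
    safe_target = torch.where(nan_mask, torch.zeros_like(target), target)
    loss = F.binary_cross_entropy(preds, safe_target, weight=weight, reduction="sum")
    return loss / weight.sum()    # average over the non-NaN entries only

preds = torch.tensor([0.9, 0.2, 0.7])
target = torch.tensor([1.0, float("nan"), 0.0])
print(bce_ignore_nan(preds, target))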

BCEWithLogitsLossIPU

Bases: BCEWithLogitsLoss

A modified version of the torch.nn.BCEWithLogitsLoss that can ignore NaNs by giving them a weight of 0. This allows it to work with compilation and IPUs since it doesn't modify the tensor's shape.

HybridCELossIPU

Bases: HybridCELoss

__init__(n_brackets, alpha=0.5)

Parameters:

Name Type Description Default
n_brackets

the number of brackets that will be used to group the regression targets. Expected to have the same size as the number of classes in the transformed regression task.

required
forward(input, target)

Parameters:

Name Type Description Default
input Tensor

(batch_size x n_classes) tensor of logits predicted for each bracket.

required
target Tensor

(batch_size) or (batch_size, 1) tensor of target brackets in {0, 1, ..., self.n_brackets}.

required

L1LossIPU

Bases: L1Loss

A modified version of the torch.nn.L1Loss that can ignore NaNs by giving them the same value for both input and target. This allows it to work with compilation and IPUs since it doesn't modify the tensor's shape.

MSELossIPU

Bases: MSELoss

A modified version of the torch.nn.MSELoss that can ignore NaNs by giving them the same value for both input and target. This allows it to work with compilation and IPUs since it doesn't modify the tensor's shape.
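
A minimal sketch of the same-value trick for regression losses, assuming the masked mean shown here (illustrative, not the exact source):

import torch

def mse_ignore_nan(preds: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    nan_mask = torch.isnan(target)
    # Where the target is NaN, substitute the prediction itself: the squared
    # error at those positions is exactly 0, and the tensor shape is unchanged.
    safe_target = torch.where(nan_mask, preds.detach(), target)
    sq_err = (preds - safe_target) ** 2
    return sq_err.sum() / (~nan_mask).sum()

preds = torch.tensor([1.0, 2.0, 3.0])
target = torch.tensor([1.5, float("nan"), 2.5])
print(mse_ignore_nan(preds, target))  # mean of (0.25, 0.25) = 0.25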

IPU Metrics


graphium.ipu.ipu_metrics


Copyright (c) 2023 Valence Labs, Recursion Pharmaceuticals and Graphcore Limited.

Use of this software is subject to the terms and conditions outlined in the LICENSE file. Unauthorized modification, distribution, or use is prohibited. Provided 'as is' without warranties of any kind.

Valence Labs, Recursion Pharmaceuticals and Graphcore Limited are not liable for any damages arising from its use. Refer to the LICENSE file for the full terms and conditions.


NaNTensor

Bases: Tensor

Class to create and manage a NaN tensor along with its properties

The goal of the class is to override the regular tensor such that the basic operations (sum, mean, max, etc) ignore the NaNs in the input. It also supports NaNs in integer tensors (as the lowest integer possible).

get_nans: BoolTensor property

Gets the boolean Tensor containing the location of NaNs. In the case of an integer tensor, this returns where the tensor is equal to its minimal value. In the case of a boolean tensor, this returns a Tensor filled with False.

__lt__(other)

Workaround that allows the code to work with r2_score, since it requires the size to be > 2. Since self.size now returns a Tensor instead of a value, we check that all elements are > 2.

__torch_function__(func, types, args=(), kwargs=None) classmethod

This torch_function implementation wraps subclasses such that methods called on subclasses return a subclass instance instead of a torch.Tensor instance.

One corollary to this is that you need coverage for torch.Tensor methods if implementing torch_function for subclasses.

Affects the call torch.sum() so that it behaves the same way as NaNTensor.sum()

We recommend always calling super().__torch_function__ as the base case when doing the above.

While not mandatory, we recommend making __torch_function__ a classmethod.

argsort(dim=-1, descending=False)

Return the indices that sort the tensor, while putting all the NaNs to the end of the sorting.

max(*args, **kwargs)

Returns the max value of a tensor without NaNs

mean(*args, **kwargs)

Overloads the traditional mean to ignore the NaNs

min(*args, **kwargs)

Returns the min value of a tensor without NaNs

numel()

Returns the number of non-NaN elements.

size(dim)

Instead of returning the size, return the number of non-NaN elements in a specific dimension. Useful for the r2_score metric.

sum(*args, **kwargs)

Overloads the traditional sum to ignore the NaNs
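
The behaviour of these reductions can be illustrated in plain torch by substituting the reduction's identity element for the NaNs, which keeps the tensor shape static (an illustration, not the NaNTensor subclass itself):

import torch

x = torch.tensor([1.0, float("nan"), 3.0])
nan_mask = torch.isnan(x)

nan_sum = torch.where(nan_mask, torch.zeros_like(x), x).sum()                # 4.0
nan_mean = nan_sum / (~nan_mask).sum()                                       # 2.0
nan_max = torch.where(nan_mask, torch.full_like(x, float("-inf")), x).max()  # 3.0
print(nan_sum, nan_mean, nan_max)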

accuracy_ipu(preds, target, average='micro', mdmc_average='global', threshold=0.5, top_k=None, subset_accuracy=False, num_classes=None, multiclass=None, ignore_index=None)

A modified version of the torchmetrics.functional.accuracy that can ignore NaNs by giving them the same value for both preds and target. This allows it to work with compilation and IPUs since it doesn't modify the tensor's shape.

Parameters:

Name Type Description Default
preds Tensor

Predictions from model (probabilities, logits or labels)

required
target Tensor

Ground truth labels

required
average Optional[str]

Defines the reduction that is applied. Should be one of the following:

  • 'micro' [default]: Calculate the metric globally, across all samples and classes.
  • 'macro': Calculate the metric for each class separately, and average the metrics across classes (with equal weights for each class).
  • 'weighted': Calculate the metric for each class separately, and average the metrics across classes, weighting each class by its support (tp + fn).
  • 'none' or None: Calculate the metric for each class separately, and return the metric for every class.
  • 'samples': Calculate the metric for each sample, and average the metrics across samples (with equal weights for each sample).

.. note:: What is considered a sample in the multi-dimensional multi-class case depends on the value of mdmc_average.

.. note:: If 'none' and a given class doesn't occur in the preds or target, the value for the class will be nan.

'micro'
mdmc_average Optional[str]

Defines how averaging is done for multi-dimensional multi-class inputs (on top of the average parameter). Should be one of the following:

  • None [default]: Should be left unchanged if your data is not multi-dimensional multi-class.

  • 'samplewise': In this case, the statistics are computed separately for each sample on the N axis, and then averaged over samples. The computation for each sample is done by treating the flattened extra axes ... (see :ref:pages/classification:input types) as the N dimension within the sample, and computing the metric for the sample based on that.

  • 'global': In this case the N and ... dimensions of the inputs (see :ref:pages/classification:input types) are flattened into a new N_X sample axis, i.e. the inputs are treated as if they were (N_X, C). From here on the average parameter applies as usual.

'global'
num_classes Optional[int]

Number of classes. Necessary for 'macro', 'weighted' and None average methods.

None
threshold float

Threshold for transforming probability or logit predictions to binary (0,1) predictions, in the case of binary or multi-label inputs. Default value of 0.5 corresponds to input being probabilities.

0.5
top_k Optional[int]

Number of the highest probability or logit score predictions considered when finding the correct label, relevant only for (multi-dimensional) multi-class inputs. The default value (None) will be interpreted as 1 for these inputs.

Should be left at default (None) for all other types of inputs.

None
multiclass Optional[bool]

Used only in certain special cases, where you want to treat inputs as a different type than what they appear to be. See the parameter's :ref:documentation section <pages/classification:using the multiclass parameter> for a more detailed explanation and examples.

None
ignore_index Optional[int]

Integer specifying a target class to ignore. If given, this class index does not contribute to the returned score, regardless of reduction method. If an index is ignored, and average=None or 'none', the score for the ignored class will be returned as nan.

None
subset_accuracy bool

Whether to compute subset accuracy for multi-label and multi-dimensional multi-class inputs (has no effect for other input types).

  • For multi-label inputs, if the parameter is set to True, then all labels for each sample must be correctly predicted for the sample to count as correct. If it is set to False, then all labels are counted separately - this is equivalent to flattening inputs beforehand (i.e. preds = preds.flatten() and same for target).

  • For multi-dimensional multi-class inputs, if the parameter is set to True, then all sub-samples (on the extra axis) must be correct for the sample to be counted as correct. If it is set to False, then all sub-samples are counted separately - this is equivalent, in the case of label predictions, to flattening the inputs beforehand (i.e. preds = preds.flatten() and same for target). Note that the top_k parameter still applies in both cases, if set.

False

Raises:

Type Description
ValueError

If top_k parameter is set for multi-label inputs.

ValueError

If average is none of "micro", "macro", "weighted", "samples", "none", None.

ValueError

If mdmc_average is not one of None, "samplewise", "global".

ValueError

If average is set but num_classes is not provided.

ValueError

If num_classes is set and ignore_index is not in the range [0, num_classes).

ValueError

If top_k is not an integer larger than 0.
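
A small usage sketch for the binary case, assuming the NaN-aware variant accepts a float target so that NaNs mark missing labels:

import torch
from graphium.ipu.ipu_metrics import accuracy_ipu

preds = torch.tensor([0.9, 0.1, 0.8, 0.3])
target = torch.tensor([1.0, 0.0, float("nan"), 1.0])  # third label is missing
print(accuracy_ipu(preds, target, threshold=0.5))     # scored on the 3 valid entries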

auroc_ipu(preds, target, num_classes=None, task=None, pos_label=None, average='macro', max_fpr=None, sample_weights=None)

A modified version of the torchmetrics.functional.auroc that can ignore NaNs by giving them the same value for both preds and target. This allows it to work with compilation and IPUs since it doesn't modify the tensor's shape.

average_precision_ipu(preds, target, num_classes=None, task=None, ignore_index=None, pos_label=None, average='macro', sample_weights=None)

A modified version of the torchmetrics.functional.average_precision that can ignore NaNs by giving them the same value for both preds and target. This allows it to work with compilation and IPUs since it doesn't modify the tensor's shape.

f1_score_ipu(preds, target, beta=1.0, average='micro', mdmc_average=None, ignore_index=None, num_classes=None, threshold=0.5, top_k=None, multiclass=None)

A modified version of the torchmetrics.functional.classification.f_beta._fbeta_compute that can ignore NaNs by giving them the same value for both preds and target. Used to calculate the f1_score on IPU with the beta parameter equal to 1.0. This allows it to work with compilation and IPUs since it doesn't modify the tensor's shape.

Computes f_beta metric from stat scores: true positives, false positives, true negatives, false negatives.

Parameters:

Name Type Description Default
tp

True positives

required
fp

False positives

required
tn

True negatives

required
fn

False negatives

required
beta float

The parameter beta (which determines the weight of recall in the combined score)

1.0
ignore_index Optional[int]

Integer specifying a target class to ignore. If given, this class index does not contribute to the returned score, regardless of reduction method

None
average Optional[str]

Defines the reduction that is applied

'micro'
mdmc_average Optional[str]

Defines how averaging is done for multi-dimensional multi-class inputs (on top of the average parameter)

None

fbeta_score_ipu(preds, target, beta=1.0, average='micro', mdmc_average=None, ignore_index=None, num_classes=None, threshold=0.5, top_k=None, multiclass=None)

A modified version of the torchmetrics.functional.classification.f_beta._fbeta_compute that can ignore NaNs by giving them the same value for both preds and target. This allows it to work with compilation and IPUs since it doesn't modify the tensor's shape.

Parameters:

Name Type Description Default
preds Tensor

Predictions from model (probabilities, logits or labels)

required
target Tensor

Ground truth labels

required
average Optional[str]

Defines the reduction that is applied. Should be one of the following:

  • 'micro' [default]: Calculate the metric globally, across all samples and classes.
  • 'macro': Calculate the metric for each class separately, and average the metrics across classes (with equal weights for each class).
  • 'weighted': Calculate the metric for each class separately, and average the metrics across classes, weighting each class by its support (tp + fn).
  • 'none' or None: Calculate the metric for each class separately, and return the metric for every class.
  • 'samples': Calculate the metric for each sample, and average the metrics across samples (with equal weights for each sample).

.. note:: What is considered a sample in the multi-dimensional multi-class case depends on the value of mdmc_average.

.. note:: If 'none' and a given class doesn't occur in the preds or target, the value for the class will be nan.

'micro'
mdmc_average Optional[str]

Defines how averaging is done for multi-dimensional multi-class inputs (on top of the average parameter). Should be one of the following:

  • None [default]: Should be left unchanged if your data is not multi-dimensional multi-class.

  • 'samplewise': In this case, the statistics are computed separately for each sample on the N axis, and then averaged over samples. The computation for each sample is done by treating the flattened extra axes ... (see :ref:pages/classification:input types) as the N dimension within the sample, and computing the metric for the sample based on that.

  • 'global': In this case the N and ... dimensions of the inputs (see :ref:pages/classification:input types) are flattened into a new N_X sample axis, i.e. the inputs are treated as if they were (N_X, C). From here on the average parameter applies as usual.

None
num_classes Optional[int]

Number of classes. Necessary for 'macro', 'weighted' and None average methods.

None
threshold float

Threshold for transforming probability or logit predictions to binary (0,1) predictions, in the case of binary or multi-label inputs. Default value of 0.5 corresponds to input being probabilities.

0.5
top_k Optional[int]

Number of the highest probability or logit score predictions considered when finding the correct label, relevant only for (multi-dimensional) multi-class inputs. The default value (None) will be interpreted as 1 for these inputs.

Should be left at default (None) for all other types of inputs.

None
multiclass Optional[bool]

Used only in certain special cases, where you want to treat inputs as a different type than what they appear to be. See the parameter's :ref:documentation section <pages/classification:using the multiclass parameter> for a more detailed explanation and examples.

None
ignore_index Optional[int]

Integer specifying a target class to ignore. If given, this class index does not contribute to the returned score, regardless of reduction method. If an index is ignored, and average=None or 'none', the score for the ignored class will be returned as nan.

None
subset_accuracy

Whether to compute subset accuracy for multi-label and multi-dimensional multi-class inputs (has no effect for other input types).

  • For multi-label inputs, if the parameter is set to True, then all labels for each sample must be correctly predicted for the sample to count as correct. If it is set to False, then all labels are counted separately - this is equivalent to flattening inputs beforehand (i.e. preds = preds.flatten() and same for target).

  • For multi-dimensional multi-class inputs, if the parameter is set to True, then all sub-samples (on the extra axis) must be correct for the sample to be counted as correct. If it is set to False, then all sub-samples are counted separately - this is equivalent, in the case of label predictions, to flattening the inputs beforehand (i.e. preds = preds.flatten() and same for target). Note that the top_k parameter still applies in both cases, if set.

required

Raises:

Type Description
ValueError

If top_k parameter is set for multi-label inputs.

ValueError

If average is none of "micro", "macro", "weighted", "samples", "none", None.

ValueError

If mdmc_average is not one of None, "samplewise", "global".

ValueError

If average is set but num_classes is not provided.

ValueError

If num_classes is set and ignore_index is not in the range [0, num_classes).

ValueError

If top_k is not an integer larger than 0.

get_confusion_matrix(preds, target, average='micro', mdmc_average='global', threshold=0.5, top_k=None, subset_accuracy=False, num_classes=None, multiclass=None, ignore_index=None)

Calculates the confusion matrix according to the specified average method.

Parameters:

Name Type Description Default
preds Tensor

Predictions from model (probabilities, logits or labels)

required
target Tensor

Ground truth labels

required
average Optional[str]

Defines the reduction that is applied. Should be one of the following:

  • 'micro' [default]: Calculate the metric globally, across all samples and classes.
  • 'macro': Calculate the metric for each class separately, and average the metrics across classes (with equal weights for each class).
  • 'weighted': Calculate the metric for each class separately, and average the metrics across classes, weighting each class by its support (tp + fn).
  • 'none' or None: Calculate the metric for each class separately, and return the metric for every class.
  • 'samples': Calculate the metric for each sample, and average the metrics across samples (with equal weights for each sample).

.. note:: What is considered a sample in the multi-dimensional multi-class case depends on the value of mdmc_average.

.. note:: If 'none' and a given class doesn't occur in the preds or target, the value for the class will be nan.

'micro'
mdmc_average Optional[str]

Defines how averaging is done for multi-dimensional multi-class inputs (on top of the average parameter). Should be one of the following:

  • None [default]: Should be left unchanged if your data is not multi-dimensional multi-class.

  • 'samplewise': In this case, the statistics are computed separately for each sample on the N axis, and then averaged over samples. The computation for each sample is done by treating the flattened extra axes ... (see :ref:pages/classification:input types) as the N dimension within the sample, and computing the metric for the sample based on that.

  • 'global': In this case the N and ... dimensions of the inputs (see :ref:pages/classification:input types) are flattened into a new N_X sample axis, i.e. the inputs are treated as if they were (N_X, C). From here on the average parameter applies as usual.

'global'
num_classes Optional[int]

Number of classes. Necessary for 'macro', 'weighted' and None average methods.

None
threshold float

Threshold for transforming probability or logit predictions to binary (0,1) predictions, in the case of binary or multi-label inputs. Default value of 0.5 corresponds to input being probabilities.

0.5
top_k Optional[int]

Number of the highest probability or logit score predictions considered when finding the correct label, relevant only for (multi-dimensional) multi-class inputs. The default value (None) will be interpreted as 1 for these inputs.

Should be left at default (None) for all other types of inputs.

None
multiclass Optional[bool]

Used only in certain special cases, where you want to treat inputs as a different type than what they appear to be. See the parameter's :ref:documentation section <pages/classification:using the multiclass parameter> for a more detailed explanation and examples.

None
ignore_index Optional[int]

Integer specifying a target class to ignore. If given, this class index does not contribute to the returned score, regardless of reduction method. If an index is ignored, and average=None or 'none', the score for the ignored class will be returned as nan.

None

mean_absolute_error_ipu(preds, target)

Computes mean absolute error.

Handles NaNs without reshaping tensors in order to work on IPU.

Parameters:

Name Type Description Default
preds Tensor

estimated labels

required
target Tensor

ground truth labels

required
Returns:

Tensor with MAE

mean_squared_error_ipu(preds, target, squared)

Computes mean squared error.

Handles NaNs without reshaping tensors in order to work on IPU.

Parameters:

Name Type Description Default
preds Tensor

estimated labels

required
target Tensor

ground truth labels

required
squared bool

returns RMSE value if set to False

required
Returns:

Tensor with MSE

pearson_ipu(preds, target)

Computes the Pearson correlation coefficient.

Handles NaNs in the target without reshaping tensors in order to work on IPU.

Parameters:

Name Type Description Default
preds

estimated scores

required
target

ground truth scores

required

precision_ipu(preds, target, average='micro', mdmc_average=None, ignore_index=None, num_classes=None, threshold=0.5, top_k=None, multiclass=None)

A modified version of the torchmetrics.functional.precision that can ignore NaNs by giving them the same value for both preds and target. This allows it to work with compilation and IPUs since it doesn't modify the tensor's shape.

r2_score_ipu(preds, target, *args, **kwargs)

Computes the r2 score, also known as R2 Score / Coefficient of Determination:

.. math:: R^2 = 1 - \frac{SS_{res}}{SS_{tot}}

where :math:`SS_{res}=\sum_i (y_i - f(x_i))^2` is the sum of residual squares, and :math:`SS_{tot}=\sum_i (y_i - \bar{y})^2` is the total sum of squares. Can also calculate the adjusted r2 score given by

.. math:: R^2_{adj} = 1 - \frac{(1 - R^2)(n - 1)}{n - k - 1}

where the parameter :math:`k` (the number of independent regressors) should be provided as the adjusted argument. Handles NaNs without reshaping tensors in order to work on IPU.

Parameters:

Name Type Description Default
preds

estimated labels

required
target

ground truth labels

required
adjusted

number of independent regressors for calculating adjusted r2 score.

required
multioutput

Defines aggregation in the case of multiple output scores. Can be one of the following strings:

  • 'raw_values' returns full set of scores
  • 'uniform_average' scores are uniformly averaged
  • 'variance_weighted' scores are weighted by their individual variances
required
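
For reference, a worked example of the underlying formula with the plain torchmetrics r2_score (the _ipu variant adds the NaN handling on top):

import torch
from torchmetrics.functional import r2_score

target = torch.tensor([3.0, -0.5, 2.0, 7.0])
preds = torch.tensor([2.5, 0.0, 2.0, 8.0])
# SS_res = 1.5, SS_tot = 29.1875, so R^2 = 1 - 1.5 / 29.1875 ≈ 0.9486
print(r2_score(preds, target))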

recall_ipu(preds, target, average='micro', mdmc_average=None, ignore_index=None, num_classes=None, threshold=0.5, top_k=None, multiclass=None)

A modified version of the torchmetrics.functional.recall that can ignore NaNs by giving them the same value for both preds and target. This allows it to work with compilation and IPUs since it doesn't modify the tensor's shape.

spearman_ipu(preds, target)

Computes the Spearman rank correlation coefficient.

Handles NaNs in the target without reshaping tensors in order to work on IPU.

Parameters:

Name Type Description Default
preds

estimated scores

required
target

ground truth scores

required

IPU Simple Lightning


graphium.ipu.ipu_simple_lightning


Copyright (c) 2023 Valence Labs, Recursion Pharmaceuticals and Graphcore Limited.

Use of this software is subject to the terms and conditions outlined in the LICENSE file. Unauthorized modification, distribution, or use is prohibited. Provided 'as is' without warranties of any kind.

Valence Labs, Recursion Pharmaceuticals and Graphcore Limited are not liable for any damages arising from its use. Refer to the LICENSE file for the full terms and conditions.


IPU Utils


graphium.ipu.ipu_utils


Copyright (c) 2023 Valence Labs, Recursion Pharmaceuticals and Graphcore Limited.

Use of this software is subject to the terms and conditions outlined in the LICENSE file. Unauthorized modification, distribution, or use is prohibited. Provided 'as is' without warranties of any kind.

Valence Labs, Recursion Pharmaceuticals and Graphcore Limited are not liable for any damages arising from its use. Refer to the LICENSE file for the full terms and conditions.


import_poptorch(raise_error=True)

Imports poptorch and returns it. It is wrapped in a function to avoid breaking the code on non-IPU devices where poptorch is not installed.

Parameters:

Name Type Description Default
raise_error

Whether to raise an error if poptorch is unavailable. If False, return None

True

Returns:

Type Description
Optional[ModuleType]

The poptorch module
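
A sketch of the guarded-import pattern this helper enables:

from graphium.ipu.ipu_utils import import_poptorch

poptorch = import_poptorch(raise_error=False)
if poptorch is None:
    print("poptorch unavailable; falling back to CPU/GPU code paths")
else:
    opts = poptorch.Options()  # safe to use IPU-specific APIs from here on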

ipu_options_list_to_file(ipu_opts)

Create a temporary file from a list of IPU configs, such that it can be read by poptorch.Options.loadFromFile.

Parameters:

Name Type Description Default
ipu_opts Optional[List[str]]

The list of configurations for the IPU, written as a list of strings to make use of poptorch.Options.loadFromFile

required

Returns: tmp_file: The temporary file of ipu configs

is_running_on_ipu()

Returns whether the current module is running on IPU. Needs to be used in the forward or backward pass.

load_ipu_options(ipu_opts, seed=None, model_name=None, gradient_accumulation=None, precision=None, ipu_inference_opts=None)

Load the IPU options from the config file.

Parameters:

Name Type Description Default
ipu_opts

The list of configurations for the IPU, written as a list of strings to make use of poptorch.Options.loadFromFile: a temporary config file is written, then read back. See Options.loadFromFile.

  • See the tutorial for IPU options here: https://github.com/graphcore/tutorials/tree/sdk-release-2.6/tutorials/pytorch/efficient_data_loading
  • See the full documentation for IPU options here: https://docs.graphcore.ai/projects/poptorch-user-guide/en/latest/reference.html?highlight=options#poptorch.Options

Glossary of the batching terms used by the IPU options:

  • minibatch size: the number of samples processed by one single fwd/bwd pass, i.e. the number of samples in a mini-batch.

  • device iterations: a device iteration corresponds to one iteration of the training loop executed on the IPU, starting with data loading and ending with a weight update. When deviceIterations is set to n, the host prepares n mini-batches in an infeed queue so the IPU can efficiently perform n iterations, i.e. n mini-batches are processed at a time.

  • gradient accumulation factor: after each backward pass, the gradients are accumulated together for K mini-batches before the weights are updated, i.e. K is the number of mini-batches to accumulate gradients from.

  • replication factor: replication describes the process of running multiple instances of the same model simultaneously on different IPUs to achieve data parallelism. If the model requires N IPUs and the replication factor is M, N x M IPUs will be necessary. Each replica of the model is sent a different subset of the dataset.

  • global batch size: in a single device iteration, many mini-batches may be processed and the resulting gradients accumulated. The total number of samples processed for one optimiser step is the global batch size = (minibatch size x gradient accumulation factor) x replication factor.

required
seed Optional[int]

random seed for the IPU

None
model_name Optional[str]

Name of the model, to be used for ipu profiling

None
ipu_inference_opts Optional[List[str]]

optional IPU configuration overrides for inference. If this is provided, these options override those in ipu_opts for inference.

None

Returns:

training_opts: IPU options for the training set.

inference_opts: IPU options for inference.
    It differs from the `training_opts` by enforcing `gradientAccumulation` to 1
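
As a worked example of the glossary above (illustrative numbers only):

minibatch_size = 16          # samples per fwd/bwd pass
gradient_accumulation = 4    # mini-batches accumulated per optimiser step
replication_factor = 2       # model replicas running in parallel

global_batch_size = minibatch_size * gradient_accumulation * replication_factor
print(global_batch_size)     # 16 * 4 * 2 = 128 samples per optimiser step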

IPU Wrapper


graphium.ipu.ipu_wrapper


Copyright (c) 2023 Valence Labs, Recursion Pharmaceuticals and Graphcore Limited.

Use of this software is subject to the terms and conditions outlined in the LICENSE file. Unauthorized modification, distribution, or use is prohibited. Provided 'as is' without warranties of any kind.

Valence Labs, Recursion Pharmaceuticals and Graphcore Limited are not liable for any damages arising from its use. Refer to the LICENSE file for the full terms and conditions.


PredictorModuleIPU

Bases: PredictorModule

This class wraps around the PredictorModule to make it work with IPU and the IPUPluginGraphium.

convert_from_fp16(data)

Converts tensors from FP16 to FP32. Useful to convert the IPU program output data

get_num_graphs(data)

IPU specific method to compute the number of graphs in a Batch, that considers gradient accumulation, multiple IPUs and multiple device iterations. Essential to estimate throughput in graphs/s.

PyGArgsParser

Bases: ICustomArgParser

This class is responsible for converting a PyG Batch from and to a tensor of tuples. This allows PyG Batch to be used as inputs to IPU programs. Copied from the poppyg repo; in the future, import it from the repo directly.

reconstruct(original_structure, tensor_iterator)

Create a new instance with the same class type as the original_structure. This new instance will be initialized with tensors from the provided iterator and uses the same sorted keys from the yieldTensors() implementation.

sortedTensorKeys(struct) staticmethod

Find all the keys that map to a tensor value in struct. The keys are returned in sorted order.

yieldTensors(struct)

Yield every torch.Tensor in struct in sorted order.

To Dense Batch


graphium.ipu.to_dense_batch


Copyright (c) 2023 Valence Labs, Recursion Pharmaceuticals and Graphcore Limited.

Use of this software is subject to the terms and conditions outlined in the LICENSE file. Unauthorized modification, distribution, or use is prohibited. Provided 'as is' without warranties of any kind.

Valence Labs, Recursion Pharmaceuticals and Graphcore Limited are not liable for any damages arising from its use. Refer to the LICENSE file for the full terms and conditions.


to_dense_batch(x, batch=None, fill_value=0.0, max_num_nodes_per_graph=None, batch_size=None, drop_nodes_last_graph=False)

Given a sparse batch of node features :math:\mathbf{X} \in \mathbb{R}^{(N_1 + \ldots + N_B) \times F} (with :math:N_i indicating the number of nodes in graph :math:i), creates a dense node feature tensor :math:\mathbf{X} \in \mathbb{R}^{B \times N_{\max} \times F} (with :math:N_{\max} = \max_i^B N_i). In addition, a mask of shape :math:\mathbf{M} \in \{ 0, 1 \}^{B \times N_{\max}} is returned, holding information about the existence of fake-nodes in the dense representation.

Parameters:

Name Type Description Default
x Tensor

Node feature matrix :math:\mathbf{X} \in \mathbb{R}^{(N_1 + \ldots + N_B) \times F}.

required
batch Optional[Tensor]

Batch vector :math:\mathbf{b} \in {\{ 0, \ldots, B-1\}}^N, which assigns each node to a specific example. Must be ordered. (default: :obj:None)

None
fill_value float

The value for invalid entries in the resulting dense output tensor. (default: :obj:0)

0.0
max_num_nodes_per_graph Optional[int]

The size of the output node dimension. (default: :obj:None)

None
batch_size Optional[int]

The batch size. (default: :obj:None)

None
drop_nodes_last_graph

Whether to drop the nodes of the last graph that exceed the max_num_nodes_per_graph. Useful when the last graph is a padding graph.

False

:rtype: (:class:Tensor, :class:BoolTensor)
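
A usage sketch, assuming the PyG-style signature and (Tensor, BoolTensor) return documented above (two graphs of three nodes each):

import torch
from graphium.ipu.to_dense_batch import to_dense_batch

x = torch.randn(6, 4)                     # 6 nodes across 2 graphs, 4 features each
batch = torch.tensor([0, 0, 0, 1, 1, 1])  # ordered node-to-graph assignment
out = to_dense_batch(
    x, batch, fill_value=0.0, max_num_nodes_per_graph=4, batch_size=2
)
dense_x, mask = out[0], out[1]
print(dense_x.shape)  # (2, 4, 4): batch x max_num_nodes x features
print(mask.shape)     # (2, 4): True where a real node exists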

to_packed_dense_batch(x, pack_from_node_idx, pack_attn_mask, fill_value=0.0, max_num_nodes_per_pack=None)

Given a sparse batch of node features :math:\mathbf{X} \in \mathbb{R}^{(N_1 + \ldots + N_B) \times F} (with :math:N_i indicating the number of nodes in graph :math:i), creates a dense node feature tensor :math:\mathbf{X} \in \mathbb{R}^{B \times N_{\max} \times F} (with :math:N_{\max} = \max_i^B N_i). In addition, a mask of shape :math:\mathbf{M} \in \{ 0, 1 \}^{B \times N_{\max}} is returned, holding information about the existence of fake-nodes in the dense representation.

# TODO: Update docstring

Parameters:

Name Type Description Default
x Tensor

Node feature matrix :math:\mathbf{X} \in \mathbb{R}^{(N_1 + \ldots + N_B) \times F}.

required
batch

Batch vector :math:\mathbf{b} \in {\{ 0, \ldots, B-1\}}^N, which assigns each node to a specific example. Must be ordered. (default: :obj:None)

required
fill_value float

The value for invalid entries in the resulting dense output tensor. (default: :obj:0)

0.0
max_num_nodes_per_graph

The size of the output node dimension. (default: :obj:None)

required
batch_size

The batch size. (default: :obj:None)

required
drop_nodes_last_graph

Whether to drop the nodes of the last graph that exceed the max_num_nodes_per_graph. Useful when the last graph is a padding graph.

required

:rtype: (:class:Tensor, :class:BoolTensor)

to_sparse_batch(x, mask_idx)

Reverse function of to_dense_batch

to_sparse_batch_from_packed(x, pack_from_node_idx)

Reverse function of to_packed_dense_batch