graphium.features

Feature extraction and manipulation

Contents

Featurizer
Positional Encoding
Properties
Spectral PE
Random Walk PE
NMP

Featurizer

`graphium.features.featurizer`

Use of this software is subject to the terms and conditions outlined in the LICENSE file. Unauthorized modification, distribution, or use is prohibited. Provided 'as is' without warranties of any kind.

Valence Labs, Recursion Pharmaceuticals and Graphcore Limited are not liable for any damages arising from its use. Refer to the LICENSE file for the full terms and conditions.

`GraphDict`

Bases: dict

`__init__(dic)`

Store the parameters required to initialize a `pyg.data.Data`, but as a dictionary to reduce memory consumption.

Possible keys for the dictionary:

adj: A sparse Tensor containing the adjacency matrix
ndata: A dictionary containing different keys and Tensors associated with the node features.
edata: A dictionary containing different keys and Tensors associated with the edge features.
dtype: The dtype for the floating-point data.
mask_nan: How to deal with molecules that fail part of the featurization. NaNs can occur when taking a property that is undefined for a noble gas, or other properties that are not measured for specific atoms.
- "raise": Raise an error when there is a NaN or inf in the featurization
- "warn": Raise a warning when there is a NaN or inf in the featurization
- "None": DEFAULT. Don't do anything
- "Floating value": Replace NaNs or infs with the specified value
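
The four `mask_nan` options can be pictured with the following minimal NumPy sketch. The helper name `apply_mask_nan` is hypothetical and not part of graphium; it only mirrors the documented behaviors on a single array, and accepts both `None` and the string `"None"` for the no-op case.

```python
import warnings
from typing import Optional, Union

import numpy as np


def apply_mask_nan(
    values: np.ndarray, mask_nan: Optional[Union[str, float]] = "raise"
) -> np.ndarray:
    """Hypothetical helper illustrating the documented `mask_nan` options."""
    values = np.asarray(values, dtype=float)
    bad = ~np.isfinite(values)  # True where the value is NaN or +/-inf
    if not bad.any() or mask_nan is None or mask_nan == "None":
        return values
    if mask_nan == "raise":
        raise ValueError(f"{int(bad.sum())} NaN/inf value(s) in the featurization")
    if mask_nan == "warn":
        warnings.warn(f"{int(bad.sum())} NaN/inf value(s) in the featurization")
        return values
    # Any other value is interpreted as the float that replaces NaN/inf
    out = values.copy()
    out[bad] = float(mask_nan)
    return out
```

For example, `apply_mask_nan(np.array([1.0, np.nan, np.inf]), 0.0)` replaces both non-finite entries with `0.0`.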

`make_pyg_graph(**kwargs)`

Convert the current dictionary of parameters, containing an adjacency matrix with node/edge data, into a `pyg.data.Data` of torch Tensors.

`**kwargs` can be used to overwrite any parameter from the current dictionary. See `GraphDict.__init__` for a list of parameters.

`get_estimated_bond_length(bond, mol)`

Estimate the bond length between atoms by looking at the estimated atomic radii, which depend both on the atom type and the bond type. The resulting bond length is then the sum of the two radii.

Keep in mind that this function only provides an estimate of the bond length and not the true one based on a conformer. The vast majority of estimated bond lengths will have an error below 5%, while some bonds can have an error of up to 20%. This function is mostly useful when conformer generation fails for some molecules, or for increased computation speed.

Parameters:

Name	Type	Description	Default
`bond`	`Bond`	The bond whose length is estimated	required
`mol`	`Mol`	The molecule containing the bond (used to get neighbouring atoms)	required

Returns:

Name	Type	Description
`bond_length`	`float`	The bond length in Angstrom, typically a value around 1-2.

`get_mol_atomic_features_float(mol, property_list, offset_carbon=True, mask_nan='raise')`

Get a dictionary of floating-point arrays of atomic properties. To ensure all properties are at a similar scale, some of the properties are divided by a constant.

There is also the possibility of offsetting by the carbon value using the `offset_carbon` parameter.

Parameters:

mol:
    molecule from which to extract the properties

property_list:
    A list of atomic properties to get from the molecule, such as 'atomic-number',
    'mass', 'valence', 'degree', 'electronegativity'.
    Some elements are divided by a factor to avoid feature explosion.

    Accepted properties are:

    - "atomic-number"
    - "mass", "weight"
    - "valence", "total-valence"
    - "implicit-valence"
    - "hybridization"
    - "chirality"
    - "aromatic"
    - "ring", "in-ring"
    - "min-ring"
    - "max-ring"
    - "num-ring"
    - "degree"
    - "radical-electron"
    - "formal-charge"
    - "vdw-radius"
    - "covalent-radius"
    - "electronegativity"
    - "ionization", "first-ionization"
    - "melting-point"
    - "metal"
    - "single-bond"
    - "aromatic-bond"
    - "double-bond"
    - "triple-bond"
    - "is-carbon"
    - "group"
    - "period"

offset_carbon:
    Whether to subtract the carbon property from the desired atomic property.
    For example, if we want the mass of lithium (6.941), the mass of
    carbon (12.0107) will be subtracted, resulting in a value of -5.0697

mask_nan:
    Deal with molecules that fail part of the featurization.
    NaNs can occur when taking a property that is undefined for a noble gas,
    or other properties that are not measured for specific atoms.

    - "raise": DEFAULT. Raise an error when there is a NaN or inf in the featurization
    - "warn": Raise a warning when there is a NaN or inf in the featurization
    - "None": Don't do anything
    - "Floating value": Replace NaNs or infs with the specified value

Returns:

prop_dict:
    A dictionary where the elements of ``property_list`` are the keys
    and the values are np.ndarray of shape (N,). N is the number of atoms
    in ``mol``.
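
The carbon offset described under `offset_carbon` amounts to a subtraction. The sketch below, with a hypothetical `mass_feature` helper and a two-entry mass table taken from the example values above, reproduces the lithium example. The division by a constant mentioned earlier is left out because the constants are not given here.

```python
import numpy as np

# Atomic masses (g/mol) quoted in the `offset_carbon` description above
ATOMIC_MASS = {"C": 12.0107, "Li": 6.941}


def mass_feature(symbols, offset_carbon=True):
    """Hypothetical sketch of a "mass" feature with an optional carbon offset."""
    masses = np.array([ATOMIC_MASS[s] for s in symbols], dtype=float)
    if offset_carbon:
        masses = masses - ATOMIC_MASS["C"]  # carbon atoms map to 0.0
    return masses


li_and_c = mass_feature(["Li", "C"])  # approximately [-5.0697, 0.0]
```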

`get_mol_atomic_features_onehot(mol, property_list)`

Get the following set of features for any given atom:

One-hot representation of the atom
One-hot representation of the atom degree
One-hot representation of the atom implicit valence
One-hot representation of the atom hybridization
Whether the atom is aromatic
The atom's formal charge
The atom's number of radical electrons

Additionally, the following features can be set, depending on the value of input parameters:

One-hot representation of the number of hydrogen atoms in the current atom neighborhood if explicit_H is false
One-hot encoding of the atom chirality, and whether such configuration is even possible

Parameters:

mol:
    molecule from which to extract the properties

property_list:
    A list of integer atomic properties to get from the molecule.
    The integer values are converted to a one-hot vector.
    Callables are not supported by this function.

    Accepted properties are:

    - "atomic-number"
    - "degree"
    - "valence", "total-valence"
    - "implicit-valence"
    - "hybridization"
    - "chirality"
    - "phase"
    - "type"
    - "group"
    - "period"

Returns:

Name	Type	Description
`prop_dict`	`Dict[str, Tensor]`	A dictionary where the elements of `property_list` are the keys and the values are np.ndarray of shape (N, OH). N is the number of atoms in `mol` and OH the length of the one-hot encoding.
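
As an illustration of the (N, OH) output shape, the NumPy sketch below one-hot encodes an integer property such as the atom degree. The function name `one_hot`, the `choices` list, and the extra trailing column for out-of-vocabulary values are assumptions of this sketch, not the encoding tables that graphium defines.

```python
import numpy as np


def one_hot(values, choices):
    """Encode each value as a one-hot row over `choices`.

    Values missing from `choices` go to an extra last column (an assumption
    of this sketch), so the output has shape (N, len(choices) + 1).
    """
    index = {c: i for i, c in enumerate(choices)}
    out = np.zeros((len(values), len(choices) + 1), dtype=np.float32)
    for row, value in enumerate(values):
        out[row, index.get(value, len(choices))] = 1.0
    return out


# Degrees of three hypothetical atoms; 4 is not in the choices list
degrees_onehot = one_hot([1, 2, 4], choices=[0, 1, 2, 3])
```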

`get_mol_conformer_features(mol, property_list, mask_nan=None)`

Obtain the conformer features of a molecule.

Parameters:

mol:
    molecule from which to extract the properties

property_list:
    A list of conformer properties to get from the molecule.
    Accepted properties are:
    - "positions_3d"

Returns:

Name	Type	Description
`prop_dict`	`Dict[str, ndarray]`	A dictionary where the elements of `property_list` are the keys

`get_mol_edge_features(mol, property_list, mask_nan='raise')`

Get the following set of features for any given bond. See `graphium.features.nmp` for allowed values in one-hot encoding:

One-hot representation of the bond type. Note that you should not kekulize your molecules if you expect this to take aromatic bonds into account.
Bond stereo type, following CIP classification
Whether the bond is conjugated
Whether the bond is in a ring

Parameters:

Name	Type	Description	Default
`mol`	`Mol`	rdkit.Chem.Molecule the molecule of interest	required
`property_list`	`List[str]`	A list of edge properties to return for the given molecule. Accepted properties are: "bond-type-onehot", "bond-type-float", "stereo", "in-ring", "conjugated", "conformer-bond-length" (might cause problems with complex molecules), "estimated-bond-length"	required

Returns:

Name	Type	Description
`prop_dict`	`Dict[str, ndarray]`	A dictionary where the elements of `property_list` are the keys and the values are np.ndarray of shape (N,). N is the number of bonds in `mol`.

`get_simple_mol_conformer(mol)`

If the molecule has a conformer, then it will return the conformer at index 0. Otherwise, it generates a simple molecule conformer using rdkit.Chem.rdDistGeom.EmbedMolecule and returns it. This is meant to be used in simple functions like GetBondLength, not in functions requiring complex 3D structure.

Parameters:

mol: RDKit molecule

Returns:

Name	Type	Description
`conf`	`Union[Conformer, None]`	A conformer of the molecule, or `None` if it fails

`mol_to_adj_and_features(mol, atom_property_list_onehot=[], atom_property_list_float=[], conformer_property_list=[], edge_property_list=[], add_self_loop=False, explicit_H=False, use_bonds_weights=False, pos_encoding_as_features=None, dtype=np.float16, mask_nan='raise')`

Transforms a molecule into an adjacency matrix representing the molecular graph and a set of atom and bond features.

It also returns the positional encodings associated with the graph.

Parameters:

mol:
    The molecule to be converted

atom_property_list_onehot:
    List of the properties used to get one-hot encoding of the atom type,
    such as the atom index represented as a one-hot vector.
    See function `get_mol_atomic_features_onehot`

atom_property_list_float:
    List of the properties used to get floating-point encoding of the atom type,
    such as the atomic mass or electronegativity.
    See function `get_mol_atomic_features_float`

conformer_property_list:
    List of properties used to encode the conformer information, outside of atom properties. Currently supports "positions_3d".

edge_property_list:
    List of the properties used to encode the edges, such as the edge type
    and the stereo type.

add_self_loop:
    Whether to add a value of `1` on the diagonal of the adjacency matrix.

explicit_H:
    Whether to consider the hydrogens explicitly. If `False`, the hydrogens
    are implicit.

use_bonds_weights:
    Whether to use the floating-point value of the bonds in the adjacency matrix,
    such that single bonds are represented by 1, double bonds 2, triple 3, aromatic 1.5

pos_encoding_as_features:
    Keyword arguments for function `graph_positional_encoder`
    to generate positional encoding for node features.

dtype:
    The numpy data type used to build the graph

mask_nan:
    Deal with molecules that fail part of the featurization.
    NaNs can occur when taking a property that is undefined for a noble gas,
    or other properties that are not measured for specific atoms.

    - "raise": DEFAULT. Raise an error when there is a NaN or inf in the featurization
    - "warn": Raise a warning when there is a NaN or inf in the featurization
    - "None": Don't do anything
    - "Floating value": Replace NaNs or infs with the specified value

Returns:

adj:
    torch coo sparse adjacency matrix of the molecule

ndata:
    Concatenated node data of the atoms, based on the properties from
    `atom_property_list_onehot` and `atom_property_list_float`.
    If no properties are given, it returns `None`

edata:
    Concatenated edge data of the molecule, based on the properties from
    `edge_property_list`.
    If no properties are given, it returns `None`

pe_dict:
    Dictionary of all positional encodings. Currently supported keys:

    - "pos_enc_feats_sign_flip":
        Node positional encoding that requires augmentation via sign-flip.
        For example, eigenvectors of the Laplacian are ambiguous to the
        sign and are returned here.

    - "pos_enc_feats_no_flip":
        Node positional encoding that does not use sign-flip.
        For example, distances from the centroid are returned here.

    - "rwse":
        Node structural encoding corresponding to the diagonal of the random
        walk matrix

conf_dict:
    Contains the 3D positions of a conformer of the molecule, or 0s if none is found
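
The reason for the `pos_enc_feats_sign_flip` key is that if `v` is a Laplacian eigenvector, then so is `-v`, so a model should not depend on the sign returned by the eigensolver. A common remedy is to flip the sign of each eigenvector at random during training. The sketch below shows that augmentation in NumPy; the function name and the choice of one independent sign per column are assumptions, and where graphium applies the flip is not described here.

```python
import numpy as np


def random_sign_flip(eigvecs, rng=None):
    """Multiply each eigenvector (column) by an independent random sign."""
    rng = np.random.default_rng() if rng is None else rng
    signs = rng.choice([-1.0, 1.0], size=eigvecs.shape[1])
    return eigvecs * signs  # broadcasts one sign per column over all rows


flipped = random_sign_flip(np.arange(6.0).reshape(3, 2), np.random.default_rng(0))
```

Each column keeps its magnitude, so the augmented features still satisfy the eigenvector equation.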

`mol_to_adjacency_matrix(mol, use_bonds_weights=False, add_self_loop=False, dtype=np.float32)`

Convert a molecule to a sparse adjacency matrix, as a torch Tensor. Instead of using the RDKit GetAdjacencyMatrix() method, this method uses the bond ordering from the molecule object, which is the same as the bond ordering in the bond features.

Warning

Do not use Tensor.coalesce() on the returned adjacency matrix, as it will change the ordering of the bonds.

Parameters:

Name	Type	Description	Default
`mol`	`Mol`	A molecule in the form of a SMILES string or an RDKit molecule object.	required
`use_bonds_weights`	`bool`	If `True`, the adjacency matrix will contain the bond type as the value of the edge. If `False`, the adjacency matrix will contain `1` as the value of the edge.	`False`
`add_self_loop`	`bool`	If `True`, the adjacency matrix will contain a self-loop for each node.	`False`
`dtype`	`dtype`	The data type used to build the graph	`float32`

Returns:

Name	Type	Description
`adj`	`coo_matrix`	coo sparse adjacency matrix of the molecule
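
The bond-ordering guarantee and the `coalesce()` warning can be illustrated with a dependency-light sketch that builds COO triplets directly from a bond list. The helper names, the `(i, j, bond_type)` input format, and the convention that each bond emits `(i, j)` followed by `(j, i)` are assumptions made for illustration; the bond weights are the ones listed under `use_bonds_weights`. Because coalescing sorts the indices, directed edge `k` of a coalesced matrix would in general no longer belong to bond `k // 2`.

```python
import numpy as np

# Bond weights listed under `use_bonds_weights` above
BOND_WEIGHT = {"SINGLE": 1.0, "DOUBLE": 2.0, "TRIPLE": 3.0, "AROMATIC": 1.5}


def bonds_to_coo(bonds, num_atoms, use_bonds_weights=False, add_self_loop=False):
    """Return COO (rows, cols, data), keeping the order of `bonds`.

    Each bond (i, j, bond_type) emits (i, j) then (j, i), so directed
    edge k belongs to bond k // 2 (convention assumed for this sketch).
    """
    rows, cols, data = [], [], []
    for i, j, bond_type in bonds:
        weight = BOND_WEIGHT[bond_type] if use_bonds_weights else 1.0
        rows += [i, j]
        cols += [j, i]
        data += [weight, weight]
    if add_self_loop:
        rows += list(range(num_atoms))
        cols += list(range(num_atoms))
        data += [1.0] * num_atoms
    return (
        np.array(rows, dtype=np.int64),
        np.array(cols, dtype=np.int64),
        np.array(data, dtype=float),
    )


def coo_to_dense(rows, cols, data, num_atoms):
    """Densify the triplets for inspection (duplicate entries are summed)."""
    dense = np.zeros((num_atoms, num_atoms))
    np.add.at(dense, (rows, cols), data)
    return dense


# Hypothetical 3-atom chain: a double bond 0=1 and a single bond 1-2
rows, cols, data = bonds_to_coo(
    [(0, 1, "DOUBLE"), (1, 2, "SINGLE")], 3, use_bonds_weights=True
)
```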

`mol_to_graph_dict(mol, atom_property_list_onehot=[], atom_property_list_float=[], conformer_property_list=[], edge_property_list=[], add_self_loop=False, explicit_H=False, use_bonds_weights=False, pos_encoding_as_features=None, dtype=np.float16, on_error='ignore', mask_nan='raise', max_num_atoms=None)`

Transforms a molecule into an adjacency matrix representing the molecular graph and a set of atom and bond features, and re-organizes them into a dictionary that can be used to build a pyg.data.Data object.

Compared to `mol_to_pyggraph`, this function does not build the graph directly, and is thus faster, uses less memory, and is compatible with other frameworks.

Parameters:

mol:
    The molecule to be converted

atom_property_list_onehot:
    List of the properties used to get one-hot encoding of the atom type,
    such as the atom index represented as a one-hot vector.
    See function `get_mol_atomic_features_onehot`

atom_property_list_float:
    List of the properties used to get floating-point encoding of the atom type,
    such as the atomic mass or electronegativity.
    See function `get_mol_atomic_features_float`

conformer_property_list:
    List of properties used to encode the conformer information, outside of atom properties. Currently supports "positions_3d".

edge_property_list:
    List of the properties used to encode the edges, such as the edge type
    and the stereo type.

add_self_loop:
    Whether to add a value of `1` on the diagonal of the adjacency matrix.

explicit_H:
    Whether to consider the hydrogens explicitly. If `False`, the hydrogens
    are implicit.

use_bonds_weights:
    Whether to use the floating-point value of the bonds in the adjacency matrix,
    such that single bonds are represented by 1, double bonds 2, triple 3, aromatic 1.5

pos_encoding_as_features:
    Keyword arguments for function `graph_positional_encoder`
    to generate positional encoding for node features.

dtype:
    The numpy data type used to build the graph

on_error:
    What to do when the featurization fails. This can change the
    behavior of `mask_nan`.

    - "raise": Raise an error
    - "warn": Raise a warning and return a string of the error
    - "ignore": Ignore the error and return a string of the error

mask_nan:
    Deal with molecules that fail part of the featurization.
    NaNs can occur when taking a property that is undefined for a noble gas,
    or other properties that are not measured for specific atoms.

    - "raise": DEFAULT. Raise an error when there is a NaN or inf in the featurization
    - "warn": Raise a warning when there is a NaN or inf in the featurization
    - "None": Don't do anything
    - "Floating value": Replace NaNs or infs with the specified value

max_num_atoms:
    Maximum number of atoms for a given molecule. If a molecule with more atoms
    is given, an error is raised, but captured according to the rules of
    `on_error`.

Returns:

graph_dict:
    A dictionary `GraphDict` containing the keys required to build a graph,
    and which can be used to build a PyG graph. If it fails
    to featurize the molecule, it returns a string with the error.

    - "adj": A sparse int-array containing the adjacency matrix

    - "data": A dictionary containing different keys and numpy
      arrays associated to the (node, edge & graph) features.

    - "dtype": The numpy dtype for the floating data.
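
The interplay of `max_num_atoms` and `on_error` can be sketched as a wrapper around any featurization callable. Everything here is hypothetical: `featurize_with_policy` is not a graphium function, the atom count is passed in explicitly to avoid an RDKit dependency, and the exact error-string format is an assumption.

```python
import warnings


def featurize_with_policy(featurize, mol, num_atoms, on_error="ignore", max_num_atoms=None):
    """Hypothetical wrapper mirroring the documented `on_error` rules."""
    try:
        if max_num_atoms is not None and num_atoms > max_num_atoms:
            raise ValueError(
                f"Molecule has {num_atoms} atoms > max_num_atoms={max_num_atoms}"
            )
        return featurize(mol)
    except Exception as err:
        if on_error == "raise":
            raise  # re-raise the original exception unchanged
        message = f"{type(err).__name__}: {err}"
        if on_error == "warn":
            warnings.warn(message)
        return message  # "warn" and "ignore" both return the error as a string
```

Returning a string instead of a dictionary lets a caller filter failed molecules with a simple `isinstance(result, str)` check.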

`mol_to_graph_signature(featurizer_args=None)`

Get the default arguments of `mol_to_graph_dict` and update them with a provided dict of arguments, in order to get the full signature of the featurizer arguments actually used for the feature computation.

Parameters:

Name	Type	Description	Default
`featurizer_args`	`Dict[str, Any]`	A dictionary of featurizer arguments to update	`None`

Returns: A dictionary of featurizer arguments
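
One way to obtain such a signature, shown here only as a sketch, is to read the defaults with Python's standard `inspect` module and overlay the user-provided arguments. `graph_signature` and `toy_featurizer` are hypothetical stand-ins; `toy_featurizer` copies a few defaults from the `mol_to_graph_dict` signature above, and this is not necessarily how graphium builds the signature.

```python
import inspect
from typing import Any, Callable, Dict, Optional


def graph_signature(
    func: Callable, featurizer_args: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Merge `func`'s default keyword arguments with user-provided overrides."""
    defaults = {
        name: param.default
        for name, param in inspect.signature(func).parameters.items()
        if param.default is not inspect.Parameter.empty  # skip required args
    }
    defaults.update(featurizer_args or {})
    return defaults


def toy_featurizer(mol, explicit_H=False, on_error="ignore", mask_nan="raise", max_num_atoms=None):
    """Hypothetical stand-in with a few of `mol_to_graph_dict`'s defaults."""
    return None
```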

`mol_to_pyggraph(mol, atom_property_list_onehot=[], atom_property_list_float=[], conformer_property_list=[], edge_property_list=[], add_self_loop=False, explicit_H=False, use_bonds_weights=False, pos_encoding_as_features=None, dtype=np.float16, on_error='ignore', mask_nan='raise', max_num_atoms=None)`

Transforms a molecule into an adjacency matrix representing the molecular graph and a set of atom and bond features.

Then, the adjacency matrix and node/edge features are used to build a pyg.data.Data with pytorch Tensors.

Parameters:

mol:
    The molecule to be converted

atom_property_list_onehot:
    List of the properties used to get one-hot encoding of the atom type,
    such as the atom index represented as a one-hot vector.
    See function `get_mol_atomic_features_onehot`

atom_property_list_float:
    List of the properties used to get floating-point encoding of the atom type,
    such as the atomic mass or electronegativity.
    See function `get_mol_atomic_features_float`

conformer_property_list:
    List of properties used to encode the conformer information, outside of atom properties. Currently supports "positions_3d".

edge_property_list:
    List of the properties used to encode the edges, such as the edge type
    and the stereo type.

add_self_loop:
    Whether to add a value of `1` on the diagonal of the adjacency matrix.

explicit_H:
    Whether to consider the hydrogens explicitly. If `False`, the hydrogens
    are implicit.

use_bonds_weights:
    Whether to use the floating-point value of the bonds in the adjacency matrix,
    such that single bonds are represented by 1, double bonds 2, triple 3, aromatic 1.5

pos_encoding_as_features:
    Keyword arguments for function `graph_positional_encoder`
    to generate positional encoding for node features.

dtype:
    The numpy data type used to build the graph

on_error:
    What to do when the featurization fails. This can change the
    behavior of `mask_nan`.

    - "raise": Raise an error
    - "warn": Raise a warning and return a string of the error
    - "ignore": Ignore the error and return a string of the error

mask_nan:
    Deal with molecules that fail part of the featurization.
    NaNs can occur when taking a property that is undefined for a noble gas,
    or other properties that are not measured for specific atoms.

    - "raise": DEFAULT. Raise an error when there is a NaN in the featurization
    - "warn": Raise a warning when there is a NaN in the featurization
    - "None": Don't do anything
    - "Floating value": Replace NaNs with the specified value

max_num_atoms:
    Maximum number of atoms for a given molecule. If a molecule with more atoms
    is given, an error is raised, but captured according to the rules of
    `on_error`.

Returns:

graph:
    Pyg graph, with `graph['feat']` corresponding to the concatenated
    node data from `atom_property_list_onehot` and `atom_property_list_float`,
    `graph['edge_feat']` corresponding to the concatenated edge data from `edge_property_list`.
    There are also additional entries for the positional encodings.

`to_dense_array(array, dtype=None)`

Convert an array to a dense array, optionally casting it to `dtype`.

Parameters:

array: The array to convert to dense
dtype: The dtype of the array

Returns: The dense array

`to_dense_tensor(tensor, dtype=None)`

Convert a tensor to a dense tensor, optionally casting it to `dtype`.

Parameters:

tensor: The tensor to convert to dense
dtype: The dtype of the tensor

Returns: The dense tensor

Positional Encoding

`graphium.features.positional_encoding`

`get_all_positional_encodings(adj, num_nodes, pos_kwargs=None)`

Get the positional encodings to be used as features.

Parameters:

Name	Type	Description	Default
`adj`	`[num_nodes, num_nodes]`	Adjacency matrix of the graph	required
`num_nodes`	`int`	Number of nodes in the graph	required
`pos_kwargs`		Keyword arguments for function `graph_positional_encoder` to generate positional encoding for node features.	`None`

Returns:

Name	Type	Description
`pe_dict`	`Tuple[OrderedDict[str, ndarray]]`	Dictionary of positional and structural encodings

`graph_positional_encoder(adj, num_nodes, pos_type=None, pos_level=None, pos_kwargs=None, cache=None)`

Get a positional encoding that depends on the parameters.

Parameters:

Name	Type	Description	Default
`adj`	`[num_nodes, num_nodes]`	Adjacency matrix of the graph	required
`num_nodes`	`int`	Number of nodes in the graph	required
`pos_type`	`Optional[str]`	The type of positional encoding to use. If None, it must be provided by `pos_kwargs["pos_type"]`. Supported types are: `laplacian_eigvec`, `laplacian_eigval` (these two cache the connected components and the eigendecomposition), `rwse`, `electrostatic`, `commute` (these two cache `pinvL`, the pseudo-inverse of the Laplacian), `graphormer`	`None`
`pos_level`	`Optional[str]`	Positional level to output. If None, it must be provided by `pos_kwargs["pos_level"]`. One of `node`, `edge`, `nodepair`, `graph`	`None`
`pos_kwargs`	`Optional[Dict[str, Any]]`	Extra keyword arguments for the positional encoding. Can include the keys pos_type and pos_level.	`None`
`cache`	`Optional[Dict[str, Any]]`	Dictionary of cached objects	`None`

Returns:

Name	Type	Description
`pe`	`Dict[str, ndarray]`	Positional or structural encoding
`cache`	`Dict[str, Any]`	Updated dictionary of cached objects
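
The `cache` argument lets encodings that share an expensive intermediate reuse it; per the list of supported types above, `electrostatic` and `commute` share `pinvL`. The sketch below illustrates the pattern with commute-time distances, using the standard formula `C_ij = vol(G) * (L+_ii + L+_jj - 2 L+_ij)`, where `L+` is the pseudo-inverse of the combinatorial Laplacian and `vol(G)` is the sum of the degrees. The function name is hypothetical, and whether graphium's `commute` encoding uses exactly this formula is an assumption.

```python
from typing import Any, Dict, Optional, Tuple

import numpy as np


def commute_times_cached(
    adj: np.ndarray, cache: Optional[Dict[str, Any]] = None
) -> Tuple[np.ndarray, Dict[str, Any]]:
    """Commute-time distances, reusing the Laplacian pseudo-inverse in `cache`."""
    cache = {} if cache is None else cache
    if "pinvL" not in cache:
        laplacian = np.diag(adj.sum(axis=1)) - adj  # L = D - A
        cache["pinvL"] = np.linalg.pinv(laplacian)
    pinv = cache["pinvL"]
    diag = np.diag(pinv)
    volume = adj.sum()  # sum of all node degrees
    commute = volume * (diag[:, None] + diag[None, :] - 2.0 * pinv)
    return commute, cache


adj = np.array([[0.0, 1.0], [1.0, 0.0]])
commute, cache = commute_times_cached(adj)  # computes and stores "pinvL"
commute_again, _ = commute_times_cached(adj, cache)  # reuses the cached matrix
```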

Properties

`graphium.features.properties`

`get_prop_or_none(prop, n, *args, **kwargs)`

Return the property. If an error occurs, return a list of `None` with length n.

Parameters:

prop: The property to compute.
n: The number of elements in the property.
*args: The arguments to pass to the property.
**kwargs: The keyword arguments to pass to the property.

Returns: The property, or a list of `None` with length n.
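
The fallback behavior described above can be written as a small sketch. `get_prop_or_none_sketch` is a hypothetical name, and catching every `Exception` is an assumption, since the docstring does not say which errors are handled.

```python
from typing import Any, Callable


def get_prop_or_none_sketch(prop: Callable[..., Any], n: int, *args, **kwargs) -> Any:
    """Call `prop(*args, **kwargs)`; on any exception return [None] * n."""
    try:
        return prop(*args, **kwargs)
    except Exception:
        return [None] * n  # keeps the output length stable for failed molecules
```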

`get_props_from_mol(mol, properties='autocorr3d')`

Function to get a given set of desired properties from a molecule, and output a property list.

Parameters:

Name	Type	Description	Default
`mol`	`Union[Mol, str]`	The molecule from which to compute the properties.	required
`properties`	`Union[List[str], str]`	The list of properties to compute for each molecule. It can be the following: 'descriptors', 'autocorr3d', 'rdf', 'morse', 'whim', 'all'	`'autocorr3d'`

Returns:

Name	Type	Description
`props`	`ndarray`	np.array(float) The array of properties for the desired molecule
`classes_start_idx`	`ndarray`	list(int) The list of indices specifying the start of each new class of descriptor or property. For example, if props has 20 elements, the first 5 are rotatable bonds, the next 8 are morse, and the rest are whim, then `classes_start_idx = [0, 5, 13]`. This will mainly be useful to normalize the features of each class.
`classes_names`	`ndarray`	list(str) The name of the classes associated with each starting index. Useful for understanding which properties the network is learning.
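
The `classes_start_idx` example above follows from a cumulative sum of the class sizes. The sketch below reproduces `[0, 5, 13]` for class sizes 5, 8, and 7, and shows one possible per-class normalization (a single mean and standard deviation per class block). Both helper names and the choice of standardization are assumptions.

```python
import numpy as np


def class_start_indices(class_sizes):
    """Start index of each class, given the number of features per class."""
    return np.concatenate([[0], np.cumsum(class_sizes)[:-1]]).astype(int)


def normalize_per_class(props, starts):
    """Standardize each class block of a [num_mols, num_props] array."""
    bounds = list(starts) + [props.shape[1]]
    out = np.empty(props.shape, dtype=float)
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        block = props[:, lo:hi]
        std = block.std()
        out[:, lo:hi] = (block - block.mean()) / (std if std > 0 else 1.0)
    return out


starts = class_start_indices([5, 8, 7])  # 5 rotatable bonds, 8 morse, 7 whim
```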

Spectral PE

`graphium.features.spectral`

`compute_laplacian_pe(adj, num_pos, cache, disconnected_comp=True, normalization='none')`

Compute the eigenvalues and eigenvectors of the Laplacian of the graph.

Parameters:

Name	Type	Description	Default
`adj`	`[num_nodes, num_nodes]`	Adjacency matrix of the graph	required
`num_pos`	`int`	Number of Laplacian eigenvectors to compute	required
`cache`	`Dict[str, Any]`	Dictionary of cached objects	required
`disconnected_comp`	`bool`	Whether to compute the eigenvectors for each connected component	`True`
`normalization`	`str`	Normalization to apply to the Laplacian	`'none'`

Returns:

Name	Type	Description
	`ndarray`	Two possible outputs: eigvals [num_nodes, num_pos]: Eigenvalues of the Laplacian repeated for each node. This repetition is necessary in case of disconnected components, where the eigenvalues of the Laplacian are not the same for each node. eigvecs [num_nodes, num_pos]: Eigenvectors of the Laplacian
`base_level`	`str`	Indicator of the output pos_level (node, edge, nodepair, graph) -> here node
`cache`	`Dict[str, Any]`	Updated dictionary of cached objects
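
For a connected graph without normalization, the output described above can be sketched with a dense eigendecomposition of `L = D - A`. This NumPy version is hypothetical: it ignores `disconnected_comp`, `normalization`, and `cache`, and zero-pads the columns when `num_pos` exceeds the number of nodes, which is an assumption about edge-case handling.

```python
import numpy as np


def laplacian_pe_sketch(adj: np.ndarray, num_pos: int):
    """Return (eigvals, eigvecs), each of shape [num_nodes, num_pos]."""
    num_nodes = adj.shape[0]
    laplacian = np.diag(adj.sum(axis=1)) - adj  # unnormalized L = D - A
    eigvals, eigvecs = np.linalg.eigh(laplacian)  # ascending eigenvalues
    k = min(num_pos, num_nodes)
    vals = np.zeros((num_nodes, num_pos))
    vecs = np.zeros((num_nodes, num_pos))
    vals[:, :k] = eigvals[:k]  # the same eigenvalues are repeated for every node
    vecs[:, :k] = eigvecs[:, :k]
    return vals, vecs


# Path graph 0-1-2, asking for more encodings than there are nodes
path = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
vals, vecs = laplacian_pe_sketch(path, num_pos=4)
```

For the 3-node path, the Laplacian eigenvalues are 0, 1, and 3, and the first eigenvector is constant up to sign.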

`normalize_matrix(matrix, degree_vector=None, normalization=None)`

Normalize a given matrix using its degree vector.

Parameters

matrix: torch.tensor(N, N) or scipy.sparse.spmatrix(N, N)
    A square matrix representing either an Adjacency matrix or a Laplacian.

degree_vector: torch.tensor(N) or np.ndarray(N) or None
    A vector representing the degree of ``matrix``.
    ``None`` is only accepted if ``normalization==None``

normalization: str or None, Default=None
    Normalization to use on the matrix

    - 'none' or ``None``: no normalization

    - 'sym': Symmetric normalization ``D^-0.5 L D^-0.5``

    - 'inv': Inverse normalization ``D^-1 L``

Returns

matrix: torch.tensor(N, N) or scipy.sparse.spmatrix(N, N)
    The normalized matrix
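
A dense NumPy sketch of the two documented normalizations follows. The function name is hypothetical, it only handles dense arrays (not `scipy.sparse` matrices), and mapping zero-degree entries to 0 instead of dividing by zero is an assumption.

```python
import numpy as np


def normalize_dense(matrix, degree_vector=None, normalization=None):
    """Apply 'sym' (D^-0.5 M D^-0.5) or 'inv' (D^-1 M) normalization to `matrix`."""
    if normalization is None or normalization == "none":
        return matrix
    if degree_vector is None:
        raise ValueError("degree_vector is required when a normalization is set")
    deg = np.asarray(degree_vector, dtype=float)
    safe = np.where(deg > 0, deg, 1.0)  # avoid dividing by zero
    if normalization == "sym":
        inv_sqrt = np.where(deg > 0, safe ** -0.5, 0.0)
        return inv_sqrt[:, None] * matrix * inv_sqrt[None, :]
    if normalization == "inv":
        inv = np.where(deg > 0, 1.0 / safe, 0.0)
        return inv[:, None] * matrix  # scales row i by 1 / deg_i
    raise ValueError(f"Unknown normalization: {normalization!r}")


# Laplacian of the path graph 0-1-2, with degrees [1, 2, 1]
lap = np.array([[1.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 1.0]])
sym = normalize_dense(lap, [1.0, 2.0, 1.0], "sym")  # diagonal becomes all ones
```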

Random Walk PE

`graphium.features.rw`

`compute_rwse(adj, ksteps, num_nodes, cache, pos_type='rw_return_probs' or 'rw_transition_probs', space_dim=0)`

Compute the Random Walk Structural Encoding (RWSE) for a given list of K steps.

Parameters:

Name	Type	Description	Default
`adj`	`[num_nodes, num_nodes]`	Adjacency matrix	required
`ksteps`	`Union[int, List[int]]`	List of numbers of steps for the random walks. If int, a list is generated from 1 to ksteps.	required
`num_nodes`	`int`	Number of nodes in the graph	required
`cache`	`Dict[str, Any]`	Dictionary of cached objects	required
`pos_type`	`str`	Desired output: `'rw_return_probs'` or `'rw_transition_probs'`	`'rw_return_probs'`
`space_dim`	`int`	Estimated dimensionality of the space. Used to correct the random-walk diagonal by a factor `k^(space_dim/2)`. In Euclidean space, this correction means that the height of the Gaussian distribution stays almost constant across the number of steps, if `space_dim` is the dimension of the Euclidean space.	`0`

Returns:

Two possible outputs:

- rw_return_probs [num_nodes, len(ksteps)]: Random-walk k-step landing probabilities
- rw_transition_probs [num_nodes, num_nodes, len(ksteps)]: Random-walk k-step transition probabilities

base_level: Indicator of the output pos_level (node, edge, nodepair, graph) -> here either node or nodepair

cache: Updated dictionary of cached objects
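
For the `rw_return_probs` output, the landing probabilities are the diagonals of the powers of the transition matrix `P = D^-1 A`, multiplied by the `k^(space_dim/2)` correction described above. The dense NumPy sketch below is hypothetical: it uses `numpy.linalg.matrix_power` instead of any sparse or cached computation, and gives isolated nodes an all-zero row, which is an assumption.

```python
import numpy as np


def rw_return_probs(adj, ksteps, space_dim=0):
    """Random-walk k-step return probabilities, shape [num_nodes, len(ksteps)]."""
    adj = np.asarray(adj, dtype=float)
    if isinstance(ksteps, int):
        ksteps = list(range(1, ksteps + 1))  # 1..ksteps, as documented
    deg = adj.sum(axis=1)
    # P = D^-1 A; isolated nodes keep an all-zero row (assumption)
    inv_deg = np.divide(1.0, deg, out=np.zeros_like(deg), where=deg > 0)
    transition = inv_deg[:, None] * adj
    out = np.zeros((adj.shape[0], len(ksteps)))
    for col, k in enumerate(ksteps):
        landing = np.diag(np.linalg.matrix_power(transition, k))
        out[:, col] = landing * k ** (space_dim / 2)  # k^(space_dim/2) correction
    return out


# Two nodes joined by one edge: a walker is back home after every even step
probs = rw_return_probs(np.array([[0, 1], [1, 0]]), ksteps=3)
```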

`get_Pks(ksteps, edge_index, edge_weight=None, num_nodes=None, start_Pk=None, start_k=None)`

Compute Random Walk landing probabilities for given list of K steps.

Parameters:

Name	Type	Description	Default
`ksteps`	`List[int]`	List of numbers of k-steps for which to compute the RW landings	required
`edge_index`	`Tuple[Tensor, Tensor]`	PyG sparse representation of the graph	required
`edge_weight`	`Optional[Tensor]`	Edge weights	`None`
`num_nodes`	`Optional[int]`	Number of nodes in the graph	`None`

Returns:

Type	Description
`Dict[int, ndarray]`	2D Tensor with shape (num_nodes, len(ksteps)) with RW landing probs

NMP

`graphium.features.nmp`

`float_or_none(string)`

Check whether a string can be converted to a float; return `None` if it cannot.

Parameters:

string: str

Returns:

val: float or None
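
A sketch of the documented behavior follows, under a hypothetical name so it is not mistaken for the library source. Catching `TypeError` as well as `ValueError` is an assumption; note that strings such as `"nan"` are valid floats in Python and are therefore not mapped to `None`.

```python
from typing import Optional


def float_or_none_sketch(string: str) -> Optional[float]:
    """Return float(string), or None if the conversion fails."""
    try:
        return float(string)
    except (TypeError, ValueError):
        return None


parsed = [float_or_none_sketch(s) for s in ["1.2", "", "N/A"]]  # [1.2, None, None]
```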
