graphium.features¶

Feature extraction and manipulation

Contents

Featurizer
Positional Encoding
Properties
Spectral PE
Random Walk PE
NMP

Featurizer¶

`graphium.features.featurizer` ¶

`GraphDict` ¶

Bases: dict

`__init__(dic)` ¶

Store the parameters required to initialize a pyg.data.Data, but as a dictionary to reduce memory consumption.

Possible keys for the dictionary:

adj: A sparse Tensor containing the adjacency matrix
ndata: A dictionary containing different keys and Tensors associated with the node features.
edata: A dictionary containing different keys and Tensors associated with the edge features.
dtype: The dtype for the floating data.
mask_nan: How to handle molecules that fail part of the featurization. NaNs can occur when a property is undefined for a noble gas, or is not measured for specific atoms.
- "raise": Raise an error when there is a nan or inf in the featurization
- "warn": Raise a warning when there is a nan or inf in the featurization
- "None": DEFAULT. Don't do anything
- "Floating value": Replace nans or inf by the specified value
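
The `mask_nan` options can be read as a small post-processing policy. The sketch below is not Graphium's implementation: `apply_mask_nan` is a hypothetical helper that works on a plain Python list, and it uses Python's `None` for the "do nothing" option. It only illustrates the four behaviours listed above.

```python
import math
import warnings


def apply_mask_nan(values, mask_nan=None):
    """Hypothetical helper illustrating the four `mask_nan` behaviours."""
    bad = [i for i, v in enumerate(values) if math.isnan(v) or math.isinf(v)]
    if mask_nan is None or not bad:
        return list(values)  # nothing to do
    if mask_nan == "raise":
        raise ValueError(f"nan or inf found at positions {bad}")
    if mask_nan == "warn":
        warnings.warn(f"nan or inf found at positions {bad}")
        return list(values)
    # Otherwise, treat `mask_nan` as the replacement value.
    fill = float(mask_nan)
    return [fill if i in bad else v for i, v in enumerate(values)]


feats = [0.5, float("nan"), float("inf")]
cleaned = apply_mask_nan(feats, mask_nan=0.0)
print(cleaned)
```

With a floating value, both the nan and the inf entries are replaced, while finite entries are left untouched.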

`make_pyg_graph(**kwargs)` ¶

Convert the current dictionary of parameters, containing an adjacency matrix with node/edge data into a pyg.data.Data of torch Tensors.

**kwargs can be used to overwrite any parameter from the current dictionary. See GraphDict.__init__ for a list of parameters

`get_estimated_bond_length(bond, mol)` ¶

Estimate the bond length between atoms by looking at the estimated atomic radius, which depends on both the atom type and the bond type. The resulting bond length is then the sum of the two radii.

Keep in mind that this function only provides an estimate of the bond length and not the true one based on a conformer. The vast majority of estimated bond lengths will have an error below 5%, while some bonds can have an error up to 20%. This function is mostly useful when conformer generation fails for some molecules, or for increased computation speed.

Parameters:

Name	Type	Description	Default
`bond`	`Chem.rdchem.Bond`	The bond whose length is estimated	required
`mol`	`dm.Mol`	The molecule containing the bond (used to get neighbouring atoms)	required

Returns:

Name	Type	Description
`bond_length`	`float`	The bond length in Angstrom, typically a value around 1-2.

`get_mol_atomic_features_float(mol, property_list, offset_carbon=True, mask_nan='raise')` ¶

Get a dictionary of floating-point arrays of atomic properties. To ensure all properties are at a similar scale, some of the properties are divided by a constant.

Values can also be offset by the corresponding carbon value using the `offset_carbon` parameter.

Parameters:

Name	Type	Description	Default
`mol`	`dm.Mol`	molecule from which to extract the properties	required
`property_list`	`Union[List[str], List[Callable]]`	A list of atomic properties to get from the molecule, such as 'atomic-number', 'mass', 'valence', 'degree', 'electronegativity'. Some elements are divided by a factor to avoid feature explosion. Accepted properties are (aliases separated by "/"): "atomic-number", "mass" / "weight", "valence" / "total-valence", "implicit-valence", "hybridization", "chirality", "aromatic", "ring" / "in-ring", "min-ring", "max-ring", "num-ring", "degree", "radical-electron", "formal-charge", "vdw-radius", "covalent-radius", "electronegativity", "ionization" / "first-ionization", "melting-point", "metal", "single-bond", "aromatic-bond", "double-bond", "triple-bond", "is-carbon", "group", "period"	required
`offset_carbon`	`bool`	Whether to subtract the carbon value from the desired atomic property. For example, for the mass of lithium (6.941), the mass of carbon (12.0107) is subtracted, resulting in a value of -5.0697	`True`
`mask_nan`	`Union[str, float, type(None)]`	How to handle molecules that fail part of the featurization. NaNs can occur when a property is undefined for a noble gas, or is not measured for specific atoms. "raise" (default): Raise an error when there is a nan or inf in the featurization. "warn": Raise a warning when there is a nan or inf in the featurization. `None`: Don't do anything. A floating value: Replace nans or inf by the specified value.	`'raise'`

Returns:

Name	Type	Description
`prop_dict`	`Dict[str, np.ndarray]`	A dictionary whose keys are the elements of `property_list` and whose values are np.ndarray of shape (N,), where N is the number of atoms in `mol`.
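
The effect of `offset_carbon` can be illustrated with the two masses quoted in the table above. This is a minimal sketch, not the library's code: `offset_by_carbon` and the `ATOMIC_MASS` lookup are hypothetical names, and the constant rescaling that the function also applies is left out.

```python
# Masses quoted in the `offset_carbon` description above.
ATOMIC_MASS = {"C": 12.0107, "Li": 6.941}


def offset_by_carbon(symbols, table, offset_carbon=True):
    """Hypothetical helper: one value per atom, optionally relative to carbon."""
    offset = table["C"] if offset_carbon else 0.0
    return [table[s] - offset for s in symbols]


relative_mass = offset_by_carbon(["Li", "C"], ATOMIC_MASS)
print([round(v, 4) for v in relative_mass])
```

With the offset enabled, carbon itself maps to zero, so features are centred on the most common heavy atom in organic molecules.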

`get_mol_atomic_features_onehot(mol, property_list)` ¶

Get the following set of features for any given atom

One-hot representation of the atom
One-hot representation of the atom degree
One-hot representation of the atom implicit valence
One-hot representation of the atom hybridization
Whether the atom is aromatic
The atom's formal charge
The atom's number of radical electrons

Additionally, the following features can be set, depending on the input parameters:

One-hot representation of the number of hydrogen atoms in the current atom's neighborhood, if explicit_H is false
One-hot encoding of the atom chirality, and whether such configuration is even possible

Parameters:

Name	Type	Description	Default
`mol`	`dm.Mol`	molecule from which to extract the properties	required
`property_list`	`List[str]`	A list of integer atomic properties to get from the molecule. The integer values are converted to a one-hot vector. Callables are not supported by this function. Accepted properties are (aliases separated by "/"): "atomic-number", "degree", "valence" / "total-valence", "implicit-valence", "hybridization", "chirality", "phase", "type", "group", "period"	required

Returns:

Name	Type	Description
`prop_dict`	`Dict[str, Tensor]`	A dictionary whose keys are the elements of `property_list` and whose values are np.ndarray of shape (N, OH), where N is the number of atoms in `mol` and OH is the length of the one-hot encoding.
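
To make the (N, OH) shape concrete, here is a minimal one-hot sketch in plain Python. The helper `one_hot`, the list of allowed values, and the extra trailing "unknown" column are assumptions of this example, not Graphium's actual encoding tables (those live in graphium.features.nmp).

```python
def one_hot(values, allowed):
    """Encode each integer in `values` as a one-hot row over `allowed`.

    Assumption of this sketch: values outside `allowed` map to an extra
    trailing "unknown" column.
    """
    width = len(allowed) + 1
    rows = []
    for v in values:
        row = [0] * width
        idx = allowed.index(v) if v in allowed else width - 1
        row[idx] = 1
        rows.append(row)
    return rows


# Atom degrees of a hypothetical 3-atom molecule.
degree_onehot = one_hot([1, 2, 7], allowed=[0, 1, 2, 3, 4])
for row in degree_onehot:
    print(row)
```

Here N is 3 and OH is 6: five allowed degrees plus the "unknown" column that catches the out-of-range value 7.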

`get_mol_conformer_features(mol, property_list, mask_nan=None)` ¶

Obtain the conformer features of a molecule.

Parameters:

Name	Type	Description	Default
`mol`	`dm.Mol`	molecule from which to extract the properties	required
`property_list`	`Union[List[str], List[Callable]]`	A list of conformer properties to get from the molecule. Accepted properties are: "positions_3d"	required

Returns:

Name	Type	Description
`prop_dict`	`Dict[str, np.ndarray]`	A dictionary whose keys are the elements of `property_list`

`get_mol_edge_features(mol, property_list, mask_nan='raise')` ¶

Get the following set of features for any given bond. See graphium.features.nmp for the allowed values in the one-hot encodings.

One-hot representation of the bond type. Note that you should not kekulize your molecules, if you expect this to take aromatic bond into account.
Bond stereo type, following CIP classification
Whether the bond is conjugated
Whether the bond is in a ring

Parameters:

Name	Type	Description	Default
`mol`	`dm.Mol`	The molecule of interest (an RDKit molecule)	required
`property_list`	`List[str]`	A list of edge properties to return for the given molecule. Accepted properties are: "bond-type-onehot", "bond-type-float", "stereo", "in-ring", "conjugated", "conformer-bond-length" (might cause problems with complex molecules), "estimated-bond-length"	required

Returns:

Name	Type	Description
`prop_dict`	`Dict[str, np.ndarray]`	A dictionary whose keys are the elements of `property_list` and whose values are np.ndarray of shape (N,), where N is the number of bonds in `mol`.

`get_simple_mol_conformer(mol)` ¶

If the molecule has a conformer, then it will return the conformer at idx 0. Otherwise, it generates a simple molecule conformer using rdkit.Chem.rdDistGeom.EmbedMolecule and returns it. This is meant to be used in simple functions like GetBondLength, not in functions requiring complex 3D structure.

Parameters:

Name	Type	Description	Default
`mol`	`dm.Mol`	Rdkit Molecule	required

Returns:

Name	Type	Description
`conf`	`Union[Chem.rdchem.Conformer, None]`	A conformer of the molecule, or `None` if it fails

`mol_to_adj_and_features(mol, atom_property_list_onehot=[], atom_property_list_float=[], conformer_property_list=[], edge_property_list=[], add_self_loop=False, explicit_H=False, use_bonds_weights=False, pos_encoding_as_features=None, dtype=np.float16, mask_nan='raise')` ¶

Transforms a molecule into an adjacency matrix representing the molecular graph and a set of atom and bond features.

It also returns the positional encodings associated to the graph.

Parameters:

Name	Type	Description	Default
`mol`	`Union[str, dm.Mol]`	The molecule to be converted	required
`atom_property_list_onehot`	`List[str]`	List of the properties used to get one-hot encoding of the atom type, such as the atom index represented as a one-hot vector. See function `get_mol_atomic_features_onehot`	`[]`
`atom_property_list_float`	`List[Union[str, Callable]]`	List of the properties used to get floating-point encoding of the atom type, such as the atomic mass or electronegativity. See function `get_mol_atomic_features_float`	`[]`
`conformer_property_list`	`List[str]`	List of properties used to encode the conformer information, outside of atom properties. Currently supports "positions_3d"	`[]`
`edge_property_list`	`List[str]`	List of the properties used to encode the edges, such as the edge type and the stereo type.	`[]`
`add_self_loop`	`bool`	Whether to add a value of `1` on the diagonal of the adjacency matrix.	`False`
`explicit_H`	`bool`	Whether to consider the hydrogens explicitly. If `False`, the hydrogens are implicit.	`False`
`use_bonds_weights`	`bool`	Whether to use the floating-point value of the bonds in the adjacency matrix, such that single bonds are represented by 1, double bonds 2, triple 3, aromatic 1.5	`False`
`pos_encoding_as_features`	`Dict[str, Any]`	keyword arguments for function `graph_positional_encoder` to generate positional encoding for node features.	`None`
`dtype`	`np.dtype`	The numpy data type used to build the graph	`np.float16`
`mask_nan`	`Union[str, float, type(None)]`	How to handle molecules that fail part of the featurization. NaNs can occur when a property is undefined for a noble gas, or is not measured for specific atoms. "raise" (default): Raise an error when there is a nan or inf in the featurization. "warn": Raise a warning when there is a nan or inf in the featurization. `None`: Don't do anything. A floating value: Replace nans or inf by the specified value.	`'raise'`

Returns:

Name	Type	Description
`adj`	`Union[coo_matrix, Union[Tensor, None], Union[Tensor, None], Dict[str, Tensor], Union[Tensor, None], Dict[str, Tensor]]`	torch coo sparse adjacency matrix of the molecule
`ndata`	`Union[coo_matrix, Union[Tensor, None], Union[Tensor, None], Dict[str, Tensor], Union[Tensor, None], Dict[str, Tensor]]`	Concatenated node data of the atoms, based on the properties from `atom_property_list_onehot` and `atom_property_list_float`. If no properties are given, it returns `None`
`edata`	`Union[coo_matrix, Union[Tensor, None], Union[Tensor, None], Dict[str, Tensor], Union[Tensor, None], Dict[str, Tensor]]`	Concatenated edge data of the molecule, based on the properties from `edge_property_list`. If no properties are given, it returns `None`
`pe_dict`	`Union[coo_matrix, Union[Tensor, None], Union[Tensor, None], Dict[str, Tensor], Union[Tensor, None], Dict[str, Tensor]]`	Dictionary of all positional encodings. Currently supported keys: "pos_enc_feats_sign_flip": Node positional encoding that requires augmentation via sign-flip. For example, eigenvectors of the Laplacian are ambiguous in sign and are returned here. "pos_enc_feats_no_flip": Node positional encoding that does not use sign-flip. For example, distances from the centroid are returned here. "rwse": Node structural encoding corresponding to the diagonal of the random walk matrix
`conf_dict`	`Union[coo_matrix, Union[Tensor, None], Union[Tensor, None], Dict[str, Tensor], Union[Tensor, None], Dict[str, Tensor]]`	Contains the 3D positions of a conformer of the molecule, or 0s if none is found

`mol_to_adjacency_matrix(mol, use_bonds_weights=False, add_self_loop=False, dtype=np.float32)` ¶

Convert a molecule to a sparse adjacency matrix, as a torch Tensor. Instead of using the Rdkit GetAdjacencyMatrix() method, this method uses the bond ordering from the molecule object, which is the same as the bond ordering in the bond features.

Warning

Do not use Tensor.coalesce() on the returned adjacency matrix, as it will change the ordering of the bonds.

Parameters:

Name	Type	Description	Default
`mol`	`dm.Mol`	A molecule in the form of a SMILES string or an RDKit molecule object.	required
`use_bonds_weights`	`bool`	If `True`, the adjacency matrix will contain the bond type as the value of the edge. If `False`, the adjacency matrix will contain `1` as the value of the edge.	`False`
`add_self_loop`	`bool`	If `True`, the adjacency matrix will contain a self-loop for each node.	`False`
`dtype`	`np.dtype`	The data type used to build the graph	`np.float32`

Returns:

Name	Type	Description
`adj`	`coo_matrix`	coo sparse adjacency matrix of the molecule
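
The warning about `coalesce()` is easier to see on a toy example. The sketch below builds COO triplets from an ordered bond list in plain Python; the function `bonds_to_coo` and its `(begin, end, order)` input format are assumptions made for illustration, not the library's internals.

```python
def bonds_to_coo(bonds, num_atoms, use_bonds_weights=False, add_self_loop=False):
    """Build COO triplets (rows, cols, data) from an ordered bond list.

    `bonds` is a list of (begin_atom, end_atom, bond_order) tuples; this
    input format is an assumption of the sketch.
    """
    rows, cols, data = [], [], []
    for begin, end, order in bonds:
        weight = float(order) if use_bonds_weights else 1.0
        # Each undirected bond contributes two directed entries, kept in bond order.
        rows += [begin, end]
        cols += [end, begin]
        data += [weight, weight]
    if add_self_loop:
        for i in range(num_atoms):
            rows.append(i)
            cols.append(i)
            data.append(1.0)
    return rows, cols, data


# Bonds listed out of index order: (2-1, double) then (0-1, single).
rows, cols, data = bonds_to_coo(
    [(2, 1, 2.0), (0, 1, 1.0)], num_atoms=3, use_bonds_weights=True
)
print(list(zip(rows, cols, data)))
```

The entries follow the bond list, so the k-th bond feature lines up with entries 2k and 2k+1. Sorting the entries by (row, col), which is what coalescing does, would break that alignment.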

`mol_to_graph_dict(mol, atom_property_list_onehot=[], atom_property_list_float=[], conformer_property_list=[], edge_property_list=[], add_self_loop=False, explicit_H=False, use_bonds_weights=False, pos_encoding_as_features=None, dtype=np.float16, on_error='ignore', mask_nan='raise', max_num_atoms=None)` ¶

Transforms a molecule into an adjacency matrix representing the molecular graph and a set of atom and bond features, and re-organizes them into a dictionary that can be used to build a pyg.data.Data object.

Compared to mol_to_pyggraph, this function does not build the graph directly, and is thus faster, less memory heavy, and compatible with other frameworks.

Parameters:

Name	Type	Description	Default
`mol`	`dm.Mol`	The molecule to be converted	required
`atom_property_list_onehot`	`List[str]`	List of the properties used to get one-hot encoding of the atom type, such as the atom index represented as a one-hot vector. See function `get_mol_atomic_features_onehot`	`[]`
`atom_property_list_float`	`List[Union[str, Callable]]`	List of the properties used to get floating-point encoding of the atom type, such as the atomic mass or electronegativity. See function `get_mol_atomic_features_float`	`[]`
`conformer_property_list`	`List[str]`	List of properties used to encode the conformer information, outside of atom properties. Currently supports "positions_3d"	`[]`
`edge_property_list`	`List[str]`	List of the properties used to encode the edges, such as the edge type and the stereo type.	`[]`
`add_self_loop`	`bool`	Whether to add a value of `1` on the diagonal of the adjacency matrix.	`False`
`explicit_H`	`bool`	Whether to consider the hydrogens explicitly. If `False`, the hydrogens are implicit.	`False`
`use_bonds_weights`	`bool`	Whether to use the floating-point value of the bonds in the adjacency matrix, such that single bonds are represented by 1, double bonds 2, triple 3, aromatic 1.5	`False`
`pos_encoding_as_features`	`Dict[str, Any]`	keyword arguments for function `graph_positional_encoder` to generate positional encoding for node features.	`None`
`dtype`	`np.dtype`	The numpy data type used to build the graph	`np.float16`
`on_error`	`str`	What to do when the featurization fails. This can change the behavior of `mask_nan`. "raise": Raise an error "warn": Raise a warning and return a string of the error "ignore": Ignore the error and return a string of the error	`'ignore'`
`mask_nan`	`Union[str, float, type(None)]`	How to handle molecules that fail part of the featurization. NaNs can occur when a property is undefined for a noble gas, or is not measured for specific atoms. "raise" (default): Raise an error when there is a nan or inf in the featurization. "warn": Raise a warning when there is a nan or inf in the featurization. `None`: Don't do anything. A floating value: Replace nans or inf by the specified value.	`'raise'`
`max_num_atoms`	`Optional[int]`	Maximum number of atoms for a given molecule. If a molecule with more atoms is given, an error is raised, but captured according to the rules of `on_error`.	`None`

Returns:

Name	Type	Description
`graph_dict`	`Union[GraphDict, str]`	A dictionary `GraphDict` containing the keys required to build a graph, and which can be used to build a PyG graph. If it fails to featurize the molecule, it returns a string with the error. "adj": A sparse int-array containing the adjacency matrix. "data": A dictionary containing different keys and numpy arrays associated with the (node, edge & graph) features. "dtype": The numpy dtype for the floating data.
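
Because the return type is `Union[GraphDict, str]`, callers need to check for the error string. The following sketch mimics the documented `on_error` and `max_num_atoms` behaviour with a hypothetical wrapper, `featurize_with_on_error`. A plain string stands in for the molecule and its length for the atom count, purely for illustration.

```python
import warnings


def featurize_with_on_error(featurize, mol, on_error="ignore", max_num_atoms=None):
    """Hypothetical wrapper mirroring the documented `on_error` options.

    `featurize` is any callable; `mol` is any sized object (len = atom count).
    """
    try:
        if max_num_atoms is not None and len(mol) > max_num_atoms:
            raise ValueError(f"molecule has {len(mol)} atoms, max is {max_num_atoms}")
        return featurize(mol)
    except Exception as err:
        if on_error == "raise":
            raise
        if on_error == "warn":
            warnings.warn(f"featurization failed: {err}")
        # Both "warn" and "ignore" return the error as a string.
        return str(err)


result = featurize_with_on_error(len, "CCO", max_num_atoms=2)
ok = featurize_with_on_error(len, "CCO", max_num_atoms=5)
print(repr(result), ok)
```

A caller would typically filter with `isinstance(result, str)` before using the output as a graph.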

`mol_to_graph_signature(featurizer_args=None)` ¶

Get the default arguments of mol_to_graph_dict and update them with a provided dict of arguments, in order to get the full signature of the featurizer arguments actually used for the feature computation.

Parameters:

Name	Type	Description	Default
`featurizer_args`	`Dict[str, Any]`	A dictionary of featurizer arguments to update	`None`

Returns:

Type	Description
`Dict[str, Any]`	A dictionary of featurizer arguments
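
One plausible way to obtain such a "full signature" is to read the keyword defaults with `inspect.signature` and overlay the user's arguments. This is a sketch of the idea only: `full_signature` and `fake_featurizer` are hypothetical stand-ins, and the actual function may be implemented differently.

```python
import inspect


def full_signature(func, overrides=None):
    """Return the keyword defaults of `func`, updated with `overrides`."""
    defaults = {
        name: param.default
        for name, param in inspect.signature(func).parameters.items()
        if param.default is not inspect.Parameter.empty
    }
    defaults.update(overrides or {})
    return defaults


# Stand-in with a few of the documented keyword arguments.
def fake_featurizer(mol, add_self_loop=False, explicit_H=False, mask_nan="raise"):
    return None


args = full_signature(fake_featurizer, {"explicit_H": True})
print(args)
```

Recording the merged dictionary rather than only the overrides makes a featurization reproducible even if the defaults change later.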

`mol_to_pyggraph(mol, atom_property_list_onehot=[], atom_property_list_float=[], conformer_property_list=[], edge_property_list=[], add_self_loop=False, explicit_H=False, use_bonds_weights=False, pos_encoding_as_features=None, dtype=np.float16, on_error='ignore', mask_nan='raise', max_num_atoms=None)` ¶

Transforms a molecule into an adjacency matrix representing the molecular graph and a set of atom and bond features.

Then, the adjacency matrix and node/edge features are used to build a pyg.data.Data with pytorch Tensors.

Parameters:

Name	Type	Description	Default
`mol`	`dm.Mol`	The molecule to be converted	required
`atom_property_list_onehot`	`List[str]`	List of the properties used to get one-hot encoding of the atom type, such as the atom index represented as a one-hot vector. See function `get_mol_atomic_features_onehot`	`[]`
`atom_property_list_float`	`List[Union[str, Callable]]`	List of the properties used to get floating-point encoding of the atom type, such as the atomic mass or electronegativity. See function `get_mol_atomic_features_float`	`[]`
`conformer_property_list`	`List[str]`	List of properties used to encode the conformer information, outside of atom properties. Currently supports "positions_3d"	`[]`
`edge_property_list`	`List[str]`	List of the properties used to encode the edges, such as the edge type and the stereo type.	`[]`
`add_self_loop`	`bool`	Whether to add a value of `1` on the diagonal of the adjacency matrix.	`False`
`explicit_H`	`bool`	Whether to consider the hydrogens explicitly. If `False`, the hydrogens are implicit.	`False`
`use_bonds_weights`	`bool`	Whether to use the floating-point value of the bonds in the adjacency matrix, such that single bonds are represented by 1, double bonds 2, triple 3, aromatic 1.5	`False`
`pos_encoding_as_features`	`Dict[str, Any]`	keyword arguments for function `graph_positional_encoder` to generate positional encoding for node features.	`None`
`dtype`	`np.dtype`	The numpy data type used to build the graph	`np.float16`
`on_error`	`str`	What to do when the featurization fails. This can change the behavior of `mask_nan`. "raise": Raise an error "warn": Raise a warning and return a string of the error "ignore": Ignore the error and return a string of the error	`'ignore'`
`mask_nan`	`Union[str, float, type(None)]`	How to handle molecules that fail part of the featurization. NaNs can occur when a property is undefined for a noble gas, or is not measured for specific atoms. "raise" (default): Raise an error when there is a nan in the featurization. "warn": Raise a warning when there is a nan in the featurization. `None`: Don't do anything. A floating value: Replace nans by the specified value.	`'raise'`
`max_num_atoms`	`Optional[int]`	Maximum number of atoms for a given molecule. If a molecule with more atoms is given, an error is raised, but captured according to the rules of `on_error`.	`None`

Returns:

Name	Type	Description
`graph`	`Union[Data, str]`	Pyg graph, with `graph['feat']` corresponding to the concatenated node data from `atom_property_list_onehot` and `atom_property_list_float`, `graph['edge_feat']` corresponding to the concatenated edge data from `edge_property_list`. There are also additional entries for the positional encodings.

`to_dense_array(array, dtype=None)` ¶

Convert the given array to a dense array.

Parameters:

Name	Type	Description	Default
`array`	`np.ndarray`	The array to convert to dense	required
`dtype`	`str`	The dtype of the array	`None`

Returns:

Type	Description
`np.ndarray`	The dense array

`to_dense_tensor(tensor, dtype=None)` ¶

Convert the given tensor to a dense tensor.

Parameters:

Name	Type	Description	Default
`tensor`		The tensor to convert to dense	required
`dtype`	`str`	The dtype of the array	`None`

Returns:

Type	Description
`Tensor`	The dense tensor

Positional Encoding¶

`graphium.features.positional_encoding` ¶

`get_all_positional_encodings(adj, num_nodes, pos_kwargs=None)` ¶

Get the positional encodings used as features.

Parameters:

Name	Type	Description	Default
`adj`	`[num_nodes, num_nodes]`	Adjacency matrix of the graph	required
`num_nodes`	`int`	Number of nodes in the graph	required
`pos_kwargs`		Keyword arguments for function `graph_positional_encoder` to generate positional encoding for node features.	`None`

Returns:

Name	Type	Description
`pe_dict`	`Tuple[OrderedDict[str, np.ndarray]]`	Dictionary of positional and structural encodings

`graph_positional_encoder(adj, num_nodes, pos_type=None, pos_level=None, pos_kwargs=None, cache=None)` ¶

Get a positional encoding that depends on the parameters.

Parameters:

Name	Type	Description	Default
`adj`	`[num_nodes, num_nodes]`	Adjacency matrix of the graph	required
`num_nodes`	`int`	Number of nodes in the graph	required
`pos_type`	`Optional[str]`	The type of positional encoding to use. If None, it must be provided by `pos_kwargs["pos_type"]`. Supported types are: "laplacian_eigvec" and "laplacian_eigval" (these cache the connected components and the eigendecomposition), "rwse", "electrostatic" and "commute" (these cache pinvL), "graphormer"	`None`
`pos_level`	`Optional[str]`	Positional level to output. If None, it must be provided by `pos_kwargs["pos_level"]`. Supported levels are: "node", "edge", "nodepair", "graph"	`None`
`pos_kwargs`	`Optional[Dict[str, Any]]`	Extra keyword arguments for the positional encoding. Can include the keys pos_type and pos_level.	`None`
`cache`	`Optional[Dict[str, Any]]`	Dictionary of cached objects	`None`

Returns:

Name	Type	Description
`pe`	`Dict[str, np.ndarray]`	Positional or structural encoding
`cache`	`Dict[str, Any]`	Updated dictionary of cached objects
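
The fallback from explicit arguments to `pos_kwargs` can be sketched as follows. `resolve_pos_args` is a hypothetical helper, the `ksteps` entry is just an illustrative extra keyword, and letting an explicit argument win over the dictionary is an assumption of this sketch rather than documented behaviour.

```python
def resolve_pos_args(pos_type=None, pos_level=None, pos_kwargs=None):
    """Hypothetical helper: fall back to `pos_kwargs` when an argument is None."""
    pos_kwargs = dict(pos_kwargs or {})  # copy so the caller's dict is untouched
    kw_type = pos_kwargs.pop("pos_type", None)
    kw_level = pos_kwargs.pop("pos_level", None)
    pos_type = pos_type if pos_type is not None else kw_type
    pos_level = pos_level if pos_level is not None else kw_level
    if pos_type is None or pos_level is None:
        raise ValueError("pos_type and pos_level must be given directly or in pos_kwargs")
    return pos_type, pos_level, pos_kwargs


resolved = resolve_pos_args(
    pos_kwargs={"pos_type": "rwse", "pos_level": "node", "ksteps": 4}
)
print(resolved)
```

The remaining keys are what would be forwarded to the encoding-specific function.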

Properties¶

`graphium.features.properties` ¶

`get_prop_or_none(prop, n, *args, **kwargs)` ¶

Return the properties. If an error occurs, return a list of `None` with length `n`.

Parameters:

Name	Type	Description	Default
`prop`	`Callable`	The property to compute.	required
`n`	`int`	The number of elements in the property.	required
`*args`	`Union[dm.Mol, str]`	The arguments to pass to the property.	`()`
`**kwargs`	`Union[dm.Mol, str]`	The keyword arguments to pass to the property.	`{}`

Returns:

Type	Description
`Union[List[float], List[None]]`	The property or a list of `None` with length `n`.
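
The behaviour described above amounts to a guarded call. A minimal sketch, using the hypothetical names `prop_or_none` and `failing_descriptor` and a SMILES string as a placeholder argument:

```python
def prop_or_none(prop, n, *args, **kwargs):
    """Sketch: call `prop`; on any exception return `[None] * n`."""
    try:
        return prop(*args, **kwargs)
    except Exception:
        return [None] * n


def failing_descriptor(mol):
    raise RuntimeError("descriptor could not be computed")


values = prop_or_none(lambda mol: [1.0, 2.0], 2, "CCO")
fallback = prop_or_none(failing_descriptor, 3, "CCO")
print(values, fallback)
```

Returning a fixed-length list of `None` keeps the overall property vector aligned even when one descriptor family fails.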

`get_props_from_mol(mol, properties='autocorr3d')` ¶

Function to get a given set of desired properties from a molecule, and output a property list.

Parameters:

Name	Type	Description	Default
`mol`	`Union[dm.Mol, str]`	The molecule from which to compute the properties.	required
`properties`	`Union[List[str], str]`	The list of properties to compute for each molecule. It can be the following: 'descriptors', 'autocorr3d', 'rdf', 'morse', 'whim', 'all'	`'autocorr3d'`

Returns:

Name	Type	Description
`props`	`np.ndarray`	np.array(float) The array of properties for the desired molecule
`classes_start_idx`	`np.ndarray`	list(int) The list of indices specifying the start of each new class of descriptor or property. For example, if props has 20 elements, the first 5 are rotatable bonds, the next 8 are morse, and the rest are whim, then `classes_start_idx = [0, 5, 13]`. This will mainly be useful to normalize the features of each class.
`classes_names`	`np.ndarray`	list(str) The names of the classes associated with each starting index. Useful for understanding which property the network is learning.
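
The `classes_start_idx` example from the table can be turned into per-class slices like this. `split_by_class` is a hypothetical helper, and the property values are placeholders, not real descriptors.

```python
def split_by_class(props, classes_start_idx, classes_names):
    """Slice a flat property vector into one list per descriptor class."""
    bounds = list(classes_start_idx) + [len(props)]
    return {
        name: props[bounds[i]:bounds[i + 1]]
        for i, name in enumerate(classes_names)
    }


props = [float(i) for i in range(20)]  # placeholder values
groups = split_by_class(props, [0, 5, 13], ["rotatable-bonds", "morse", "whim"])
sizes = {name: len(vals) for name, vals in groups.items()}
print(sizes)
```

Each slice can then be normalized independently, which is the use case mentioned for `classes_start_idx`.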

Spectral PE¶

`graphium.features.spectral` ¶

`compute_laplacian_pe(adj, num_pos, cache, disconnected_comp=True, normalization='none')` ¶

Compute the eigenvalues and eigenvectors of the graph Laplacian.

Parameters:

Name	Type	Description	Default
`adj`	`[num_nodes, num_nodes]`	Adjacency matrix of the graph	required
`num_pos`	`int`	Number of Laplacian eigenvectors to compute	required
`cache`	`Dict[str, Any]`	Dictionary of cached objects	required
`disconnected_comp`	`bool`	Whether to compute the eigenvectors for each connected component	`True`
`normalization`	`str`	Normalization to apply to the Laplacian	`'none'`

Returns:

Name	Type	Description
	`np.ndarray`	Two possible outputs: eigvals [num_nodes, num_pos]: Eigenvalues of the Laplacian repeated for each node. This repetition is necessary in case of disconnected components, where the eigenvalues of the Laplacian are not the same for each node. eigvecs [num_nodes, num_pos]: Eigenvectors of the Laplacian
`base_level`	`str`	Indicator of the output pos_level (node, edge, nodepair, graph) -> here node
`cache`	`Dict[str, Any]`	Updated dictionary of cached objects

`normalize_matrix(matrix, degree_vector=None, normalization=None)` ¶

Normalize a given matrix using its degree vector

Parameters¶

matrix: torch.tensor(N, N) or scipy.sparse.spmatrix(N, N)
    A square matrix representing either an Adjacency matrix or a Laplacian.

degree_vector: torch.tensor(N) or np.ndarray(N) or None
    A vector representing the degree of ``matrix``.
    ``None`` is only accepted if ``normalization==None``

normalization: str or None, Default='none'
    Normalization to use on the eig_matrix

    - 'none' or ``None``: no normalization

    - 'sym': Symmetric normalization ``D^-0.5 L D^-0.5``

    - 'inv': Inverse normalization ``D^-1 L``

Returns¶

matrix: torch.tensor(N, N) or scipy.sparse.spmatrix(N, N)
    The normalized matrix
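
The two normalizations can be checked by hand on a small Laplacian. The sketch below uses dense lists of lists instead of torch or scipy matrices and assumes all degrees are non-zero; `normalize` is a hypothetical stand-in, not the library function.

```python
def normalize(matrix, degree_vector=None, normalization=None):
    """Sketch of the documented options on a dense list-of-lists matrix."""
    if normalization in (None, "none"):
        return [row[:] for row in matrix]
    n = len(matrix)
    if normalization == "sym":
        # D^-0.5 L D^-0.5: scale entry (i, j) by 1 / sqrt(d_i * d_j).
        return [[matrix[i][j] / (degree_vector[i] * degree_vector[j]) ** 0.5
                 for j in range(n)] for i in range(n)]
    if normalization == "inv":
        # D^-1 L: scale row i by 1 / d_i.
        return [[matrix[i][j] / degree_vector[i] for j in range(n)]
                for i in range(n)]
    raise ValueError(f"unknown normalization: {normalization!r}")


# Laplacian L = D - A of the path graph 0 - 1 - 2.
laplacian = [[1.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 1.0]]
degrees = [1.0, 2.0, 1.0]
sym = normalize(laplacian, degrees, "sym")
inv = normalize(laplacian, degrees, "inv")
print(sym[0], inv[1])
```

The symmetric form keeps the matrix symmetric with a unit diagonal, while the inverse form makes every row of the Laplacian sum to zero with a unit diagonal.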

Random Walk PE¶

`graphium.features.rw` ¶

`compute_rwse(adj, ksteps, num_nodes, cache, pos_type='rw_return_probs' or 'rw_transition_probs', space_dim=0)` ¶

Compute Random Walk Spectral Embedding (RWSE) for given list of K steps.

Parameters:

Name	Type	Description	Default
`adj`	`[num_nodes, num_nodes]`	Adjacency matrix	required
`ksteps`	`Union[int, List[int]]`	List of numbers of steps for the random walks. If int, a list is generated from 1 to ksteps.	required
`num_nodes`	`int`	Number of nodes in the graph	required
`cache`	`Dict[str, Any]`	Dictionary of cached objects	required
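`pos_type`	`str`	Desired output, either 'rw_return_probs' or 'rw_transition_probs'. As a Python expression, the default shown evaluates to 'rw_return_probs'.	`'rw_return_probs' or 'rw_transition_probs'`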
`space_dim`	`int`	Estimated dimensionality of the space. Used to correct the random-walk diagonal by a factor `k^(space_dim/2)`. In euclidean space, this correction means that the height of the gaussian distribution stays almost constant across the number of steps, if `space_dim` is the dimension of the euclidean space.	`0`

Returns:

Name	Type	Description
	`np.ndarray`	Two possible outputs: rw_return_probs [num_nodes, len(ksteps)]: Random-Walk k-step landing probabilities rw_transition_probs [num_nodes, num_nodes, len(ksteps)]: Random-Walk k-step transition probabilities
`base_level`	`str`	Indicator of the output pos_level (node, edge, nodepair, graph) -> here either node or nodepair
`cache`	`Dict[str, Any]`	Updated dictionary of cached objects
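
For the `rw_return_probs` output, the underlying quantity is the diagonal of the k-th power of the random-walk matrix P = D^-1 A. The sketch below computes it with plain Python lists. `rw_return_probs` and `matmul` are hypothetical helpers, the graph is assumed to have no isolated nodes, and the caching and sparse operations of the real function are omitted.

```python
def matmul(a, b):
    """Dense matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def rw_return_probs(adj, ksteps, space_dim=0):
    """Diagonal of P^k for each k in `ksteps` (assumed sorted), P = D^-1 A."""
    if isinstance(ksteps, int):
        ksteps = list(range(1, ksteps + 1))
    n = len(adj)
    trans = [[adj[i][j] / sum(adj[i]) for j in range(n)] for i in range(n)]
    power = [[float(i == j) for j in range(n)] for i in range(n)]  # P^0
    out = [[] for _ in range(n)]
    for k in range(1, max(ksteps) + 1):
        power = matmul(power, trans)  # now holds P^k
        if k in ksteps:
            scale = k ** (space_dim / 2)  # correction factor k^(space_dim/2)
            for i in range(n):
                out[i].append(power[i][i] * scale)
    return out  # shape [num_nodes, len(ksteps)]


triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
probs = rw_return_probs(triangle, ksteps=3)
print(probs)
```

On a triangle, a walker cannot return in one step, returns with probability 0.5 after two steps and 0.25 after three, so every node gets the row `[0.0, 0.5, 0.25]`.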

`get_Pks(ksteps, edge_index, edge_weight=None, num_nodes=None, start_Pk=None, start_k=None)` ¶

Compute Random Walk landing probabilities for given list of K steps.

Parameters:

Name	Type	Description	Default
`ksteps`	`List[int]`	List of numbers of k-steps for which to compute the RW landings	required
`edge_index`	`Tuple[torch.Tensor, torch.Tensor]`	PyG sparse representation of the graph	required
`edge_weight`	`Optional[torch.Tensor]`	Edge weights	`None`
`num_nodes`	`Optional[int]`	Number of nodes in the graph	`None`

Returns:

Type	Description
`Dict[int, np.ndarray]`	2D Tensor with shape (num_nodes, len(ksteps)) with RW landing probs

NMP¶

`graphium.features.nmp` ¶

`float_or_none(string)` ¶

Check whether a string can be converted to a float; return `None` if it cannot.

Parameters:

Name	Type	Description	Default
`string`	`str`	The string to convert	required

Returns:

Name	Type	Description
`val`	`Union[float, None]`	The converted float, or `None` if the conversion fails
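
A minimal sketch of this kind of tolerant parsing, using the hypothetical name `to_float_or_none` (the library function may handle additional cases):

```python
def to_float_or_none(string):
    """Sketch: return `float(string)`, or `None` when the conversion fails."""
    try:
        return float(string)
    except (TypeError, ValueError):
        return None


parsed = [to_float_or_none(s) for s in ["1.5", "n/a", ""]]
print(parsed)
```

This pattern is convenient when reading tabulated atomic properties in which some entries are missing or non-numeric.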
