partycls package

partycls is a Python package for cluster analysis of systems of interacting particles. By grouping particles that share similar structural or dynamical properties, partycls enables rapid and unsupervised exploration of the system’s relevant features. It provides descriptors suitable for applications in condensed matter physics, such as structural analysis of disordered or partially ordered materials, and integrates the necessary tools of unsupervised learning into a streamlined workflow.

Submodules

partycls.cell module

Simulation cell.

This class is inspired by the framework atooms authored by Daniele Coslovich.

class partycls.cell.Cell(side, periodic=None)[source]

Bases: object

Orthorhombic cell.

side

List of lengths for the sides of the cell.

Type

numpy.ndarray

periodic

Periodicity of the cell on each axis.

Type

numpy.ndarray

Parameters
  • side (list) – List of lengths for the sides of the cell.

  • periodic (list, default: None) – Periodicity of the cell on each axis. Default is None, which sets the cell as periodic in each direction.

Example

>>> c = Cell([2.0, 2.0, 2.0], periodic=[True, True, True])
property volume

Volume of the cell.
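For an orthorhombic cell, the volume is the product of the side lengths. A minimal check, reusing the cell from the example above:

>>> c = Cell([2.0, 2.0, 2.0])
>>> c.volume
8.0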

partycls.clustering module

Clustering algorithms.

class partycls.clustering.Clustering(n_clusters=2, n_init=1, backend=None)[source]

Bases: object

Base class for clustering methods.

n_clusters

Number of clusters.

Type

int

n_init

Number of times the clustering is run.

Type

int

labels

Cluster labels. The default is None. Initialized after the fit method is called.

Type

list

If a scikit-learn compatible backend is provided, it is used to perform the clustering, following the strategy design pattern.

Parameters
  • n_clusters (int, default: 2) – Requested number of clusters.

  • n_init (int, default: 1) – Number of times the clustering will be run with different seeds.

  • backend (scikit-learn compatible backend, default: None) – Backend used for the clustering method. If provided, it must be an object implementing a scikit-learn compatible interface, with a fit method and a labels_ attribute. Duck typing is assumed.

fit(X)[source]

Run a scikit-learn compatible clustering backend (if available) on X.

Subclasses implementing a specific clustering algorithm must override this method.

Parameters

X (numpy.ndarray) – Dataset matrix for which to compute the clusters.

Return type

None
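As a sketch of the duck-typed backend mechanism, any estimator exposing a fit method and a labels_ attribute can be plugged in; here sklearn.cluster.AgglomerativeClustering is used on an invented toy dataset, assuming fit delegates to the backend as described above:

>>> import numpy as np
>>> from sklearn.cluster import AgglomerativeClustering
>>> from partycls.clustering import Clustering
>>> X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
>>> clustering = Clustering(n_clusters=2, backend=AgglomerativeClustering(n_clusters=2))
>>> clustering.fit(X)
>>> clustering.labels  # one label per sample (label order may vary)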

property fractions

numpy.ndarray with the fractions of particles in each cluster.

property populations

numpy.ndarray with the number of particles in each cluster.

centroids(X)[source]

Central feature vector of each cluster.

Each object in the dataset over which the clustering was performed is assigned a discrete label. This label represents the index of the nearest cluster center to which this object belongs. The centroid (i.e. the cluster center) is thus the average feature vector of all the objects in the cluster.

Cluster memberships of the objects are stored in the labels attribute. Coordinates of the centroids can then be calculated for an arbitrary dataset X, provided it has the same shape as the original dataset used for the clustering.

Parameters

X (numpy.ndarray) – Array of features (dataset) for which to compute the centroids.

Returns

C_k – Cluster centroids. C_k[n] is the coordinates of the n-th cluster center.

Return type

numpy.ndarray

class partycls.clustering.KMeans(n_clusters=2, n_init=1)[source]

Bases: Clustering

KMeans clustering.

This class relies on the class KMeans from the machine learning package scikit-learn. An instance of sklearn.cluster.KMeans is created when calling the fit method, and is then accessible through the backend attribute for later use. See scikit-learn’s documentation for more information on the original class.

Parameters
  • n_clusters (int, default: 2) – Requested number of clusters.

  • n_init (int, default: 1) – Number of times the clustering will be run with different seeds.

fit(X)[source]

Run the K-Means algorithm on X. The predicted labels are updated in the attribute labels of the current instance of KMeans.

Parameters

X (numpy.ndarray) – Dataset matrix for which to compute the clusters.

Return type

None
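A minimal usage sketch on an invented toy dataset with two well-separated groups:

>>> import numpy as np
>>> from partycls.clustering import KMeans
>>> X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
>>> clustering = KMeans(n_clusters=2, n_init=5)
>>> clustering.fit(X)
>>> clustering.populations  # expected: two samples in each cluster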

class partycls.clustering.GaussianMixture(n_clusters=2, n_init=1)[source]

Bases: Clustering

Gaussian Mixture.

This class relies on the class GaussianMixture from the machine learning package scikit-learn. An instance of sklearn.mixture.GaussianMixture is created when calling the fit method, and is then accessible through the backend attribute for later use. See scikit-learn’s documentation for more information on the original class.

Parameters
  • n_clusters (int, default: 2) – Requested number of clusters.

  • n_init (int, default: 1) – Number of times the clustering will be run with different seeds.

fit(X)[source]

Run the expectation-maximization algorithm on X using a mixture of Gaussians. The predicted labels are updated in the attribute labels of the current instance of GaussianMixture.

Parameters

X (numpy.ndarray) – Dataset matrix for which to compute the clusters.

Return type

None

class partycls.clustering.CommunityInference(n_clusters=2, n_init=1)[source]

Bases: Clustering

Community Inference is a hard clustering method based on information theory. See https://doi.org/10.1063/5.0004732 (Paret et al.) for more details.

Parameters
  • n_clusters (int, default: 2) – Requested number of clusters.

  • n_init (int, default: 1) – Number of times the clustering will be run with different seeds.

fit(X)[source]

Run the community inference algorithm on X. X should be an instance of StructuralDescriptor with a normalize method; otherwise, it is converted to a dummy descriptor.

Parameters

X (StructuralDescriptor) – Descriptor on which the community inference algorithm will be run.

Return type

None

partycls.dim_reduction module

Dimensionality reduction techniques (linear and non-linear), to be performed on a dataset stored in a numpy array.

class partycls.dim_reduction.PCA(n_components=None, *, copy=True, whiten=False, svd_solver='auto', tol=0.0, iterated_power='auto', n_oversamples=10, power_iteration_normalizer='auto', random_state=None)[source]

Bases: PCA

symbol = 'pca'
full_name = 'Principal Component Analysis (PCA)'
reduce(X)[source]

Project the input features onto a reduced space using principal component analysis.

Parameters

X (numpy.ndarray) – Features in the original space.

Returns

Features in the reduced space.

Return type

numpy.ndarray
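A short sketch of reduce on random data, assuming it fits the estimator and returns the projected features as the Returns section above states:

>>> import numpy as np
>>> from partycls.dim_reduction import PCA
>>> X = np.random.rand(100, 10)  # 100 samples, 10 features
>>> X_red = PCA(n_components=2).reduce(X)
>>> X_red.shape
(100, 2)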

class partycls.dim_reduction.TSNE(n_components=2, *, perplexity=30.0, early_exaggeration=12.0, learning_rate='warn', n_iter=1000, n_iter_without_progress=300, min_grad_norm=1e-07, metric='euclidean', metric_params=None, init='warn', verbose=0, random_state=None, method='barnes_hut', angle=0.5, n_jobs=None, square_distances='deprecated')[source]

Bases: TSNE

symbol = 'tsne'
full_name = 't-distributed Stochastic Neighbor Embedding (t-SNE)'
reduce(X)[source]

Project the input features onto a reduced space using t-distributed stochastic neighbor embedding.

Parameters

X (numpy.ndarray) – Features in the original space.

Returns

Features in the reduced space.

Return type

numpy.ndarray

class partycls.dim_reduction.LocallyLinearEmbedding(*, n_neighbors=5, n_components=2, reg=0.001, eigen_solver='auto', tol=1e-06, max_iter=100, method='standard', hessian_tol=0.0001, modified_tol=1e-12, neighbors_algorithm='auto', random_state=None, n_jobs=None)[source]

Bases: LocallyLinearEmbedding

symbol = 'lle'
full_name = 'Locally Linear Embedding (LLE)'
reduce(X)[source]

Project the input features onto a reduced space using locally linear embedding.

Parameters

X (numpy.ndarray) – Features in the original space.

Returns

Features in the reduced space.

Return type

numpy.ndarray

class partycls.dim_reduction.AutoEncoder(layers=(100, 2, 100), activation='relu', solver='adam', alpha=0.0001)[source]

Bases: MLPRegressor

symbol = 'ae'
full_name = 'Neural-Network Auto-Encoder (AE)'
property n_components

Number of nodes at the level of the bottleneck layer (i.e. dimension after reduction).

reduce(X)[source]

Project the input features onto a reduced space using a neural network autoencoder. The dimension of the reduced space is the number of nodes in the bottleneck layer.

Parameters

X (numpy.ndarray) – Features in the original space.

Returns

Features in the reduced space.

Return type

numpy.ndarray
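For instance, with the default architecture the bottleneck layer has two nodes, so the features are reduced to two dimensions (a sketch, assuming n_components is read off the bottleneck of layers):

>>> from partycls.dim_reduction import AutoEncoder
>>> ae = AutoEncoder(layers=(100, 2, 100))
>>> ae.n_components
2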

partycls.feature_scaling module

Feature scaling techniques, to be performed on a dataset stored in a numpy array.

class partycls.feature_scaling.ZScore(*, copy=True, with_mean=True, with_std=True)[source]

Bases: StandardScaler

symbol = 'zscore'
full_name = 'Z-Score'
scale(X)[source]

Standardize features by removing the mean and scaling to unit variance.

Parameters

X (numpy.ndarray) – Original features.

Returns

Scaled features.

Return type

numpy.ndarray
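A minimal sketch, assuming scale performs a fit-and-transform as described above; the toy input is standardized to zero mean and unit variance:

>>> import numpy as np
>>> from partycls.feature_scaling import ZScore
>>> X = np.array([[1.0], [2.0], [3.0]])
>>> ZScore().scale(X).round(2)
array([[-1.22],
       [ 0.  ],
       [ 1.22]])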

class partycls.feature_scaling.MinMax(feature_range=(0, 1), *, copy=True, clip=False)[source]

Bases: MinMaxScaler

symbol = 'minmax'
full_name = 'Min-Max'
scale(X)[source]

Transform features by scaling each feature to a given range (default is \([0,1]\)).

Parameters

X (numpy.ndarray) – Original features.

Returns

Scaled features.

Return type

numpy.ndarray

class partycls.feature_scaling.MaxAbs(*, copy=True)[source]

Bases: MaxAbsScaler

symbol = 'maxabs'
full_name = 'Max-Abs'
scale(X)[source]

Scale each feature by its maximum absolute value.

Parameters

X (numpy.ndarray) – Original features.

Returns

Scaled features.

Return type

numpy.ndarray

class partycls.feature_scaling.Robust(*, with_centering=True, with_scaling=True, quantile_range=(25.0, 75.0), copy=True, unit_variance=False)[source]

Bases: RobustScaler

symbol = 'robust'
full_name = 'Robust'
scale(X)[source]

Scale features using statistics that are robust to outliers.

Parameters

X (numpy.ndarray) – Original features.

Returns

Scaled features.

Return type

numpy.ndarray

partycls.helpers module

Various helper functions for visualization, cluster analysis, etc.

partycls.helpers.AMI(labels_true, labels_pred, *, average_method='arithmetic')

Adjusted Mutual Information between two clusterings.

Adjusted Mutual Information (AMI) is an adjustment of the Mutual Information (MI) score to account for chance. It accounts for the fact that the MI is generally higher for two clusterings with a larger number of clusters, regardless of whether there is actually more information shared. For two clusterings \(U\) and \(V\), the AMI is given as:

\[\mathrm{AMI}(U, V) = \frac{\mathrm{MI}(U, V) - E[\mathrm{MI}(U, V)]}{\mathrm{avg}(H(U), H(V)) - E[\mathrm{MI}(U, V)]}\]

This metric is independent of the absolute values of the labels: a permutation of the class or cluster label values won’t change the score value in any way.

This metric is furthermore symmetric: switching \(U\) (labels_true) with \(V\) (labels_pred) will return the same score value. This can be useful to measure the agreement of two independent label assignment strategies on the same dataset when the real ground truth is not known.

Be mindful that this function is an order of magnitude slower than other metrics, such as the Adjusted Rand Index.

Read more in the scikit-learn User Guide.

Parameters
  • labels_true (int array, shape = [n_samples]) – A clustering of the data into disjoint subsets, called \(U\) in the above formula.

  • labels_pred (int array-like of shape (n_samples,)) – A clustering of the data into disjoint subsets, called \(V\) in the above formula.

  • average_method (str, default='arithmetic') –

    How to compute the normalizer in the denominator. Possible options are ‘min’, ‘geometric’, ‘arithmetic’, and ‘max’.

    New in scikit-learn version 0.20.

    Changed in scikit-learn version 0.22: The default value of average_method changed from ‘max’ to ‘arithmetic’.

Returns

ami – The AMI returns a value of 1 when the two partitions are identical (i.e. perfectly matched). Random partitions (independent labellings) have an expected AMI of around 0 on average, and so the score can be negative. The value is in adjusted nats (based on the natural logarithm).

Return type

float (bounded above by 1.0)

See also

adjusted_rand_score

Adjusted Rand Index.

mutual_info_score

Mutual Information (not adjusted for chance).

Examples

Perfect labelings are both homogeneous and complete, hence have score 1.0:

>>> from sklearn.metrics.cluster import adjusted_mutual_info_score
>>> adjusted_mutual_info_score([0, 0, 1, 1], [0, 0, 1, 1])
1.0
>>> adjusted_mutual_info_score([0, 0, 1, 1], [1, 1, 0, 0])
1.0

If class members are completely split across different clusters, the assignment is totally incomplete, hence the AMI is null:

>>> adjusted_mutual_info_score([0, 0, 0, 0], [0, 1, 2, 3])
0.0

References

[1] Vinh, Epps, and Bailey (2010). Information Theoretic Measures for Clusterings Comparison: Variants, Properties, Normalization and Correction for Chance. JMLR.

[2] Wikipedia entry for the Adjusted Mutual Information.

partycls.helpers.ARI(labels_true, labels_pred)

Rand index adjusted for chance.

The Rand Index computes a similarity measure between two clusterings by considering all pairs of samples and counting pairs that are assigned in the same or different clusters in the predicted and true clusterings.

The raw RI score is then “adjusted for chance” into the ARI score using the following scheme:

\[\mathrm{ARI} = \frac{\mathrm{RI} - E[\mathrm{RI}]}{\max(\mathrm{RI}) - E[\mathrm{RI}]}\]

The adjusted Rand index is thus ensured to have a value close to 0.0 for random labeling independently of the number of clusters and samples and exactly 1.0 when the clusterings are identical (up to a permutation).

ARI is a symmetric measure:

adjusted_rand_score(a, b) == adjusted_rand_score(b, a)

Read more in the scikit-learn User Guide.

Parameters
  • labels_true (int array, shape = [n_samples]) – Ground truth class labels to be used as a reference.

  • labels_pred (array-like of shape (n_samples,)) – Cluster labels to evaluate.

Returns

ARI – Similarity score between -1.0 and 1.0. Random labelings have an ARI close to 0.0. 1.0 stands for perfect match.

Return type

float

Examples

Perfectly matching labelings have a score of 1, even when the label values are permuted:

>>> from sklearn.metrics.cluster import adjusted_rand_score
>>> adjusted_rand_score([0, 0, 1, 1], [0, 0, 1, 1])
1.0
>>> adjusted_rand_score([0, 0, 1, 1], [1, 1, 0, 0])
1.0

Labelings that assign all class members to the same clusters are complete, but may not always be pure, and are hence penalized:

>>> adjusted_rand_score([0, 0, 1, 2], [0, 0, 1, 1])
0.57...

ARI is symmetric, so labelings with pure clusters whose members come from the same classes, but which contain unnecessary splits, are penalized in the same way:

>>> adjusted_rand_score([0, 0, 1, 1], [0, 0, 1, 2])
0.57...

If class members are completely split across different clusters, the assignment is totally incomplete, hence the ARI is very low:

>>> adjusted_rand_score([0, 0, 0, 0], [0, 1, 2, 3])
0.0

References

[Hubert1985] L. Hubert and P. Arabie, Comparing Partitions. Journal of Classification, 1985. https://link.springer.com/article/10.1007%2FBF01908075

[Steinley2004] D. Steinley, Properties of the Hubert-Arabie Adjusted Rand Index. Psychological Methods, 2004.

[wk] https://en.wikipedia.org/wiki/Rand_index#Adjusted_Rand_index

See also

adjusted_mutual_info_score

Adjusted Mutual Information.

partycls.helpers.silhouette_samples(X, labels, *, metric='euclidean', **kwds)[source]

Compute the Silhouette Coefficient for each sample.

The Silhouette Coefficient is a measure of how well samples are clustered with samples similar to themselves. Clustering models with a high Silhouette Coefficient are said to be dense (samples in the same cluster are similar to each other) and well separated (samples in different clusters are not very similar to each other).

The Silhouette Coefficient is calculated using the mean intra-cluster distance (a) and the mean nearest-cluster distance (b) for each sample. The Silhouette Coefficient for a sample is (b - a) / max(a, b). Note that the Silhouette Coefficient is only defined if the number of labels satisfies 2 <= n_labels <= n_samples - 1.

This function returns the Silhouette Coefficient for each sample.

The best value is 1 and the worst value is -1. Values near 0 indicate overlapping clusters.

Read more in the scikit-learn User Guide.

Parameters
  • X (array-like of shape (n_samples_a, n_samples_a) if metric == "precomputed" or (n_samples_a, n_features) otherwise) – An array of pairwise distances between samples, or a feature array.

  • labels (array-like of shape (n_samples,)) – Label values for each sample.

  • metric (str or callable, default='euclidean') – The metric to use when calculating distance between instances in a feature array. If metric is a string, it must be one of the options allowed by sklearn.metrics.pairwise.pairwise_distances(). If X is the distance array itself, use “precomputed” as the metric. Precomputed distance matrices must have 0 along the diagonal.

  • **kwds (optional keyword parameters) – Any further parameters are passed directly to the distance function. If using a scipy.spatial.distance metric, the parameters are still metric dependent. See the scipy docs for usage examples.

Returns

silhouette – Silhouette Coefficients for each sample.

Return type

array-like of shape (n_samples,)

References

[1] Peter J. Rousseeuw (1987). “Silhouettes: a Graphical Aid to the Interpretation and Validation of Cluster Analysis”. Computational and Applied Mathematics 20: 53-65.

[2] Wikipedia entry on the Silhouette Coefficient.

partycls.helpers.silhouette_score(X, labels, *, metric='euclidean', sample_size=None, random_state=None, **kwds)[source]

Compute the mean Silhouette Coefficient of all samples.

The Silhouette Coefficient is calculated using the mean intra-cluster distance (a) and the mean nearest-cluster distance (b) for each sample. The Silhouette Coefficient for a sample is (b - a) / max(a, b). To clarify, b is the distance between a sample and the nearest cluster that the sample is not a part of. Note that the Silhouette Coefficient is only defined if the number of labels satisfies 2 <= n_labels <= n_samples - 1.

This function returns the mean Silhouette Coefficient over all samples. To obtain the values for each sample, use silhouette_samples().

The best value is 1 and the worst value is -1. Values near 0 indicate overlapping clusters. Negative values generally indicate that a sample has been assigned to the wrong cluster, as a different cluster is more similar.

Read more in the scikit-learn User Guide.

Parameters
  • X (array-like of shape (n_samples_a, n_samples_a) if metric == "precomputed" or (n_samples_a, n_features) otherwise) – An array of pairwise distances between samples, or a feature array.

  • labels (array-like of shape (n_samples,)) – Predicted labels for each sample.

  • metric (str or callable, default='euclidean') – The metric to use when calculating distance between instances in a feature array. If metric is a string, it must be one of the options allowed by metrics.pairwise.pairwise_distances. If X is the distance array itself, use metric="precomputed".

  • sample_size (int, default=None) – The size of the sample to use when computing the Silhouette Coefficient on a random subset of the data. If sample_size is None, no sampling is used.

  • random_state (int, RandomState instance or None, default=None) – Determines random number generation for selecting a subset of samples. Used when sample_size is not None. Pass an int for reproducible results across multiple function calls. See Glossary.

  • **kwds (optional keyword parameters) – Any further parameters are passed directly to the distance function. If using a scipy.spatial.distance metric, the parameters are still metric dependent. See the scipy docs for usage examples.

Returns

silhouette – Mean Silhouette Coefficient for all samples.

Return type

float

References

[1] Peter J. Rousseeuw (1987). “Silhouettes: a Graphical Aid to the Interpretation and Validation of Cluster Analysis”. Computational and Applied Mathematics 20: 53-65.

[2] Wikipedia entry on the Silhouette Coefficient.
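Examples

A usage sketch on an invented toy dataset; two well-separated clusters should give a score close to 1:

>>> import numpy as np
>>> from partycls.helpers import silhouette_score
>>> X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
>>> labels = [0, 0, 1, 1]
>>> silhouette_score(X, labels) > 0.9
True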

partycls.helpers.show_matplotlib(system, color, view='top', palette=None, cmap='viridis', outfile=None, linewidth=0.5, alpha=1.0, show=False)[source]

Make a snapshot of the system using matplotlib. The figure is returned for further customization or visualization in jupyter notebooks.

Parameters
  • system (System) – The system to visualize.

  • color (str) – Particle property to use for color coding, e.g. "species", "label".

  • view (str, default: "top") – View type, i.e. face of the box to show. Only works for a 3D system.

  • palette (list, default: None) – List of colors when coloring particles according to a discrete property, such as "species" or "label". A default palette will be used if not specified.

  • cmap (str, default: "viridis") – Name of a matplotlib colormap to use when coloring particles according to a continuous property such as "velocity" or "energy". List of available colormaps can be found in matplotlib.cm.cmaps_listed.

  • outfile (str, default: None) – Output filename to save the snapshot. Default is to not save.

  • linewidth (float, default: 0.5) – Line width.

  • alpha (float, default: 1.0) – Transparency parameter.

  • show (bool, default: False) – Show the snapshot when calling the function.

Returns

fig – Figure of the snapshot.

Return type

matplotlib.figure.Figure

partycls.helpers.show_ovito(system, color, view='top', palette=None, cmap='viridis', outfile=None, size=(640, 480), zoom=True)[source]

Make a snapshot of the system using Ovito. The image is returned for further customization or visualization in jupyter notebooks.

Parameters
  • system (System) – The system to visualize.

  • color (str) – Particle property to use for color coding, e.g. "species", "label".

  • view (str, default: "top") – View type, i.e. face of the box to show. Only works for a 3D system.

  • palette (list, default: "viridis") – List of colors when coloring particles according to a discrete property, such as "species" or "label". Colors must be expressed in RGB format through tuples (e.g. palette=[(0,0,1), (1,0,0)]). A default palette will be used if not specified.

  • cmap (str, default: "viridis") – Name of a matplotlib colormap to use when coloring particles according to a continuous property such as "velocity" or "energy". List of available colormaps can be found in matplotlib.cm.cmaps_listed.

  • outfile (str, default: None) – Output filename to save the snapshot. The default is to not save.

  • size (tuple, default: (640, 480)) – Size of the image to render.

  • zoom (bool, default: True) – Zoom on the simulation box.

Returns

Rendered image.

Return type

Image

partycls.helpers.show_3dmol(system, color, palette=None)[source]

Visualize the system using 3dmol. The py3Dmol view is returned for further customization or visualization in jupyter notebooks.

Parameters
  • system (System) – The system to visualize.

  • color (str) – Particle property to use for color coding, e.g. "species", "label". This property must be a string or an integer.

  • palette (list, default: None) – List of colors when coloring particles according to a discrete property, such as "species" or "label". A default palette will be used if not specified.

Raises

ValueError – If the color parameter refers to a float particle property.

Returns

view – py3Dmol view.

Return type

py3Dmol.view

partycls.helpers.shannon_entropy(px, dx=1.0)[source]

Shannon entropy of distribution \(p(x)\).

Parameters
  • px (numpy.ndarray) – Distribution \(p(x)\).

  • dx (float, default: 1.0) – Differential of x.

Returns

S – Shannon entropy.

Return type

float
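Assuming the usual discrete approximation with the natural logarithm (an assumption, since the base is not stated above), the computed quantity is

\[S = -\sum_i p(x_i) \ln p(x_i) \, \mathrm{d}x .\]

For a uniform two-state distribution with dx = 1.0, this gives \(S = \ln 2 \approx 0.693\).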

partycls.helpers.merge_clusters(weights, n_clusters_min=2, epsilon_=1e-15)[source]

Merge the original clusters into n_clusters_min new clusters, based on the probabilities that each particle belongs to each of the original clusters, using an entropy criterion.

See https://doi.org/10.1198/jcgs.2010.08111 (Baudry et al.).

Parameters
  • weights (list) – Probabilities that each particle belongs to each cluster. If there are \(N\) particles, the length of the list (or the first dimension of the array) must be \(N\). If there are \(K\) original clusters, the length of each element of the list (or the second dimension of the array) must be \(K\). weights[i][k] (list) or weights[i,k] (array) is the probability that particle i belongs to cluster k before merging. For each particle, sum(weights[i]) is equal to 1.

  • n_clusters_min (int, default: 2) – Final number of clusters after merging.

  • epsilon_ (float, default: 1e-15) – Small number (close to zero), used as a replacement for zero when computing logarithms to avoid numerical errors.

Returns

  • new_weights (numpy.ndarray) – New weights after merging. Same shape and interpretation as the weights input parameter.

  • new_labels (list) – New discrete labels based on the weights after merging.

partycls.helpers.sort_clusters(labels, centroids, func=<function shannon_entropy>)[source]

Make a consistent labeling of the clusters based on their centroids by computing an associated numerical value as a sorting criterion. By default, the labeling is based on the Shannon entropy of each cluster.

Parameters
  • labels (list) – Original labels.

  • centroids (numpy.ndarray) – Cluster centroids.

  • func (function, default: shannon_entropy) – Function used to associate a numerical value to each cluster, to be used as sorting criterion. This function must accept a list or a one dimensional array as parameter (this parameter being the coordinates of a given centroid).

Returns

  • new_labels (list) – New labels based on centroid entropies.

  • new_centroids (numpy.ndarray) – Centroids arranged in order of descending entropies.

partycls.particle module

Point particles in a cartesian reference frame.

This class is inspired by the framework atooms authored by Daniele Coslovich.

class partycls.particle.Particle(position=None, species='A', label=-1, radius=0.5, nearest_neighbors=None)[source]

Bases: object

A particle is defined by its position, its type, and additional attributes like a radius, a cluster label, a list of neighbors, etc.

position

The position of the particle.

Type

numpy.ndarray

species

Particle type / species.

Type

str

label

Cluster label of the particle.

Type

int

radius

Particle radius.

Type

float

nearest_neighbors

Zero-based indices of the particle’s nearest neighbors in the System.

Type

list

Parameters
  • position (list, default: None) – The position of the particle. If not given, it will be set to [0.0, 0.0, 0.0].

  • species (str, default: "A") – Particle type / species.

  • label (int, default: -1) – Cluster label of the particle. Default is -1 (i.e. not belonging to any cluster).

  • radius (float, default: 0.5) – Particle radius.

  • nearest_neighbors (list, default: None) – Indices of the particle’s nearest neighbors in the System.

Examples

>>> p = Particle([0.0, 0.0, 0.0], species='A', radius=0.4)
>>> p = Particle([1.5, -0.3, 3.2], species='B', nearest_neighbors=[12,34,68])
fold(cell)[source]

Fold the particle position into the central cell.

Parameters

cell (Cell) – Simulation cell.

Return type

None
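A sketch of folding, assuming the atooms convention of a cell centered at the origin, so that positions are wrapped into \([-L/2, L/2]\) on each axis (this convention is an assumption here):

>>> from partycls.particle import Particle
>>> from partycls.cell import Cell
>>> p = Particle(position=[2.6, 0.0, 0.0])
>>> p.fold(Cell([5.0, 5.0, 5.0]))
>>> p.position  # 2.6 wraps to -2.4 under the assumed convention
array([-2.4,  0. ,  0. ])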

partycls.system module

The physical system at hand.

The system of interest in classical atomistic simulations is composed of interacting point particles, usually enclosed in a simulation cell.

This class is inspired by the framework atooms authored by Daniele Coslovich.

class partycls.system.System(particle=None, cell=None)[source]

Bases: object

A system is composed of a collection of particles that lie within an orthorhombic cell.

particle

All the particles in the system.

Type

list

cell

The cell where all the particles lie.

Type

Cell

nearest_neighbors_cutoffs

List of nearest neighbors cutoffs for each pair of species in the system.

Type

list

Parameters
  • particle (list, default: None) – A list of instances of Particle.

  • cell (Cell, default: None) – The cell (simulation box).

Examples

>>> p = [Particle(position=[0.0, 0.0, 0.0], species='A'),
...      Particle(position=[1.0, 1.0, 1.0], species='B')]
>>> c = Cell([5.0, 5.0, 5.0])
>>> sys = System(particle=p, cell=c)
property nearest_neighbors_method

Method used to identify the nearest neighbors of all the particles in the system. Should be one of "fixed", "sann" or "voronoi".

property n_dimensions

Number of spatial dimensions, guessed from the length of self.particle[0].position.

property density

Number density of the system.

It will raise a ValueError if self.cell is None.

property distinct_species

Sorted numpy.ndarray of all the distinct species in the system.

property pairs_of_species

List of all the possible pairs of species.

property pairs_of_species_id

List of all the possible pairs of species ID.

property chemical_fractions

numpy.ndarray with the chemical fractions of each species in the system.

get_property(what, subset=None)[source]

Return a numpy.ndarray with the system property specified by what. If what is a particle property, return the property for all particles in the system, or for a given subset of particles specified by subset.

Parameters
  • what (str) –

    Requested system property. what must be of the form "particle.<attribute>" or "cell.<attribute>". The following particle aliases are accepted:

    • 'position' : 'particle.position'

    • 'pos' : 'particle.position'

    • 'position[0]' : 'particle.position[0]'

    • 'pos[0]' : 'particle.position[0]'

    • 'x' : 'particle.position[0]'

    • 'position[1]' : 'particle.position[1]'

    • 'pos[1]' : 'particle.position[1]'

    • 'y' : 'particle.position[1]'

    • 'position[2]' : 'particle.position[2]'

    • 'pos[2]' : 'particle.position[2]'

    • 'z' : 'particle.position[2]'

    • 'species' : 'particle.species'

    • 'spe' : 'particle.species'

    • 'label' : 'particle.label'

    • 'mass' : 'particle.mass'

    • 'radius' : 'particle.radius'

    • 'nearest_neighbors' : 'particle.nearest_neighbors'

    • 'neighbors' : 'particle.nearest_neighbors'

    • 'neighbours' : 'particle.nearest_neighbors'

    • 'voronoi_signature' : 'particle.voronoi_signature'

    • 'signature' : 'particle.voronoi_signature'

  • subset (str, default: None) – Subset of particles for which the property must be dumped. Must be of the form "particle.<attribute>" unless "<attribute>" is an alias. The default is None (all particles will be included). This is ignored if what is a cell property.

Returns

to_dump – Array of the requested system property.

Return type

numpy.ndarray

Examples

>>> traj = Trajectory('trajectory.xyz')
>>> sys = traj[0]
>>> pos_0 = sys.get_property('position')
>>> spe_0 = sys.get_property('species')
>>> sides = sys.get_property('cell.side')
dump(what, subset=None)[source]

Alias for the method get_property.

set_property(what, value, subset=None)[source]

Set a system property what to value. If what is a particle property, set the property for all the particles in the system or for a given subset of particles specified by subset.

Parameters
  • what (str) – Name of the property to set. This is considered to be a particle property by default, unless it starts with "cell", e.g. "cell.side".

  • value (int, float, list, or numpy.ndarray) – Value(s) of the property to set. An instance of int or float will set the same value for all concerned particles. An instance of list or numpy.ndarray will assign a specific value to each particle. In this case, the size of value should respect the number of concerned particles.

  • subset (str, default: None) – Particles for which the property must be set. The default is None. This is ignored if what is a cell property.

Return type

None

Examples

>>> sys.set_property('mass', 1.0)
>>> sys.set_property('radius', 0.5, "species == 'A'")
>>> labels = [0, 1, 0] # 3 particles in the subset
>>> sys.set_property('label', labels, "species == 'B'")
>>> sys.set_property('cell.side[0]', 2.0)
compute_nearest_neighbors(method, cutoffs)[source]

Compute the nearest neighbors for all the particles in the system using the provided method. Neighbors are stored in the nearest_neighbors particle property. Available methods are:

  • 'fixed' : use fixed cutoffs for each pair of species in the system.

  • 'sann' : solid-angle based nearest neighbor algorithm (see https://doi.org/10.1063/1.4729313).

  • 'voronoi' : radical Voronoi tessellation method (uses particles’ radii) (see https://doi.org/10.1016/0022-3093(82)90093-X).

Parameters
  • method (str) – Method to identify the nearest neighbors. Must be one of 'fixed', 'sann', or 'voronoi'.

  • cutoffs (list) – List containing the cutoff distances for each pair of species in the system (for methods 'fixed' and 'sann'). For method 'sann', cutoffs are required as a first guess to identify the nearest neighbors. Use None for method 'voronoi'.

Return type

None

Examples

>>> sys.compute_nearest_neighbors('fixed', [1.5, 1.4, 1.4, 1.3])
>>> sys.compute_nearest_neighbors('sann', [1.5, 1.4, 1.4, 1.3])
>>> sys.compute_nearest_neighbors('voronoi', None)
compute_voronoi_signatures()[source]

Compute the Voronoi signatures of all the particles in the system using the radical Voronoi tessellation method (see https://doi.org/10.1016/0022-3093(82)90093-X).

Particle radii must be set using the set_property method if the original trajectory file does not contain such information.

Creates a voronoi_signature property for the particles.

Return type

None

show(backend='matplotlib', color='species', **kwargs)[source]

Show a snapshot of the system and color particles according to an arbitrary property, such as species, cluster label, etc. Current visualization backends are "matplotlib", "ovito" and "3dmol".

Parameters
  • backend (str, default: "matplotlib") – Name of the backend to use for visualization.

  • color (str, default: "species") – Name of the particle property to use as basis for coloring the particles. This property must be defined for all the particles in the system.

  • **kwargs – Additional keyword arguments (backend-dependent).

Raises

ValueError – In case of unknown backend.

Return type

backend-dependent

Examples

>>> sys.show(color='label', backend='3dmol')
>>> sys.show(color='energy', backend='matplotlib', cmap='viridis')
fold()[source]

Fold the particles’ positions into the central cell.

Return type

None

partycls.trajectory module

Physical trajectory.

This class is inspired by the framework atooms authored by Daniele Coslovich.

class partycls.trajectory.Trajectory(filename, fmt=None, backend=None, top=None, additional_fields=None, first=0, last=None, step=1)[source]

Bases: object

A trajectory is composed of one or more frames, each frame being an instance of System. Trajectory instances are iterable. By default, only the positions and particle types are read from the trajectory file. Additional particle properties in the file can be read using the additional_fields parameter.

filename

Name of the original trajectory file.

Type

str

fmt

Format of the original trajectory file.

Type

str

backend

Name of the third-party package used to read the input trajectory file.

Type

str

additional_fields

List of additional particle properties that were extracted from the original trajectory file.

Type

list

Parameters
  • filename (str) – Path to the trajectory file to read.

  • fmt (str, default: "xyz") – Format of the trajectory. Needed when using "atooms" as a backend.

  • backend (str, default: None) – Name of a third-party package to use as backend when reading the input trajectory. Currently supports "atooms" and "mdtraj".

  • top (str, mdtraj.Trajectory, or mdtraj.Topology, default: None) – Topology information. Needed when using "mdtraj" as backend on a trajectory file whose format requires topology information. See the MDTraj documentation for more information.

  • additional_fields (list, optional, default: None) – Additional fields (i.e. particle properties) to read from the trajectory. Not all trajectory formats allow for additional fields.

  • first (int, default: 0) – Index of the first frame to consider in the trajectory. Starts at zero.

  • last (int, default: None) – Index of the last frame to consider in the trajectory. Default is the last frame.

  • step (int, default: 1) – Step between each frame to consider in the trajectory. For example, if step=2, one out of every two frames is read.

Examples

>>> traj = Trajectory('trajectory.xyz', additional_fields=['mass'])
>>> traj = Trajectory('trajectory.dat', fmt='lammps', backend='atooms')
property nearest_neighbors_method

Method used to identify the nearest neighbors of all the particles in the trajectory. Should be one of "auto", "fixed", "sann" or "voronoi".

property nearest_neighbors_cutoffs

List of cutoffs that delimit the first coordination shell. Cutoffs are usually defined on the basis of the first minimum of the partial radial distribution function of each pair of species, \(g_{\alpha\beta}(r)\). The list must have the same length as the number of pairs of species in the system (e.g. 2 species yield 4 ordered pairs, 3 species yield 9 pairs, etc.).
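For example, for a binary mixture with species "A" and "B", the list has four entries, one per ordered pair. A sketch, assuming the property is directly settable (cutoff values below are invented; individual cutoffs can instead be set with set_nearest_neighbors_cutoff, described further down):

>>> traj.nearest_neighbors_cutoffs = [1.45, 1.35, 1.35, 1.25]  # (A,A), (A,B), (B,A), (B,B)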

remove(frame)[source]

Remove the system at position frame from the trajectory.

Parameters

frame (int) – Index of the frame to remove from the trajectory.

Return type

None

get_property(what, subset=None)[source]

Return a list of numpy.ndarrays with the system property specified by what. The list size is the number of systems in the trajectory.

Parameters
  • what (str) –

    Requested system property. what must be of the form "particle.<attribute>" or "cell.<attribute>". The following particle aliases are accepted:

    • 'position' : 'particle.position'

    • 'pos' : 'particle.position'

    • 'position[0]' : 'particle.position[0]'

    • 'pos[0]' : 'particle.position[0]'

    • 'x' : 'particle.position[0]'

    • 'position[1]' : 'particle.position[1]'

    • 'pos[1]' : 'particle.position[1]'

    • 'y' : 'particle.position[1]'

    • 'position[2]' : 'particle.position[2]'

    • 'pos[2]' : 'particle.position[2]'

    • 'z' : 'particle.position[2]'

    • 'species' : 'particle.species'

    • 'spe' : 'particle.species'

    • 'label' : 'particle.label'

    • 'mass' : 'particle.mass'

    • 'radius' : 'particle.radius'

    • 'nearest_neighbors' : 'particle.nearest_neighbors'

    • 'neighbors' : 'particle.nearest_neighbors'

    • 'neighbours' : 'particle.nearest_neighbors'

    • 'voronoi_signature' : 'particle.voronoi_signature'

    • 'signature' : 'particle.voronoi_signature'

  • subset (str, optional, default: None) – Subset of particles for which the property must be dumped. Must be of the form "particle.<attribute>" unless "<attribute>" is an alias. The default is None (all particles will be included). This is ignored if what is a cell property.

Returns

to_dump – List of the requested system property with length equal to the number of frames in the trajectory. Each element of the list is a numpy.ndarray of the requested system property.

Return type

list

Examples

>>> traj = Trajectory('trajectory.xyz')
>>> pos = traj.get_property('position')
>>> spe = traj.get_property('species')
>>> sides = traj.get_property('cell.side')
dump(what, subset=None)[source]

Alias for the method get_property.

set_property(what, value, subset=None)[source]

Set a property what to value for all the particles in the trajectory or for a given subset of particles specified by subset.

Parameters
  • what (str) – Name of the property to set. This is considered to be a particle property by default, unless it starts with "cell", e.g. "cell.side".

  • value (int, float, list, numpy.ndarray) – Value(s) of the property to set. An instance of int or float will set the same value for all concerned particles. An instance of list or numpy.ndarray will assign a specific value to each particle. In this case, the shape of value should respect the number of frames in the trajectory and the number of concerned particles.

  • subset (str, default: None) – Particles for which the property must be set. The default is None. This is ignored if what is a cell property.

Return type

None

Examples

>>> traj.set_property('mass', 1.0)
>>> traj.set_property('radius', 0.5, subset="species == 'A'")
>>> labels = [[0, 1, 0],  # 2 frames, 3 particles in the subset
...           [1, 1, 0]]
>>> traj.set_property('label', labels, subset="species == 'B'")
compute_nearest_neighbors(method=None, cutoffs=None, dr=0.1)[source]

Compute the nearest neighbors for all the particles in the trajectory using the provided method. Neighbors are stored in the nearest_neighbors particle property. Available methods are:

  • 'auto' : read neighbors from the trajectory file, if explicitly requested with the additional_fields argument in the constructor.

  • 'fixed' : use fixed cutoffs for each pair of species in the trajectory.

  • 'sann' : solid-angle based nearest neighbor algorithm (see https://doi.org/10.1063/1.4729313).

  • 'voronoi' : radical Voronoi tessellation method (uses particles’ radii) (see https://doi.org/10.1016/0022-3093(82)90093-X)

Parameters
  • method (str, default: None) – Method to identify the nearest neighbors. Must be one of 'auto', 'fixed', 'sann', or 'voronoi'. None defaults to 'auto'. If method is 'auto', neighbors are read directly from the trajectory file, if specified with the additional_fields argument in the constructor. If no neighbors are found, falls back to method='fixed' instead.

  • cutoffs (list, default: None) – List containing the cutoff distances for each pair of species in the trajectory (for methods 'fixed' and 'sann'). If None, cutoffs will be computed automatically. For method 'sann', cutoffs are required as a first guess to identify the nearest neighbors.

  • dr (float, default: 0.1) – Radial grid spacing \(\Delta r\) for computing the cutoffs on the basis of the first minimum of each partial radial distribution function in the trajectory, if cutoffs are not provided.

Return type

None

Examples

>>> traj.compute_nearest_neighbors(method='fixed', cutoffs=[1.5, 1.4, 1.4, 1.3])
>>> traj.compute_nearest_neighbors(method='sann', cutoffs=[1.5, 1.4, 1.4, 1.3])
>>> traj.compute_nearest_neighbors(method='voronoi')
set_nearest_neighbors_cutoff(s_a, s_b, rcut, mirror=True)[source]

Set the nearest-neighbor cutoff for the pair of species (s_a, s_b). The cutoff of the mirror pair (s_b, s_a) is set automatically if the mirror parameter is True (default). Writes to the nearest_neighbors_cutoffs list attribute.

Parameters
  • s_a (str) – Symbol of the first species \(\alpha\).

  • s_b (str) – Symbol of the second species \(\beta\).

  • rcut (float) – Value of the cutoff for the pair \((\alpha, \beta)\) = (s_a, s_b).

  • mirror (bool, default: True) – Also set the cutoff for the mirror pair (s_b, s_a). The default is True.

Return type

None

compute_nearest_neighbors_cutoffs(dr=0.1)[source]

Compute the nearest neighbors cutoffs on the basis of the first minimum of the partial radial distribution function \(g_{\alpha\beta}(r)\) between each pair of species \((\alpha,\beta)\) in the trajectory. Sets the nearest_neighbors_cutoffs list attribute.

Parameters

dr (float, default: 0.1) – Bin width \(\Delta r\) for the radial grid used to compute the partial radial distribution functions \(g_{\alpha\beta}(r)\).

Return type

None

compute_voronoi_signatures()[source]

Compute the Voronoi signatures of all the particles in the trajectory using the radical Voronoi tessellation method (see https://doi.org/10.1016/0022-3093(82)90093-X).

Particle radii must be set using the set_property method if the original trajectory file does not contain such information.

Creates a voronoi_signature property for the particles.

Return type

None

show(frames=None, backend='matplotlib', color='species', **kwargs)[source]

Show the frames of the trajectory at the indices given by frames, and color particles according to an arbitrary property, such as species, cluster label, etc. Current visualization backends are "matplotlib", "ovito", and "3dmol".

Parameters
  • frames (list, default: None) – Indices of the frames to show. The default is None (shows all frames).

  • backend (str, default: "matplotlib") – Name of the backend to use for visualization.

  • color (str, default: "species") – Name of the particle property to use as basis for coloring the particles. This property must be defined for all the particles in the system.

  • **kwargs – Additional keyword arguments (backend-dependent).

Raises

ValueError – In case of unknown backend.

Return type

Backend-dependent

Examples

>>> traj.show(frames=[0,1,2], color='label', backend='3dmol')
>>> traj.show(frames=[0,1], color='energy', backend='matplotlib', cmap='viridis')
>>> traj[0].show() # use the iterability of Trajectory objects
write(output_path, fmt='xyz', backend=None, additional_fields=None, precision=6)[source]

Write the current trajectory to a file.

Parameters
  • output_path (str) – Name of the output trajectory file.

  • fmt (str, default: "xyz") – Format of the output trajectory file.

  • backend (str, default: None) – Name of a third-party package to use when writing the output trajectory.

  • additional_fields (list, default: None) – Additional fields (i.e. particle properties) to write in the output trajectory. Not all trajectory formats allow for additional fields. The default is to not write any additional particle property.

  • precision (int, default: 6) – Number of decimals when writing the output trajectory.

Raises

ValueError

  • If backend=None and fmt is not recognized natively.

  • If backend is unknown.

Return type

None

fold()[source]

Fold the particles’ positions into the central cell.

Return type

None

partycls.workflow module

Workflow for clustering analysis.

A workflow is a procedure that goes through various steps (some of which are optional) to perform a structural clustering on a trajectory.

class partycls.workflow.Workflow(trajectory, descriptor='gr', scaling=None, dim_reduction=None, clustering='kmeans')[source]

Bases: object

A Workflow is a clustering procedure that goes through the following steps:

  • compute a structural descriptor on a given trajectory;

  • (optional) apply a feature scaling on the previously computed structural features;

  • (optional) apply a dimensionality reduction on the (raw/scaled) features;

  • run a clustering algorithm to partition particles into structurally different clusters.

trajectory

The trajectory file as read by the Trajectory class.

Type

Trajectory

descriptor

Structural descriptor associated to the trajectory.

Type

StructuralDescriptor

scaling

Feature scaling method.

Type

ZScore, MinMax, MaxAbs or Robust

dim_reduction

Dimensionality reduction method.

Type

PCA, TSNE, LocallyLinearEmbedding or AutoEncoder

clustering

Clustering method.

Type

Clustering

output_metadata

Dictionary that controls the writing process and the properties of all the output files.

Type

dict

features

Raw features as computed by the associated structural descriptor. Equal to None until the features are computed.

Type

numpy.ndarray

scaled_features

Features after being rescaled by a feature scaling method. Equal to None if no scaling is applied to the features.

Type

numpy.ndarray

reduced_features

Features in the reduced space after applying a dimensionality reduction technique. Equal to None if no reduction is applied to the features.

Type

numpy.ndarray

naming_convention

Base name for output files. Default is "{filename}.{code}.{descriptor}.{clustering}", where each tag will be replaced by its value in the current instance of Workflow (e.g. "traj.xyz.partycls.gr.kmeans").

Base name can be changed using any combination of the available tags:

  • {filename}

  • {code}

  • {descriptor}

  • {scaling}

  • {dim_reduction}

  • {clustering}

Example: "{filename}_descriptor-{descriptor}_scaling-{scaling}.{code}".

Type

str

Parameters
  • trajectory (Trajectory) – An instance of Trajectory, a path to a trajectory file to read, or an instance of a class with a compatible interface.

  • descriptor (StructuralDescriptor, default: "gr") –

    An instance of StructuralDescriptor, the short name of a descriptor (str), or an instance of a class with compatible interface. See the descriptor_db class attribute for compatible strings. Examples:

    • "gr" : radial distribution of particles around a central particle.

    • "ba" : angular distribution of pairs of nearest neighbors of a central particle.

    • "bo" : Steinhardt bond-orientational order parameter.

    • "ld" : Lechner-Dellago cond-orientational order parameter.

  • scaling (method, default: None) –

    Feature scaling method. See the scaling_db class attribute for compatible strings. Examples:

    • "zscore" : standardize features by removing the mean and scaling to unit variance

    • "minmax" : scale and translate each feature individually such that it is in the given range on the training set, e.g. between zero and one

    • "maxabs" : scale and translate each feature individually such that the maximal absolute value of each feature in the training set will be 1.

    • "robust" : remove the median and scale the data according to the specified quantile range (default is between 25th quantile and 75th quantile)

  • dim_reduction (method, default: None) –

    Dimensionality reduction method. See the dim_reduction_db class attribute for compatible strings. Examples:

    • "pca" : Principal Component Analysis

    • "tsne" : t-distributed Stochastic Neighbor Embedding

    • "lle" : Locally Linear Embedding

    • "ae" : neural network Auto-Encoder

  • clustering (Clustering, default: 'kmeans') –

    Clustering algorithm. See the clustering_db class attribute for compatible strings. Examples:

    • "kmeans" : K-Means clustering

    • "gmm" : Gaussian Mixture Model

    • "cinf" : Community Inference

Example

>>> wf = Workflow('trajectory.xyz', descriptor='ba', scaling='zscore')
>>> wf.run()
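Once run, the clustering results can be accessed through the properties listed below (continuing the example above):

>>> wf.labels  # cluster label of each particle
>>> wf.fractions  # fraction of particles in each cluster
>>> wf.centroids  # central feature vector of each cluster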
descriptor_db = {'ang': <class 'partycls.descriptors.ba.BondAngleDescriptor'>, 'angular': <class 'partycls.descriptors.ba.BondAngleDescriptor'>, 'ba': <class 'partycls.descriptors.ba.BondAngleDescriptor'>, 'bo': <class 'partycls.descriptors.bo.BondOrientationalDescriptor'>, 'boattini': <class 'partycls.descriptors.radial_bo.BoattiniDescriptor'>, 'boo': <class 'partycls.descriptors.bo.BondOrientationalDescriptor'>, 'bop': <class 'partycls.descriptors.bo.BondOrientationalDescriptor'>, 'compact': <class 'partycls.descriptors.compactness.CompactnessDescriptor'>, 'coord': <class 'partycls.descriptors.coordination.CoordinationDescriptor'>, 'coordination': <class 'partycls.descriptors.coordination.CoordinationDescriptor'>, 'gr': <class 'partycls.descriptors.radial.RadialDescriptor'>, 'labo': <class 'partycls.descriptors.averaged_bo.LocallyAveragedBondOrientationalDescriptor'>, 'ld': <class 'partycls.descriptors.averaged_bo.LechnerDellagoDescriptor'>, 'lechner dellago': <class 'partycls.descriptors.averaged_bo.LechnerDellagoDescriptor'>, 'lechner-dellago': <class 'partycls.descriptors.averaged_bo.LechnerDellagoDescriptor'>, 'rad': <class 'partycls.descriptors.radial.RadialDescriptor'>, 'radial': <class 'partycls.descriptors.radial.RadialDescriptor'>, 'rbo': <class 'partycls.descriptors.radial_bo.RadialBondOrientationalDescriptor'>, 'rboo': <class 'partycls.descriptors.radial_bo.RadialBondOrientationalDescriptor'>, 'rbop': <class 'partycls.descriptors.radial_bo.RadialBondOrientationalDescriptor'>, 'sbo': <class 'partycls.descriptors.smoothed_bo.SmoothedBondOrientationalDescriptor'>, 'sboo': <class 'partycls.descriptors.smoothed_bo.SmoothedBondOrientationalDescriptor'>, 'sbop': <class 'partycls.descriptors.smoothed_bo.SmoothedBondOrientationalDescriptor'>, 'steinhardt': <class 'partycls.descriptors.bo.SteinhardtDescriptor'>, 'tetra': <class 'partycls.descriptors.tetrahedrality.TetrahedralDescriptor'>, 'tong tanaka': <class 'partycls.descriptors.compactness.TongTanakaDescriptor'>, 'tong-tanaka': <class 'partycls.descriptors.compactness.TongTanakaDescriptor'>}
clustering_db = {'cinf': <class 'partycls.clustering.CommunityInference'>, 'community inference': <class 'partycls.clustering.CommunityInference'>, 'community-inference': <class 'partycls.clustering.CommunityInference'>, 'gaussian mixture': <class 'partycls.clustering.GaussianMixture'>, 'gaussian-mixture': <class 'partycls.clustering.GaussianMixture'>, 'gm': <class 'partycls.clustering.GaussianMixture'>, 'gmm': <class 'partycls.clustering.GaussianMixture'>, 'inference': <class 'partycls.clustering.CommunityInference'>, 'k-means': <class 'partycls.clustering.KMeans'>, 'kmeans': <class 'partycls.clustering.KMeans'>}
scaling_db = {'max-abs': <class 'partycls.feature_scaling.MaxAbs'>, 'maxabs': <class 'partycls.feature_scaling.MaxAbs'>, 'min-max': <class 'partycls.feature_scaling.MinMax'>, 'minmax': <class 'partycls.feature_scaling.MinMax'>, 'robust': <class 'partycls.feature_scaling.Robust'>, 'standard': <class 'partycls.feature_scaling.ZScore'>, 'z-score': <class 'partycls.feature_scaling.ZScore'>, 'zscore': <class 'partycls.feature_scaling.ZScore'>}
dim_reduction_db = {'ae': <class 'partycls.dim_reduction.AutoEncoder'>, 'auto-encoder': <class 'partycls.dim_reduction.AutoEncoder'>, 'autoencoder': <class 'partycls.dim_reduction.AutoEncoder'>, 'lle': <class 'partycls.dim_reduction.LocallyLinearEmbedding'>, 'pca': <class 'partycls.dim_reduction.PCA'>, 't-sne': <class 'partycls.dim_reduction.TSNE'>, 'tsne': <class 'partycls.dim_reduction.TSNE'>}
property labels

Clustering labels.

property fractions

Fraction of particles in each cluster.

property populations

Number of particles in each cluster.

property centroids

Centroid of each cluster.

run()[source]

Compute the clustering and write the output files according to the defined Workflow:

  • compute the descriptor

  • (optional) apply feature scaling

  • (optional) apply dimensionality reduction

  • compute the clustering

  • (optional) write the output files

Raises

ValueError – If a community inference clustering is attempted with feature scaling or dimensionality reduction.

Return type

None

set_output_metadata(what, **kwargs)[source]

Change the output properties.

Parameters
  • what (str) –

    Type of output file to change. Must be one of:

    • "trajectory"

    • "log"

    • "centroids"

    • "labels"

    • "dataset"

  • **kwargs – Keyword arguments (specific to each type of file).

Return type

None

Examples

>>> wf = Workflow('trajectory.xyz')
>>> wf.set_output_metadata('log', enable=False) # do not write the log file
>>> wf.set_output_metadata('trajectory', filename='awesome_trajectory.xyz') # change the default output name
>>> wf.set_output_metadata('dataset', enable=True, precision=8) # write the dataset and change the writing precision to 8 digits
disable_output()[source]

Disable all outputs.

Return type

None

write_trajectory(filename=None, fmt='xyz', backend=None, additional_fields=None, precision=6, **kwargs)[source]

Write the trajectory file with cluster labels (default) and other additional fields (if any).

Parameters
  • filename (str, default: None) – Filename of the output trajectory. Uses a default naming convention if not specified. The default is None.

  • fmt (str, default: "xyz") – Output trajectory format.

  • backend (str, default: None) – Name of the backend to use to write the trajectory. Must be either None, "atooms" or "mdtraj".

  • additional_fields (list, default: None) – Additional fields (i.e. particle properties) to write in the output trajectory. Note that all the Particle objects should have the specified properties as attributes.

  • precision (int, default: 6) – Number of decimals when writing the output trajectory.

Return type

None

Examples

>>> wf = Workflow('trajectory.xyz')
>>> wf.write_trajectory(fmt='rumd')
>>> wf.write_trajectory(additional_fields=['particle.mass']) # `Particle` must have the attribute `mass`.
>>> wf.write_trajectory(filename='my_custom_name', precision=8)
write_log(filename=None, precision=6, **kwargs)[source]

Write a log file with all relevant information about the workflow. The log file can be written only if the workflow has been run at least once with the method Workflow.run.

Parameters
  • filename (str, default: None) – Filename of the log file. Uses a default naming convention if not specified.

  • precision (int, default: 6) – Number of decimals when writing the log file.

Return type

None

write_centroids(filename=None, precision=6, **kwargs)[source]

Write the coordinates of the clusters’ centroids using the raw features from the descriptor (i.e. neither scaled nor reduced).

Parameters
  • filename (str, default: None) – Filename of the centroids file. Uses a default naming convention if not specified.

  • precision (int, default: 6) – Number of decimals when writing the centroids file.

Return type

None

write_labels(filename=None, **kwargs)[source]

Write the clusters’ labels only.

Parameters

filename (str, default: None) – Filename of the labels file. Uses a default naming convention if not specified.

Return type

None

write_dataset(filename=None, precision=6, **kwargs)[source]

Write the full raw dataset from the descriptor as an array (i.e. all the individual raw features of each particle).

Parameters
  • filename (str, default: None) – Filename of the dataset file. Uses a default naming convention if not specified.

  • precision (int, default: 6) – Number of decimals when writing the dataset file.

Return type

None