mat_discover.utils package

Submodules

mat_discover.utils.Timer module

Timer class.

class mat_discover.utils.Timer.NoTimer(name)[source]

Bases: object

Use in place of Timer without actually printing output.

__init__(name)[source]

Take name as argument and do nothing.

class mat_discover.utils.Timer.Timer(name=None)[source]

Bases: object

Simple timer class.

https://stackoverflow.com/a/5849861/13697228

Usage

>>> with Timer("add two numbers"):
...    out = 56 + 74

__init__(name=None)[source]

Assign name for Timer object.
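
The context-manager pattern behind Timer and NoTimer can be sketched as follows. This is a minimal illustration in the spirit of the linked Stack Overflow answer, not the package's actual source: the class names SketchTimer and SketchNoTimer and the elapsed attribute are hypothetical and were chosen only for this example.

```python
import time


class SketchTimer:
    """Hypothetical stand-in for Timer: report elapsed wall-clock time on exit."""

    def __init__(self, name=None):
        self.name = name
        self.elapsed = None  # assumption: kept here so callers can inspect it

    def __enter__(self):
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.elapsed = time.perf_counter() - self._start
        label = f"[{self.name}] " if self.name else ""
        print(f"{label}Elapsed: {self.elapsed:.6f} s")
        return False  # never suppress exceptions raised inside the block


class SketchNoTimer:
    """Hypothetical stand-in for NoTimer: same interface, no output."""

    def __init__(self, name=None):
        pass  # accept the name and ignore it

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        return False


# Swap the class to turn timing output on or off without touching the body.
with SketchTimer("add two numbers") as timer:
    out = 56 + 74

with SketchNoTimer("add two numbers"):
    silent_out = 56 + 74
```

Because both classes share a constructor signature, calling code can select one or the other through a single variable (for example, depending on a verbosity flag) and leave the with block unchanged.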

mat_discover.utils.generate_elasticity_data module

Download and partition elasticity data using Materials Project API.

mat_discover.utils.generate_elasticity_data.generate_elasticity_data(download_data=True, cif=False, train_e_above_hull=0.05, val_e_above_hull=0.05, theoretical=False, folder='mat_discover/data/elasticity')[source]

Download (or reload) elasticity data using MPRester.

Parameters
  • download_data (bool, optional) – [description], by default True

  • cif (bool, optional) – [description], by default False

  • train_e_above_hull (float, optional) – [description], by default 0.05

  • val_e_above_hull (float, optional) – [description], by default 0.05

  • theoretical (bool, optional) – Whether to select theoretical compounds. False selects experimental compounds. Accepts False, True, None, or a list of these values, by default False. The underlying API is subject to change. See https://matsci.org/t/how-to-use-has-icsd-exptl-id-property-in-pymatgen-query-function/2550/4

  • folder (str, optional) – Which folder to save to, by default join(“mat_discover”, “data”, “elasticity”).

Returns

[description]

Return type

[type]

mat_discover.utils.generate_elasticity_data.structure_from_cif(cif)[source]

Create pymatgen Structure from a crystallographic information file str.

mat_discover.utils.nearest_neigh module

Nearest neighbor helper functions for DISCOVER.

mat_discover.utils.nearest_neigh.nearest_neigh_props(X, target, r_strength=None, radius=None, n_neighbors=10, metric='precomputed', **NN_kwargs)[source]

Compute nearest neighbor properties for peak proxy using radius and kNN.

Parameters
  • X (2d array) – Pairwise distance matrix (within single set).

  • target (1d array) – Target property values.

  • r_strength (float or None, optional) – Radius strength used as a scaling value for radius, by default None. If None, then a default value based on mean and standard deviation is used. See _nearest_neigh_props.

  • radius (float, optional) – The radius within which to consider nearest neighbors, by default None

  • n_neighbors (int, optional) – The number of nearest neighbors (kNNs) to consider for computing k_neigh_avg_targ, by default 10.

  • metric (str or callable) – “The distance metric to use for the tree. The default metric is minkowski, and with p=2 is equivalent to the standard Euclidean metric. See the documentation of DistanceMetric for a list of available metrics. If metric is “precomputed”, X is assumed to be a distance matrix and must be square during fit. X may be a sparse graph, in which case only “nonzero” elements may be considered neighbors.” (source: sklearn.neighbors.NearestNeighbors docs). By default “precomputed”.

Returns

rad_neigh_avg_targ, k_neigh_avg_targ – Radius- and kNN-based average of neighbor targets, respectively.

Return type

1d array (X.shape[0],)

See also

sklearn.neighbors.NearestNeighbors

Unsupervised learner for implementing neighbor searches.
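
To make the two returned quantities concrete, the following dependency-free sketch computes a radius-based and a kNN-based average of neighbor targets from a precomputed distance matrix. It is an illustration under stated assumptions, not the package's implementation: the function name neighbor_target_averages is hypothetical, each point is assumed to be excluded from its own neighborhood, and an empty radius neighborhood is assumed to yield NaN. The r_strength scaling of the radius is not modeled.

```python
import math


def neighbor_target_averages(dist, target, radius, n_neighbors):
    """Return (radius-based, kNN-based) mean neighbor targets for each point.

    dist is a square pairwise distance matrix (list of lists) and target is
    a list of property values of the same length.
    """
    n = len(dist)
    rad_avg, knn_avg = [], []
    for i in range(n):
        # Assumption: a point is not counted as its own neighbor.
        others = [j for j in range(n) if j != i]

        # Radius-based: average the targets of all points within `radius`.
        in_radius = [target[j] for j in others if dist[i][j] <= radius]
        rad_avg.append(sum(in_radius) / len(in_radius) if in_radius else math.nan)

        # kNN-based: average the targets of the `n_neighbors` closest points.
        nearest = sorted(others, key=lambda j: dist[i][j])[:n_neighbors]
        knn_avg.append(
            sum(target[j] for j in nearest) / len(nearest) if nearest else math.nan
        )
    return rad_avg, knn_avg


# Three points on a line at positions 0, 1, and 3.
dist = [
    [0.0, 1.0, 3.0],
    [1.0, 0.0, 2.0],
    [3.0, 2.0, 0.0],
]
target = [10.0, 20.0, 40.0]

rad_avg, knn_avg = neighbor_target_averages(dist, target, radius=1.5, n_neighbors=1)
```

In this toy case the third point has no neighbor within the radius, so its radius-based average is NaN while its kNN-based average is still defined. That difference is the practical reason to compute both.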

mat_discover.utils.pareto module

Helper functions for finding and plotting a pareto front.

mat_discover.utils.pareto.get_pareto_ind(proxy, target, reverse_x=True)[source]

Get Pareto front indices.

Parameters
  • proxy (1d array) – Chemical uniqueness proxy values (x-axis).

  • target (1d array) – Target property (i.e. performance) values (y-axis).

  • reverse_x (bool, optional) – Whether to flip the x direction (i.e. Pareto front seeks maximization of target and minimization of proxy), by default True

Returns

pareto_ind – Pareto front indices.

Return type

2d array

mat_discover.utils.pareto.is_pareto_efficient_simple(costs)[source]

Find the pareto-efficient points.

Parameters

costs – An (n_points, n_costs) array

Returns

A (n_points, ) boolean array, indicating whether each point is Pareto efficient

Fairly fast for many data points, less fast for many costs, and somewhat readable.

Modified from: https://stackoverflow.com/a/40239615/13697228
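
For readers unfamiliar with the definition, here is a plain-Python sketch of a Pareto-efficiency mask in which every cost is minimized. It is a direct O(n²) reading of the definition, not the vectorized code from the linked answer or from this package; the function name pareto_efficient_mask is hypothetical, and treating exact duplicates as mutually non-dominating is an assumption of this sketch.

```python
def pareto_efficient_mask(costs):
    """Return a list of booleans, True where a point is Pareto efficient.

    costs is a sequence of n_points rows, each holding n_costs values,
    and every cost is to be minimized.
    """
    mask = []
    for i, point in enumerate(costs):
        dominated = False
        for j, other in enumerate(costs):
            if j == i:
                continue
            # `other` dominates `point` if it is no worse in every cost
            # and strictly better in at least one.
            no_worse = all(o <= p for o, p in zip(other, point))
            strictly_better = any(o < p for o, p in zip(other, point))
            if no_worse and strictly_better:
                dominated = True
                break
        mask.append(not dominated)
    return mask


costs = [
    [1, 4],
    [2, 2],
    [3, 3],  # dominated by [2, 2]
    [4, 1],
]
mask = pareto_efficient_mask(costs)
```

A quantity that should be maximized rather than minimized can be negated before it is passed in as a cost.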

mat_discover.utils.pareto.pareto_plot(df, x='neigh_avg_targ', y='target', color='Peak height', x_unit=None, y_unit=None, color_unit=None, hover_data=['formula'], fpath='figures/pareto-front', reverse_x=True, parity_type='max-of-both', pareto_front=True, color_continuous_scale=None, color_discrete_map=None, xrange=None, use_plotly_offline: bool = True)[source]

Generate and save pareto plot for two variables.

Parameters
  • df (DataFrame) – Contains relevant variables for pareto plot.

  • x (str, optional) – Name of df column to use for x-axis, by default “neigh_avg_targ”

  • y (str, optional) – Name of df column to use for y-axis, by default “target”

  • color (str, optional) – Name of df column to use for colors, by default “Peak height”

  • hover_data (list of str, optional) – Name(s) of df columns to display on hover, by default [“formula”]

  • fpath (str, optional) – Filepath to which to save HTML and PNG. Specify as None if no saving is desired, by default “figures/pareto-front”

  • reverse_x (bool, optional) – Whether to reverse the x-axis (i.e. for a front that maximizes y and minimizes x), by default True

  • parity_type (str, optional) – What kind of parity line to plot: “max-of-both”, “max-of-each”, or “none”, by default “max-of-both”

  • use_plotly_offline (bool) – Whether to use offline.plot(fig) instead of fig.show(). Set to False for Google Colab. By default, True.

mat_discover.utils.plotting module

Various plotting functions for cluster properties and UMAP visualization.

mat_discover.utils.plotting.cluster_count_hist(labels, figure_dir='figures')[source]

Plot histogram of cluster counts, colored by cluster IDs.

Parameters

labels (1d array) – Cluster IDs.

Returns

fig – Handle to Matplotlib Figure.

Return type

Figure

mat_discover.utils.plotting.dens_scatter(x, y, pdf_sum, figure_dir='figures')[source]

Plot DensMAP densities at the x and y embedding coordinates.

Parameters
  • x (1d array) – x-coordinates

  • y (1d array) – y-coordinates

  • pdf_sum (1d array) – probabilities evaluated at each of the x and y coordinate pairs.

Returns

fig – Handle to Matplotlib Figure.

Return type

Figure

See also

mat_discover_.mvn_prob_sum

used to obtain x, y, and pdf_sum

mat_discover.utils.plotting.dens_targ_scatter(std_emb, target, x, y, pdf_sum, figure_dir='figures')[source]

Plot overlay of density scatter and target scatter plots.

Parameters
  • std_emb (2d array) – UMAP embedding coordinates.

  • target (1d array) – Target properties corresponding to std_emb.

  • x (1d array) – x-coordinates

  • y (1d array) – y-coordinates

  • pdf_sum (1d array) – probabilities evaluated at each of the x and y coordinate pairs.

Returns

fig – Handle to Matplotlib Figure.

Return type

Figure

See also

dens_scatter

density scatter plot

target_scatter

target scatter plot

mat_discover.utils.plotting.group_cv_parity(ytrue, ypred, labels, figure_dir='figures')[source]

Leave-one-cluster-out cross-validation parity plot colored by labels.

Parameters
  • ytrue (1d array) – True target values.

  • ypred (1d array) – Predicted target values.

  • labels (1d array) – Cluster IDs.

Returns

fig – Handle to Matplotlib Figure.

Return type

Figure

mat_discover.utils.plotting.matplotlibify(fig, size=24, width_inches=3.5, height_inches=3.5, dpi=142)[source]

mat_discover.utils.plotting.target_scatter(std_emb, target, figure_dir='figures', color_unit=None)[source]

Plot UMAP embedding locations colored by target values.

Parameters
  • std_emb (2d array) – UMAP embedding coordinates.

  • target (1d array) – Target properties corresponding to std_emb.

Returns

fig – Handle to Matplotlib Figure.

Return type

Figure

mat_discover.utils.plotting.umap_cluster_scatter(std_emb, labels, figure_dir='figures')[source]

Plot UMAP embeddings colored by cluster IDs.

Parameters
  • std_emb (2d array) – UMAP embedded coordinates.

  • labels (1d array) – Cluster IDs associated with the UMAP coordinates.

Returns

fig – Handle to Matplotlib Figure.

Return type

Figure

Module contents