mat_discover.utils package

Submodules

mat_discover.utils.Timer module

Timer class.

class mat_discover.utils.Timer.NoTimer(name)

Bases: object

Use in place of Timer without actually printing output.

__init__(name)[source]: Take name as argument and do nothing.

class mat_discover.utils.Timer.Timer(name=None)

Bases: object

Simple timer class.

https://stackoverflow.com/a/5849861/13697228

Usage

>>> with Timer("add two numbers"):
...    out = 56 + 74

__init__(name=None)[source]: Assign name for Timer object.

mat_discover.utils.generate_elasticity_data module

Download and partition elasticity data using Materials Project API.

mat_discover.utils.generate_elasticity_data.generate_elasticity_data(download_data=True, cif=False, train_e_above_hull=0.05, val_e_above_hull=0.05, theoretical=False, folder='mat_discover/data/elasticity')

Download (or reload) elasticity data using MPRester.

Parameters

download_data (bool, optional) – Whether to download the data using MPRester; if False, reload previously downloaded data. By default True.
cif (bool, optional) – [description], by default False
train_e_above_hull (float, optional) – Energy-above-hull cutoff (eV/atom) for the training set, by default 0.05.
val_e_above_hull (float, optional) – Energy-above-hull cutoff (eV/atom) for the validation set, by default 0.05.
theoretical (bool, None, or list, optional) – Whether to query theoretical compounds. False selects experimental compounds (API subject to change). Can be False, True, None, or a list of these values; by default False. See https://matsci.org/t/how-to-use-has-icsd-exptl-id-property-in-pymatgen-query-function/2550/4
folder (str, optional) – Which folder to save to, by default join(“mat_discover”, “data”, “elasticity”).

Returns

[description]

Return type

[type]

mat_discover.utils.generate_elasticity_data.structure_from_cif(cif): Create a pymatgen Structure from a crystallographic information file (CIF) string.

mat_discover.utils.nearest_neigh module

Nearest neighbor helper functions for DISCOVER.

mat_discover.utils.nearest_neigh.nearest_neigh_props(X, target, r_strength=None, radius=None, n_neighbors=10, metric='precomputed', **NN_kwargs)

Compute nearest-neighbor properties for the peak proxy using both a radius-based and a k-nearest-neighbor (kNN) neighborhood.

Parameters

X (2d array) – Pairwise distance matrix (within single set).
target (1d array) – Target property values.
r_strength (float or None, optional) – Radius strength used as a scaling value for radius, by default None. If None, then a default value based on mean and standard deviation is used. See _nearest_neigh_props.
radius (float, optional) – The radius within which to consider nearest neighbors, by default None.
n_neighbors (int, optional) – The number of nearest neighbors (kNNs) to consider for computing k_neigh_avg_targ, by default 10.
metric (str or callable) – “The distance metric to use for the tree. The default metric is minkowski, and with p=2 is equivalent to the standard Euclidean metric. See the documentation of DistanceMetric for a list of available metrics. If metric is “precomputed”, X is assumed to be a distance matrix and must be square during fit. X may be a sparse graph, in which case only “nonzero” elements may be considered neighbors.” (source: sklearn.neighbors.NearestNeighbors docs). By default “precomputed”.

Returns

rad_neigh_avg_targ, k_neigh_avg_targ – Radius- and kNN-based average of neighbor targets, respectively.

Return type

tuple of two 1d arrays, each of shape (X.shape[0],)
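The two averages can be illustrated with a numpy-only sketch for a square, precomputed distance matrix. This is not mat_discover's implementation (which, per the metric parameter, builds on sklearn.neighbors.NearestNeighbors): here the radius is required rather than derived from r_strength, each point is excluded from its own neighborhood, and a point with no neighbors within the radius gets NaN. These choices and the function name `neigh_avg_targ_sketch` are assumptions for illustration.

```python
import numpy as np


def neigh_avg_targ_sketch(D, target, radius, n_neighbors=10):
    """Radius- and kNN-based neighbor averages of target (hypothetical helper).

    D is a square pairwise distance matrix. Each point is excluded from its
    own neighborhood, which is an assumption of this sketch.
    """
    D = np.asarray(D, dtype=float)
    target = np.asarray(target, dtype=float)
    n = D.shape[0]
    if D.shape != (n, n) or target.shape != (n,):
        raise ValueError("D must be (n, n) and target must be (n,)")

    D_off = D.copy()
    np.fill_diagonal(D_off, np.inf)  # a point is never its own neighbor

    # Radius-based average: mean target over all points with distance <= radius.
    in_radius = D_off <= radius
    counts = in_radius.sum(axis=1)
    sums = in_radius.astype(float) @ target
    rad_neigh_avg_targ = np.full(n, np.nan)  # stays NaN where no point is in range
    np.divide(sums, counts, out=rad_neigh_avg_targ, where=counts > 0)

    # kNN-based average: mean target over the n_neighbors closest points.
    k = min(n_neighbors, n - 1)
    knn_idx = np.argsort(D_off, axis=1, kind="stable")[:, :k]
    k_neigh_avg_targ = target[knn_idx].mean(axis=1)
    return rad_neigh_avg_targ, k_neigh_avg_targ
```

For four points on a line at 0, 1, 2, and 10 with targets 1, 2, 3, 4, a radius of 1.5 gives radius averages of 2, 2, 2, and NaN (the isolated point has no neighbors in range), while n_neighbors=2 still gives every point a kNN average. This is why the two quantities complement each other.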

mat_discover.utils.pareto module

Helper functions for finding and plotting a Pareto front.

mat_discover.utils.pareto.get_pareto_ind(proxy, target, reverse_x=True)

Get Pareto front indices.

Parameters

proxy (1d array) – Chemical uniqueness proxy values (x-axis).
target (1d array) – Target property (i.e. performance) values (y-axis).
reverse_x (bool, optional) – Whether to flip the x direction (i.e. Pareto front seeks maximization of target and minimization of proxy), by default True

Returns

pareto_ind – Pareto front indices.

Return type

2d array

mat_discover.utils.pareto.is_pareto_efficient_simple(costs)

Find the Pareto-efficient points.

Parameters: costs – An (n_points, n_costs) array
Returns: A (n_points,) boolean array indicating whether each point is Pareto efficient

Fairly fast for many data points, slower for many costs, and reasonably readable.

Modified from: https://stackoverflow.com/a/40239615/13697228
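A sketch of the algorithm from the referenced Stack Overflow answer, in which lower cost is better, plus a hypothetical `pareto_indices_sketch` showing one way to turn the proxy/target description of get_pareto_ind into costs. mat_discover's modified version may differ. The sign convention (negate any quantity to be maximized) is an assumption here, and this sketch returns a 1d index array, whereas get_pareto_ind documents a 2d array.

```python
import numpy as np


def is_pareto_efficient_simple(costs):
    """Boolean mask of Pareto-efficient rows, treating lower cost as better.

    Follows the linked Stack Overflow answer; mat_discover's version may differ.
    """
    costs = np.asarray(costs, dtype=float)
    is_efficient = np.ones(costs.shape[0], dtype=bool)
    for i, c in enumerate(costs):
        if is_efficient[i]:
            # Among remaining candidates, keep those strictly better than
            # point i in at least one cost; the rest are dominated by point i.
            is_efficient[is_efficient] = np.any(costs[is_efficient] < c, axis=1)
            # Point i fails the strict comparison against itself, so restore it.
            is_efficient[i] = True
    return is_efficient


def pareto_indices_sketch(proxy, target, reverse_x=True):
    """Hypothetical helper: indices of the proxy/target Pareto front.

    reverse_x=True minimizes proxy and maximizes target; reverse_x=False
    maximizes both. Negating a quantity turns maximization into minimization.
    """
    proxy = np.asarray(proxy, dtype=float)
    target = np.asarray(target, dtype=float)
    x_cost = proxy if reverse_x else -proxy
    costs = np.column_stack([x_cost, -target])
    return np.flatnonzero(is_pareto_efficient_simple(costs))
```

With proxy = [1, 2, 3] and target = [1, 3, 2], the default (minimize proxy, maximize target) keeps indices 0 and 1, because point 2 has both a larger proxy and a smaller target than point 1. With reverse_x=False, the front is indices 1 and 2.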

mat_discover.utils.pareto.pareto_plot(df, x='neigh_avg_targ', y='target', color='Peak height', x_unit=None, y_unit=None, color_unit=None, hover_data=['formula'], fpath='figures/pareto-front', reverse_x=True, parity_type='max-of-both', pareto_front=True, color_continuous_scale=None, color_discrete_map=None, xrange=None, use_plotly_offline: bool = True)

Generate and save a Pareto plot for two variables.

Parameters

df (DataFrame) – Contains the relevant variables for the Pareto plot.
x (str, optional) – Name of the df column to use for the x-axis, by default “neigh_avg_targ”
y (str, optional) – Name of the df column to use for the y-axis, by default “target”
color (str, optional) – Name of the df column to use for colors, by default “Peak height”
hover_data (list of str, optional) – Name(s) of df columns to display on hover, by default [“formula”]; could also be, e.g., [“structure”]
fpath (str, optional) – Filepath to which to save HTML and PNG, by default “figures/pareto-front”. Specify None if no saving is desired.
reverse_x (bool, optional) – Whether to reverse the x-axis (i.e., seek a front that maximizes y and minimizes x), by default True
parity_type (str, optional) – What kind of parity line to plot: “max-of-both”, “max-of-each”, or “none”, by default “max-of-both”
use_plotly_offline (bool, optional) – Whether to use offline.plot(fig) instead of fig.show(). Set to False for Google Colab. By default True.

mat_discover.utils.plotting module

Various plotting functions for cluster properties and UMAP visualization.

mat_discover.utils.plotting.cluster_count_hist(labels, figure_dir='figures')

Plot histogram of cluster counts, colored by cluster IDs.

Parameters: labels (1d array) – Cluster IDs.
Returns: fig – Handle to Matplotlib Figure.
Return type: Figure
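As a small illustration of the data behind such a plot, one reading of "cluster counts" is the number of points assigned to each cluster ID. It can be computed with `np.unique`. The helper name `cluster_counts` is hypothetical, and this sketch does not reproduce cluster_count_hist's plotting or coloring.

```python
import numpy as np


def cluster_counts(labels):
    """Hypothetical helper: number of points assigned to each cluster ID.

    Returns the unique IDs in ascending order and the matching counts.
    """
    ids, counts = np.unique(np.asarray(labels), return_counts=True)
    return ids, counts
```

The resulting `ids` and `counts` arrays could then be passed to a bar plot, with one bar per cluster ID.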

mat_discover.utils.plotting.dens_scatter(x, y, pdf_sum, figure_dir='figures')

Plot DensMAP densities at the x and y embedding coordinates.

Parameters

x (1d array) – x-coordinates
y (1d array) – y-coordinates
pdf_sum (1d array) – Probability density values evaluated at each (x, y) coordinate pair.

Returns

fig – Handle to Matplotlib Figure.

Return type

Figure
