mat_discover.utils package
Submodules
mat_discover.utils.Timer module
Timer class.
mat_discover.utils.generate_elasticity_data module
Download and partition elasticity data using Materials Project API.
- mat_discover.utils.generate_elasticity_data.generate_elasticity_data(download_data=True, cif=False, train_e_above_hull=0.05, val_e_above_hull=0.05, theoretical=False, folder='mat_discover/data/elasticity')[source]
Download (or reload) elasticity data using MPRester.
- Parameters
download_data (bool, optional) – [description], by default True
cif (bool, optional) – [description], by default False
train_e_above_hull (float, optional) – [description], by default 0.05
val_e_above_hull (float, optional) – [description], by default 0.05
theoretical (bool, optional) – Whether to select theoretical compounds. False selects experimental compounds. Can be False, True, None, or a list of these values, by default False. The underlying API is subject to change. See https://matsci.org/t/how-to-use-has-icsd-exptl-id-property-in-pymatgen-query-function/2550/4
folder (str, optional) – Which folder to save to, by default join(“mat_discover”, “data”, “elasticity”).
- Returns
[description]
- Return type
[type]
mat_discover.utils.nearest_neigh module
Nearest neighbor helper functions for DISCOVER.
- mat_discover.utils.nearest_neigh.nearest_neigh_props(X, target, r_strength=None, radius=None, n_neighbors=10, metric='precomputed', **NN_kwargs)[source]
Compute nearest neighbor properties for peak proxy using radius and kNN.
- Parameters
X (2d array) – Pairwise distance matrix (within single set).
target (1d array) – Target property values.
r_strength (float or None, optional) – Radius strength used as a scaling value for radius, by default None. If None, then a default value based on mean and standard deviation is used. See _nearest_neigh_props.
radius (float, optional) – The radius within which to consider nearest neighbors, by default None
n_neighbors (int, optional) – The number of nearest neighbors (kNNs) to consider for computing k_neigh_avg_targ, by default 10.
metric (str or callable) – “The distance metric to use for the tree. The default metric is minkowski, and with p=2 is equivalent to the standard Euclidean metric. See the documentation of DistanceMetric for a list of available metrics. If metric is “precomputed”, X is assumed to be a distance matrix and must be square during fit. X may be a sparse graph, in which case only “nonzero” elements may be considered neighbors.” (source: sklearn.neighbors.NearestNeighbors docs). By default “precomputed”.
- Returns
rad_neigh_avg_targ, k_neigh_avg_targ – Radius- and kNN-based average of neighbor targets, respectively.
- Return type
1d array (X.shape[0],)
See also
sklearn.neighbors.NearestNeighbors
Unsupervised learner for implementing neighbor searches.
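As a rough illustration of what the two returned quantities mean, the dependency-free sketch below averages neighbor targets from a precomputed distance matrix, once over the k nearest neighbors and once over all neighbors within a fixed radius. It is not the library's implementation: the helper name neighbor_avg_targets is hypothetical, excluding each point from its own neighbor list is an assumption, and the default radius derived from r_strength is not modeled.

```python
def neighbor_avg_targets(dist, target, n_neighbors=2, radius=1.0):
    """Hypothetical helper: average neighbor targets from a square distance matrix.

    Assumption: a point is never counted as its own neighbor.
    Returns (rad_avg, k_avg), mirroring the documented return order.
    """
    n = len(dist)
    rad_avg, k_avg = [], []
    for i in range(n):
        # All other points, sorted by their distance to point i
        others = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])

        # kNN-based average over the closest n_neighbors points
        knn = others[:n_neighbors]
        k_avg.append(sum(target[j] for j in knn) / len(knn))

        # Radius-based average; NaN when no neighbor lies within the radius
        within = [j for j in others if dist[i][j] <= radius]
        rad_avg.append(
            sum(target[j] for j in within) / len(within) if within else float("nan")
        )
    return rad_avg, k_avg


# Four points on a line; the last one is far from the rest
pos = [0.0, 1.0, 2.0, 10.0]
dist = [[abs(a - b) for b in pos] for a in pos]
target = [1.0, 2.0, 3.0, 10.0]

rad_avg, k_avg = neighbor_avg_targets(dist, target, n_neighbors=2, radius=1.5)
print(k_avg)    # kNN averages for each point
print(rad_avg)  # radius averages; the isolated point has no neighbors in range
```

The isolated point still receives a kNN average because kNN always finds k neighbors, while its radius average is undefined. This is one reason for computing both quantities.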
mat_discover.utils.pareto module
Helper functions for finding and plotting a Pareto front.
- mat_discover.utils.pareto.get_pareto_ind(proxy, target, reverse_x=True)[source]
Get Pareto front indices.
- Parameters
proxy (1d array) – Chemical uniqueness proxy values (x-axis).
target (1d array) – Target property (i.e. performance) values (y-axis).
reverse_x (bool, optional) – Whether to flip the x direction (i.e. Pareto front seeks maximization of target and minimization of proxy), by default True
- Returns
pareto_ind – Pareto front indices.
- Return type
2d array
- mat_discover.utils.pareto.is_pareto_efficient_simple(costs)[source]
Find the Pareto-efficient points.
- Parameters
costs – An (n_points, n_costs) array
- Returns
An (n_points,) boolean array indicating whether each point is Pareto efficient.
Fairly fast for many datapoints, less fast for many costs, somewhat readable.
Modified from: https://stackoverflow.com/a/40239615/13697228
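To make the notion of Pareto efficiency concrete, here is a minimal pure-Python sketch of the textbook definition, assuming that lower cost is better. The function name pareto_efficient_mask is hypothetical and this is not the library's code: it uses a simple O(n²) comparison, and its handling of ties or duplicate points may differ from is_pareto_efficient_simple.

```python
def pareto_efficient_mask(costs):
    """Return one boolean per point: True if no other point dominates it.

    Point q dominates point p when q is no worse in every cost and strictly
    lower in at least one. Assumption: lower cost is better.
    """
    mask = []
    for i, p in enumerate(costs):
        dominated = any(
            all(qc <= pc for qc, pc in zip(q, p))
            and any(qc < pc for qc, pc in zip(q, p))
            for j, q in enumerate(costs)
            if j != i
        )
        mask.append(not dominated)
    return mask


# Four points with two costs each, both to be minimized
costs = [(1.0, 4.0), (2.0, 2.0), (3.0, 3.0), (4.0, 1.0)]
mask = pareto_efficient_mask(costs)
print(mask)  # (3.0, 3.0) is dominated by (2.0, 2.0)
```

To maximize a quantity rather than minimize it, negate that column before computing the mask.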
- mat_discover.utils.pareto.pareto_plot(df, x='neigh_avg_targ', y='target', color='Peak height', x_unit=None, y_unit=None, color_unit=None, hover_data=['formula'], fpath='figures/pareto-front', reverse_x=True, parity_type='max-of-both', pareto_front=True, color_continuous_scale=None, color_discrete_map=None, xrange=None, use_plotly_offline: bool = True)[source]
Generate and save a Pareto plot for two variables.
- Parameters
df (DataFrame) – Contains relevant variables for the Pareto plot.
x (str, optional) – Name of df column to use for x-axis, by default “neigh_avg_targ”
y (str, optional) – Name of df column to use for y-axis, by default “target”
color (str, optional) – Name of df column to use for colors, by default “Peak height”
hover_data (list of str, optional) – Name(s) of df columns to display on hover, by default [“formula”]. Another possible choice is [“structure”].
fpath (str, optional) – Filepath to which to save HTML and PNG, by default “figures/pareto-front”. Specify as None if no saving is desired.
reverse_x (bool, optional) – Whether to reverse the x-axis (i.e. for a front that maximizes y and minimizes x), by default True
parity_type (str, optional) – What kind of parity line to plot: “max-of-both”, “max-of-each”, or “none”, by default “max-of-both”
use_plotly_offline (bool) – Whether to use offline.plot(fig) instead of fig.show(). Set to False for Google Colab. By default, True.
mat_discover.utils.plotting module
Various plotting functions for cluster properties and UMAP visualization.
- mat_discover.utils.plotting.cluster_count_hist(labels, figure_dir='figures')[source]
Plot histogram of cluster counts, colored by cluster IDs.
- Parameters
labels (1d array) – Cluster IDs.
- Returns
fig – Handle to Matplotlib Figure.
- Return type
Figure
- mat_discover.utils.plotting.dens_scatter(x, y, pdf_sum, figure_dir='figures')[source]
Plot DensMAP densities at the x and y embedding coordinates.
- Parameters
x (1d array) – x-coordinates
y (1d array) – y-coordinates
pdf_sum (1d array) – probabilities evaluated at each of the x and y coordinate pairs.
- Returns
fig – Handle to Matplotlib Figure.
- Return type
Figure
See also
mat_discover_.mvn_prob_sum
used to obtain x, y, and pdf_sum
- mat_discover.utils.plotting.dens_targ_scatter(std_emb, target, x, y, pdf_sum, figure_dir='figures')[source]
Plot overlay of density scatter and target scatter plots.
- Parameters
std_emb (2d array) – UMAP embedding coordinates.
target (1d array) – Target properties corresponding to std_emb.
x (1d array) – x-coordinates
y (1d array) – y-coordinates
pdf_sum (1d array) – probabilities evaluated at each of the x and y coordinate pairs.
- Returns
fig – Handle to Matplotlib Figure.
- Return type
Figure
See also
dens_scatter
density scatter plot
target_scatter
target scatter plot
- mat_discover.utils.plotting.group_cv_parity(ytrue, ypred, labels, figure_dir='figures')[source]
Leave-one-cluster-out cross-validation parity plot colored by labels.
- Parameters
ytrue (1d array) – True target values.
ypred (1d array) – Predicted target values.
labels (1d array) – Cluster IDs.
- Returns
fig – Handle to Matplotlib Figure.
- Return type
Figure
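For readers unfamiliar with leave-one-cluster-out cross-validation, the sketch below shows how such splits can be formed from cluster IDs: each cluster is held out once as the test set while all remaining points form the training set. The generator name leave_one_cluster_out is hypothetical and is not part of this module. scikit-learn's LeaveOneGroupOut provides equivalent splitting.

```python
def leave_one_cluster_out(labels):
    """Yield (held_out_label, train_idx, test_idx) for each distinct cluster ID."""
    for held_out in sorted(set(labels)):
        # Points in the held-out cluster form the test set
        test_idx = [i for i, lab in enumerate(labels) if lab == held_out]
        # Every other point is used for training
        train_idx = [i for i, lab in enumerate(labels) if lab != held_out]
        yield held_out, train_idx, test_idx


labels = [0, 0, 1, 2, 1]
splits = list(leave_one_cluster_out(labels))
for held_out, train_idx, test_idx in splits:
    print(held_out, train_idx, test_idx)
```

Collecting the predictions for each held-out cluster gives one ypred value per point, which can then be plotted against ytrue and colored by labels.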
- mat_discover.utils.plotting.matplotlibify(fig, size=24, width_inches=3.5, height_inches=3.5, dpi=142)[source]
- mat_discover.utils.plotting.target_scatter(std_emb, target, figure_dir='figures', color_unit=None)[source]
Plot UMAP embedding locations colored by target values.
- Parameters
std_emb (2d array) – UMAP embedding coordinates.
target (1d array) – Target properties corresponding to std_emb.
- Returns
fig – Handle to Matplotlib Figure.
- Return type
Figure
- mat_discover.utils.plotting.umap_cluster_scatter(std_emb, labels, figure_dir='figures')[source]
Plot UMAP embeddings colored by cluster IDs.
- Parameters
std_emb (2d array) – UMAP embedded coordinates.
labels (1d array) – Cluster IDs associated with the UMAP coordinates.
- Returns
fig – Handle to Matplotlib Figure.
- Return type
Figure