chembee.plotting package

Submodules

chembee.plotting.applicability module

chembee.plotting.applicability.plot_applicability_domain(res: dict, interval: tuple, file_name)

The plot_applicability_domain function plots the number of times each applicability value was observed in the results. This is useful for determining whether there are obvious outliers or other issues with the data.

Parameters

res:dict – Used to pass the results of the classifier.
file_name – Used to specify the name of the file that will be saved.

Returns

A plot of the counts of each applicability value.

Doc-author

Julian M. Kleber
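The counting step described above can be sketched with collections.Counter. This is a minimal sketch, not chembee's implementation: it assumes, hypothetically, that res maps each sample to one discrete applicability value, and it ignores interval, whose role is not documented here. count_applicability and plot_counts are illustrative names.

```python
from collections import Counter

import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt


def count_applicability(res: dict) -> Counter:
    """Count how often each applicability value occurs (hypothetical helper)."""
    return Counter(res.values())


def plot_counts(counts: Counter, file_name: str) -> None:
    """Draw one bar per applicability value and save the figure."""
    values = sorted(counts)
    fig, ax = plt.subplots()
    ax.bar([str(v) for v in values], [counts[v] for v in values])
    ax.set_xlabel("Applicability value")
    ax.set_ylabel("Count")
    fig.savefig(file_name)
    plt.close(fig)


# Hypothetical results: sample ID -> applicability value
res = {"mol_1": 0, "mol_2": 1, "mol_3": 1, "mol_4": 2, "mol_5": 1}
counts = count_applicability(res)
plot_counts(counts, "applicability_counts.png")
```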

chembee.plotting.benchmarking module

chembee.plotting.calibration module

chembee.plotting.calibration.plot_calibration(fig: figure, clf_list: list, calibration_displays: dict, ax_calibration_curve, grid_spec, grid: tuple, colors, file_name: str = 'calibration', prefix: str = 'calibration/')

chembee.plotting.compounds module

chembee.plotting.compounds.plot_compound_list(mols, file_nam)

The plot_compound_list function takes a list of RDKit molecules and plots them in a grid. The function takes two arguments:

mols: A list of RDKit molecules to plot.
file_nam: The name of the image file to save the plot as, including the extension (.png, .svg, etc.).

Parameters

mols – Used to specify the list of molecules to be drawn.
file_nam – Used to name the image file.

Returns

A plot of the molecules in a list.

Doc-author

Julian M. Kleber
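For reference, RDKit's Draw.MolsToGridImage draws a list of molecules as a grid image, which is one plausible way to produce such a figure; whether plot_compound_list uses it internally is an assumption. The SMILES strings below are illustrative input.

```python
from rdkit import Chem
from rdkit.Chem import Draw

# Illustrative molecules built from SMILES strings
smiles = ["CCO", "c1ccccc1", "CC(=O)O", "CCN"]
mols = [Chem.MolFromSmiles(s) for s in smiles]

# Four molecules per row, 200x200 px each, with the SMILES as legends
img = Draw.MolsToGridImage(mols, molsPerRow=4, subImgSize=(200, 200), legends=smiles)
img.save("compounds.png")  # outside Jupyter the result is a PIL image
```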

chembee.plotting.distribution module

chembee.plotting.distribution.plot_data_set_distribution_by_col(frame, column, x_label: str, file_name: str)

The plot_data_set_distribution_by_col function plots the distribution of a column in a dataframe. It takes as input:

- frame: The dataframe to be plotted.
- column: The name of the column to be plotted.
- x_label: A label for the x-axis.
- file_name: A string containing the path and file name under which the plot is saved.

Parameters

frame – Used to specify the data frame that contains the column to be plotted.
column – Used to specify the column of the dataframe that will be used to create the plot.
x_label:str – Used to set the x-axis label.
file_name:str – Used to specify the name of the file to save.

Returns

A plot of the distribution of a column in a dataframe.

Doc-author

Julian M. Kleber
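A minimal sketch of a column distribution plot with pandas and matplotlib, assuming a histogram is the desired representation. plot_column_distribution is a hypothetical helper, and the molecular weights are illustrative values.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import pandas as pd


def plot_column_distribution(frame: pd.DataFrame, column: str, x_label: str, file_name: str):
    """Histogram of one DataFrame column (illustrative sketch)."""
    fig, ax = plt.subplots()
    # ax.hist returns the bin counts, the bin edges and the drawn patches
    counts, edges, _ = ax.hist(frame[column].dropna(), bins=10)
    ax.set_xlabel(x_label)
    ax.set_ylabel("Count")
    fig.savefig(file_name)
    plt.close(fig)
    return counts, edges


frame = pd.DataFrame({"mol_weight": [180.2, 46.1, 78.1, 60.1, 151.2, 194.2]})
counts, edges = plot_column_distribution(
    frame, "mol_weight", "Molecular weight (g/mol)", "mw_distribution.png"
)
```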

chembee.plotting.evaluation module

chembee.plotting.evaluation.init_collection_plot(metrics_json, metric_type='') → tuple

The init_collection_plot function initializes the plot for a collection of algorithms. It takes the metrics_json dictionary as input and returns the values that will be plotted.

Parameters

metrics_json – Used to pass the metrics of each algorithm.
metric_type="" – Used to specify the type of metric that is plotted.

Returns

The metrics_storage variable.

Doc-author

Trelent

chembee.plotting.evaluation.make_grouped_bar_chart(metric_values: list, metric_names: list, x_labels: list, std_vals: str, file_name: str, width=0.1, *args, **kwargs)

chembee.plotting.evaluation.parse_metrics_output(metrics_collection: dict) → dict

The parse_metrics_output function takes a dictionary of metrics and returns a dictionary of dictionaries. The outermost key is the algorithm name, and the value is another dictionary with keys ‘precision’, ‘recall’, and ‘fscore’. The values are lists containing precision, recall, and fscore for each class in order.

Parameters

metrics_collection:dict – Used to pass the metrics of each algorithm.

Returns

A dictionary of dictionaries.

Doc-author

Trelent

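The nested output structure described for parse_metrics_output can be sketched as follows. The input format is an assumption: each value of metrics_collection is taken to be a (precision, recall, fscore, support) tuple of per-class values, the shape returned by scikit-learn's precision_recall_fscore_support(..., average=None). parse_metrics_sketch is a hypothetical name, not chembee's function.

```python
def parse_metrics_sketch(metrics_collection: dict) -> dict:
    """Reshape per-algorithm metric tuples into nested dicts (hypothetical input format)."""
    parsed = {}
    for alg_name, (precision, recall, fscore, _support) in metrics_collection.items():
        # one inner dict per algorithm, one list entry per class
        parsed[alg_name] = {
            "precision": list(precision),
            "recall": list(recall),
            "fscore": list(fscore),
        }
    return parsed


collection = {
    "SVC": ([0.9, 0.8], [0.85, 0.75], [0.87, 0.77], [40, 60]),
    "RandomForest": ([0.92, 0.81], [0.88, 0.79], [0.9, 0.8], [40, 60]),
}
parsed = parse_metrics_sketch(collection)
print(parsed["SVC"]["recall"])  # [0.85, 0.75]
```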
chembee.plotting.evaluation.plot_bar_chart(algs: list, metrics: list, file_name: str, prefix: str, y_label: str, yerr=None, *args) → None

The plot_bar_chart function creates a bar chart of the given data.

Parameters

algs:list – Used to specify the algorithms to be compared.
metrics:list – Used to specify the metric values that are plotted.
file_name:str – Used to specify the name of the file the plot is saved to.
prefix:str – Used to specify the prefix of the file name.
y_label:str – Used to specify the label of the y-axis.
yerr=None – Used to specify the error bars; if None, no error bars are drawn.

Returns

None.

Doc-author

Trelent
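A minimal matplotlib sketch of a bar chart with optional error bars; passing yerr=None to Axes.bar simply omits them. Joining prefix and file_name by plain string concatenation is an assumption about how the output name is built, and bar_chart_sketch is a hypothetical helper.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt


def bar_chart_sketch(algs, metrics, file_name, prefix, y_label, yerr=None):
    """One bar per algorithm; error bars appear only when yerr is given."""
    fig, ax = plt.subplots()
    bars = ax.bar(algs, metrics, yerr=yerr, capsize=4)
    ax.set_ylabel(y_label)
    fig.savefig(prefix + file_name + ".png")  # assumed naming scheme
    plt.close(fig)
    return bars


bars = bar_chart_sketch(
    ["SVC", "KNN", "RandomForest"], [0.81, 0.77, 0.86],
    "accuracy", "benchmark_", "Accuracy", yerr=[0.02, 0.03, 0.01],
)
```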

chembee.plotting.evaluation.plot_bar_chart_collection(metrics_json: dict, file_name: str, prefix: str, *args) → None

The plot_bar_chart_collection function plots a collection of bar charts, one for each metric in the metrics_json. The metrics_json is expected to be a dictionary with keys corresponding to algorithm names and values being dictionaries of scalar values (one value per metric). The file name is expected to be the same for all plots.

Parameters

metrics_json:dict – Used to pass the values of all metrics for each algorithm.
file_name:str – Used to specify the name of the file that is generated by this function.
prefix:str – Used to distinguish between different types of plots.

Returns

The metrics_storage array.

Doc-author

Julian M. Kleber

chembee.plotting.evaluation.plot_bar_chart_collection_strat(scalar_metrics: dict, file_name: str, prefix: str, *args) → None

The plot_bar_chart_collection_strat function plots a collection of bar charts, one for each metric in scalar_metrics. scalar_metrics is expected to be a dictionary with keys corresponding to algorithm names and values being dictionaries of scalar values (one value per metric). The file name is expected to be the same for all plots.

Parameters

scalar_metrics:dict – Used to pass the values of all metrics for each algorithm.
file_name:str – Used to specify the name of the file that is generated by this function.
prefix:str – Used to distinguish between different types of plots.

Returns

The metrics_storage array.

Doc-author

Julian M. Kleber

chembee.plotting.evaluation.plot_collection(metrics_json: dict, file_name: str, prefix: str) → None

The plot_collection function takes a metrics_json dictionary and plots the scalar metrics in a bar chart and the array metrics in an ROC curve. The plots are saved to files whose names are built from file_name and prefix.

Parameters

metrics_json:dict – Used to pass the metrics data.
file_name:str – Used to specify the name of the file that will be created.
prefix:str – Used to specify the prefix for the output file names.

Returns

A dictionary containing the file names of the plots.

Doc-author

Trelent
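The split into scalar metrics (for the bar chart) and array metrics (for the ROC curve) can be sketched as a simple type dispatch. The nested layout {algorithm: {metric: value}} and the helper split_metrics are assumptions for illustration, not chembee's documented data format.

```python
from numbers import Real


def split_metrics(metrics_json: dict) -> tuple:
    """Partition each algorithm's metrics into scalar and array-like values."""
    scalar, arrays = {}, {}
    for alg, metrics in metrics_json.items():
        # plain numbers (including NumPy scalars) would go into a bar chart
        scalar[alg] = {k: v for k, v in metrics.items() if isinstance(v, Real)}
        # sequences such as fpr/tpr would go into an ROC plot
        arrays[alg] = {k: v for k, v in metrics.items() if not isinstance(v, Real)}
    return scalar, arrays


metrics_json = {
    "SVC": {"accuracy": 0.81, "f1": 0.79, "fpr": [0.0, 0.2, 1.0], "tpr": [0.0, 0.7, 1.0]},
}
scalar, arrays = split_metrics(metrics_json)
```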

chembee.plotting.evaluation.plot_collection_stratified(metrics_json: dict, file_name: str, prefix: str) → None

The plot_collection_stratified function plots the metrics of a collection stratified by the number of samples in each class. The function takes as input a dictionary containing the metrics for each class and writes the plots to file.

Parameters

metrics_json:dict – Used to pass the metrics dictionary.
file_name:str – Used to specify the name of the file that will be created.
prefix:str – Used to define the prefix of the output file.

Returns

Nothing.

Doc-author

Trelent

chembee.plotting.evaluation.plot_combined_bar_chart(scalar_metrics: dict, file_name: str)

chembee.plotting.evaluation.plot_grouped_bar_chart(labels: list, data_names: list, data: ndarray, file_name: str, prefix: str, y_label: str = 'Score') → None

The plot_grouped_bar_chart function creates a grouped bar chart. The function takes the following parameters:

- labels: list of strings; each string is the label of one group of bars.
- data_names: list of strings; each string is the name of one bar within a group.
- data: numpy array of shape (len(labels), len(data_names)) containing the values to be plotted as bars.
- file_name: str, name under which the plot is saved.
- prefix: str, prefix added to all files created by this function.

Parameters

labels:list – Used to set the label of each group of bars.
data_names:list – Used to label the bars within each group.
data:np.ndarray – Used to pass the data to be plotted.
file_name:str – Used to specify the name of the file to be saved.
prefix:str – Used to specify the prefix added to the file names.
y_label:str="Score" – Used to set the y-axis label of the plot.

Returns

A plot of the data.

Doc-author

Trelent
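A sketch of a grouped bar chart for data of shape (len(labels), len(data_names)): each row becomes a group on the x-axis and each column a bar within that group, shifted by a fixed offset. grouped_bar_sketch and the scores are illustrative, and prefix handling is omitted.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np


def grouped_bar_sketch(labels, data_names, data, file_name, y_label="Score"):
    """One group per label, one bar per data name; data has shape (groups, bars)."""
    data = np.asarray(data)
    n_groups, n_bars = data.shape
    x = np.arange(n_groups)  # centre of each group on the x-axis
    width = 0.8 / n_bars     # the bars of one group share 80 percent of the slot
    fig, ax = plt.subplots()
    containers = []
    for i, name in enumerate(data_names):
        # shift each series so the bars of a group sit side by side around x
        offset = (i - (n_bars - 1) / 2) * width
        containers.append(ax.bar(x + offset, data[:, i], width, label=name))
    ax.set_xticks(x)
    ax.set_xticklabels(labels)
    ax.set_ylabel(y_label)
    ax.legend()
    fig.savefig(file_name)
    plt.close(fig)
    return containers


labels = ["Accuracy", "Precision", "Recall"]
data_names = ["SVC", "RandomForest"]
data = [[0.81, 0.86], [0.79, 0.84], [0.77, 0.88]]
containers = grouped_bar_sketch(labels, data_names, data, "grouped_bar_chart.png")
```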

chembee.plotting.evaluation.plot_heat_map()

chembee.plotting.evaluation.plot_heat_map_collection(metrics_json: dict, file_name: str, prefix: str) → None

The plot_heat_map_collection function takes a dictionary of metrics and plots them in a heat map. The function takes three arguments:

1) metrics_json: A dictionary containing the metric values for each algorithm-metric pair. The keys are tuples of the form (algorithm name, metric name). The values are lists of floats, one per run, giving the value the algorithm achieved on that metric.
2) file_name: A string giving the name under which the resulting plot is saved.
3) prefix: A string that is prepended to all algorithm names when plotting.

Parameters

metrics_json:dict – Used to pass the metrics data.
file_name:str – Used to specify the name of the file where the plot will be saved.
prefix:str – Used to specify the prefix of the file name.

Returns

A list of the heat map images.

Doc-author

Julian M. Kleber
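A sketch of the heat map described above, assuming the (algorithm name, metric name) tuple keys and per-run lists of floats. Each cell shows the mean over runs, which is an assumption about how runs are aggregated; heat_map_sketch is a hypothetical helper.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np


def heat_map_sketch(metrics_json: dict, file_name: str):
    """Heat map of mean metric values keyed by (algorithm, metric) tuples."""
    algorithms = sorted({alg for alg, _ in metrics_json})
    metrics = sorted({metric for _, metric in metrics_json})
    matrix = np.full((len(algorithms), len(metrics)), np.nan)
    for (alg, metric), runs in metrics_json.items():
        # average the per-run values into one cell
        matrix[algorithms.index(alg), metrics.index(metric)] = np.mean(runs)

    fig, ax = plt.subplots()
    image = ax.imshow(matrix, cmap="viridis")
    ax.set_xticks(range(len(metrics)))
    ax.set_xticklabels(metrics)
    ax.set_yticks(range(len(algorithms)))
    ax.set_yticklabels(algorithms)
    fig.colorbar(image, ax=ax)
    fig.savefig(file_name)
    plt.close(fig)
    return matrix


metrics_json = {
    ("SVC", "accuracy"): [0.80, 0.82],
    ("SVC", "f1"): [0.78, 0.80],
    ("RF", "accuracy"): [0.85, 0.87],
    ("RF", "f1"): [0.83, 0.85],
}
matrix = heat_map_sketch(metrics_json, "heat_map.png")  # rows: RF, SVC
```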

chembee.plotting.evaluation.plot_roc_chart(fpr: list, tpr: list, roc_auc: list, prefix: str, file_name: str = 'roc_auc_curve') → None

The plot_roc_chart function takes a list of false positive rates, a list of true positive rates, and the area under the curve. It then plots these values and saves the figure to file_name.png.

Parameters

fpr:list – Used to plot the false positive rate.
tpr:list – Used to plot the true positive rate.
roc_auc:list – Used to plot the area under the curve.
prefix:str – Used to add a prefix to the file name.
file_name:str="roc_auc_curve" – Used to specify the name of the file to be saved.

Returns

The area under the curve (auc).

Doc-author

Trelent
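For illustration, the inputs that plot_roc_chart expects can be computed and plotted as follows. scikit-learn's sklearn.metrics.roc_curve and auc are the usual library route; this sketch instead computes the curve by hand with NumPy (one threshold per distinct score) and integrates it with the trapezoidal rule, so it has no scikit-learn dependency. The helper names and toy labels are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np


def roc_points(y_true, scores):
    """False/true positive rates at every distinct score threshold (labels 0/1)."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="mergesort")  # highest score first
    y_sorted, s_sorted = y_true[order], scores[order]
    # last index of each run of equal scores = one threshold per distinct score
    cut = np.r_[np.flatnonzero(np.diff(s_sorted)), y_sorted.size - 1]
    tps = np.cumsum(y_sorted)[cut]
    fps = (cut + 1) - tps
    tpr = np.r_[0.0, tps / tps[-1]]
    fpr = np.r_[0.0, fps / fps[-1]]
    return fpr, tpr


def trapezoid_auc(x, y):
    """Area under a piecewise-linear curve via the trapezoidal rule."""
    x, y = np.asarray(x), np.asarray(y)
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2))


def roc_chart_sketch(fpr, tpr, roc_auc, file_name):
    fig, ax = plt.subplots()
    ax.plot(fpr, tpr, label=f"ROC curve (AUC = {roc_auc:.2f})")
    ax.plot([0, 1], [0, 1], linestyle="--", color="grey", label="Chance")
    ax.set_xlabel("False positive rate")
    ax.set_ylabel("True positive rate")
    ax.legend(loc="lower right")
    fig.savefig(file_name)
    plt.close(fig)


fpr, tpr = roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
roc_auc = trapezoid_auc(fpr, tpr)  # 0.75 for this toy example
roc_chart_sketch(fpr, tpr, roc_auc, "roc_auc_curve.png")
```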

chembee.plotting.evaluation.plot_roc_chart_collection(metrics_json: dict, file_name: str, prefix: str) → None

The plot_roc_chart_collection function plots the ROC curves for a collection of algorithms. The function takes as input a dictionary containing the metrics for each algorithm, and an output file name. It then plots all of the ROC curves on one plot, with each curve labeled by its corresponding algorithm.

Parameters

metrics_json:dict – Used to pass the metrics of each algorithm.
file_name:str – Used to specify the name of the file to be saved.
prefix:str – Used to add a prefix to the name of the file.

Returns

The metrics_storage variable.

Doc-author

Julian M. Kleber

chembee.plotting.evaluation.plot_roc_chart_collection_strat(metrics_json: dict, file_name: str, prefix: str) → None

The plot_roc_chart_collection_strat function plots the ROC curves for a collection of algorithms. It takes as input:

- metrics_json: a dictionary containing the performance metrics for each algorithm in the collection. The keys are strings naming each algorithm, and each value is itself a dictionary containing all performance metrics of that algorithm.
- file_name: the name of the output file, without any extension. By default, the plot is saved with a .png extension.

Parameters

metrics_json:dict – Used to pass the performance metrics of each algorithm.
file_name:str – Used to specify the name of the output file.
prefix:str – Used to distinguish between different plots.

Returns

A plot of the roc curve for each algorithm.

Doc-author

Trelent

chembee.plotting.evaluation.plot_roc_chart_strat(avg_fpr: list, avg_tpr: list, avg_roc_auc: list, std_tpr: list, std_fpr: list, std_roc_auc: float, prefix: str, file_name: str = 'roc_auc_curve') → None

The plot_roc_chart_strat function plots the averaged ROC curve of a given model evaluated on several test sets. It takes as input:

- avg_fpr: an array of false positive rates averaged over multiple iterations of the model on different test sets (the x-axis of the plot).
- avg_tpr: an array of true positive rates averaged over the same iterations; it must have the same length as avg_fpr.
- std_tpr: an array containing the standard deviation of each value in avg_tpr.
- std_fpr: an array containing the standard deviation of each value in avg_fpr.
- prefix: a string prepended to any file name generated by this function.
- file_name: the part of the output file name that follows prefix; the figure is saved as a .png file.

Parameters

avg_fpr:list – Used to plot the average false positive rate (x-axis).
avg_tpr:list – Used to plot the mean curve.
avg_roc_auc:list – Used to store the average roc_auc score for each fold.
std_tpr:list – Used to calculate the upper and lower bounds of the ROC curve.
std_fpr:list – Used to plot the standard deviation of the ROC curve.
std_roc_auc:float – Used to calculate the standard deviation of the ROC AUC.
prefix:str – Used to add a prefix to the file name.
file_name:str="roc_auc_curve" – Used to specify the name of the saved figure.

Returns

None.

Doc-author

Julian M. Kleber
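A sketch of one common way to build an averaged ROC curve with a standard-deviation band from cross-validation folds: each fold's TPR is interpolated onto a common FPR grid with np.interp, the interpolated curves are averaged, and ±1 standard deviation of the TPR is shaded. That avg_fpr, avg_tpr and std_tpr are obtained this way is an assumption; the sketch ignores std_fpr, and mean_roc_sketch and the two toy folds are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np


def mean_roc_sketch(fold_curves, file_name, n_points=101):
    """Average per-fold (fpr, tpr) curves on a common grid and shade +/- 1 std of TPR."""
    mean_fpr = np.linspace(0.0, 1.0, n_points)
    tprs, aucs = [], []
    for fpr, tpr in fold_curves:
        fpr, tpr = np.asarray(fpr, dtype=float), np.asarray(tpr, dtype=float)
        tpr_on_grid = np.interp(mean_fpr, fpr, tpr)
        tpr_on_grid[0] = 0.0  # every ROC curve starts at (0, 0)
        tprs.append(tpr_on_grid)
        aucs.append(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))  # per-fold AUC
    mean_tpr = np.mean(tprs, axis=0)
    mean_tpr[-1] = 1.0  # and ends at (1, 1)
    std_tpr = np.std(tprs, axis=0)
    mean_auc = np.sum(np.diff(mean_fpr) * (mean_tpr[1:] + mean_tpr[:-1]) / 2)
    std_auc = np.std(aucs)

    fig, ax = plt.subplots()
    ax.plot(mean_fpr, mean_tpr, label=f"Mean ROC (AUC = {mean_auc:.2f} ± {std_auc:.2f})")
    ax.fill_between(mean_fpr, np.maximum(mean_tpr - std_tpr, 0.0),
                    np.minimum(mean_tpr + std_tpr, 1.0), alpha=0.2, label="± 1 std. dev.")
    ax.set_xlabel("False positive rate")
    ax.set_ylabel("True positive rate")
    ax.legend(loc="lower right")
    fig.savefig(file_name)
    plt.close(fig)
    return mean_fpr, mean_tpr, std_tpr, mean_auc, std_auc


folds = [
    ([0.0, 0.0, 0.5, 0.5, 1.0], [0.0, 0.5, 0.5, 1.0, 1.0]),
    ([0.0, 0.5, 1.0], [0.0, 1.0, 1.0]),
]
mean_fpr, mean_tpr, std_tpr, mean_auc, std_auc = mean_roc_sketch(folds, "mean_roc_auc_curve.png")
```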

chembee.plotting.feature_extraction module

chembee.plotting.feature_extraction.plot_feature_importances(result_json: dict, file_name: str, prefix: str, show_x_label=True)

The plot_feature_importances function plots the feature importances of a random forest model. It takes the following arguments:

- forest_importances: The feature importances of the random forest model, as given by the feature_importances_ attribute of a fitted scikit-learn forest.
- std: The standard deviation of the feature importances, e.g. computed over the individual trees of a RandomForestRegressor or RandomForestClassifier. This is used to plot error bars for each importance value. If no standard deviation is available (for example, for a DecisionTreeRegressor or DecisionTreeClassifier), set this to None.
- file_name: A string containing the name of your desired output file (e.g., “FeatureImportancePlot” will result in “FeatureImportancePlotRFModelNameHere”).
- show_x_label: Boolean indicating whether the x-axis label is shown.

Parameters

forest_importances – Used to pass the feature importances to the function.
std – Used to plot the standard deviation of the feature importances.
file_name:str – Used to pass the file name of the plot to be saved.
prefix:str – Used to specify the prefix for the file name.

Returns

The feature importances of the forest.

Doc-author

Julian M. Kleber
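For reference, the error bars described above are commonly computed as the spread of the importances over the individual trees of a fitted scikit-learn forest, as in this sketch. The synthetic data and the bar-chart styling are illustrative; how chembee stores these values in result_json is not documented here.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data stands in for real molecular descriptors
X, y = make_classification(n_samples=200, n_features=5, n_informative=3, random_state=0)
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

forest_importances = forest.feature_importances_
# Standard deviation of the importances over the individual trees -> error bars
std = np.std([tree.feature_importances_ for tree in forest.estimators_], axis=0)

fig, ax = plt.subplots()
ax.bar([f"Feature {i}" for i in range(X.shape[1])], forest_importances, yerr=std, capsize=4)
ax.set_ylabel("Mean decrease in impurity")
fig.savefig("feature_importances.png")
plt.close(fig)
```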

chembee.plotting.graphics module

chembee.plotting.graphics.make_comparison_plot(fig, sub, models, titles, X, y, prefix=None, file_name='SVC_benchmark', feature_names=['Feature 1', 'Feature 2'], response_method='predict')

The make_comparison_plot function creates a plot comparing the decision boundaries of several models. It takes as input:

- fig: a matplotlib figure object to be saved to file (figsize=(12,8)).
- sub: an array of axes objects on which plotting will occur (nrows=2, ncols=3).
- models: an array of sklearn classifiers whose decision boundaries will be plotted.

Note that the models must already have been fitted with fit()! The function then plots the decision boundary and the predicted values for each model in turn. It returns nothing.

Parameters

fig – Used to pass the figure object to plot on.
sub – Used to pass the subplot grid.
models – Used to pass the classifiers that will be used in the plot.
titles – Used to set the title of each subplot.
X – Used to pass the data to be plotted.
y – Used to specify the response variable.
prefix=None – Used to specify a prefix for the file name; if None, the file is saved in the current working directory.
file_name="SVC_benchmark" – Used to set the file name of the plot.
feature_names=["Feature 1", "Feature 2"] – Used to label the x and y axes of the plot.

Returns

A plot of the decision boundaries for a set of classifiers.

Doc-author

Julian M. Kleber
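A sketch of the decision-boundary technique for two features: evaluate a fitted model's predict method on a dense grid and draw the predicted classes with contourf beneath the data points, one subplot per model. scikit-learn's DecisionBoundaryDisplay.from_estimator also accepts a response_method argument, which the response_method parameter here appears to mirror (an assumption). The two rule-based predictors stand in for fitted classifiers, and all names are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np


def decision_boundary_grid(predict, X, steps=200, margin=0.5):
    """Evaluate a predict function on a grid spanning the two columns of X."""
    x_min, x_max = X[:, 0].min() - margin, X[:, 0].max() + margin
    y_min, y_max = X[:, 1].min() - margin, X[:, 1].max() + margin
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, steps), np.linspace(y_min, y_max, steps))
    zz = predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    return xx, yy, zz


def comparison_plot_sketch(predictors, titles, X, y, file_name,
                           feature_names=("Feature 1", "Feature 2")):
    """Side-by-side decision boundaries, one subplot per predictor."""
    fig, axes = plt.subplots(1, len(predictors), figsize=(4 * len(predictors), 4), squeeze=False)
    for ax, predict, title in zip(axes.flat, predictors, titles):
        xx, yy, zz = decision_boundary_grid(predict, X)
        ax.contourf(xx, yy, zz, alpha=0.3)                   # predicted class regions
        ax.scatter(X[:, 0], X[:, 1], c=y, edgecolors="k")    # actual samples
        ax.set_title(title)
        ax.set_xlabel(feature_names[0])
        ax.set_ylabel(feature_names[1])
    fig.savefig(file_name)
    plt.close(fig)


rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)


def linear_rule(points):  # stand-in for a fitted model's predict
    return (points[:, 0] + points[:, 1] > 0).astype(int)


def axis_rule(points):  # a second, deliberately cruder stand-in
    return (points[:, 0] > 0).astype(int)


comparison_plot_sketch([linear_rule, axis_rule], ["x1 + x2 > 0", "x1 > 0"], X, y, "comparison.png")
```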

chembee.plotting.graphics.plot_comparison_1(clf, X, y, prefix=None, file_name='SVC_benchmark', feature_names=['Feature 1', 'Feature 2'], response_method='predict')

The plot_comparison_1 function creates a plot of the decision boundary of a given classifier. It takes as input:

- clf: an instance of sklearn's SVC class with default parameters (except for C and gamma).
- X: the feature matrix, containing the training samples in rows and their features in columns. X must have exactly two columns, because each sample is plotted as a point in the xy-plane; the function can only plot 2D decision boundaries. To use more than two features, first reduce the data set to two dimensions.
- y: the vector of labels corresponding to the training samples in X.
- prefix=None: a string prepended to file_name when saving files; if None or an empty string, no prefix is added.
- file_name: the name used when saving files; if None or an empty string, no file is saved and the plot is instead shown on screen with plt.show().

Parameters

clf – Used to pass the classifier.
X – Used to specify the features that are used to train the model.
y – Used to specify the class labels.
prefix=None – Used to specify a prefix for the file name.
file_name="SVC_benchmark" – Used to save the plot as a png file.
feature_names=["Feature 1", "Feature 2"] – Used to set the labels of the x and y axes.
response_method="predict" – Used to specify the method used to generate the response.

Returns

The decision boundary plot of the svc model.

Doc-author

Julian M. Kleber

chembee.plotting.graphics.plot_comparison_2()

chembee.plotting.graphics.plot_comparison_3(models, titles, X, y, prefix=None, file_name='SVC_benchmark', feature_names=['Feature 1', 'Feature 2'], response_method='predict')

The plot_comparison_3 function creates a 3x3 grid of plots, each comparing the performance of a different model. The models are passed in as a list and their titles as another list; the X and y data are also required to generate the plots. The plot is saved to file_name + .png in the current working directory unless a different location is specified via prefix. To save several comparisons, call this function once per set of parameters.

Parameters

models – Used to pass the models that are to be compared.
titles – Used to set the title of each plot.
X – Used to pass the data.
y – Used to specify the target variable.
prefix=None – Used to specify a prefix for the file name.
file_name="SVC_benchmark" – Used to name the file that will be saved.
feature_names=["Feature 1", "Feature 2"] – Used to specify the names of the features in X, which label the x and y axes.
response_method="predict" – Used to specify the method used to generate the response.

Returns

A dictionary with the trained models.

Doc-author

Julian M. Kleber

chembee.plotting.graphics.plot_comparison_4(models, titles, X, y, prefix=None, file_name='SVC_benchmark', feature_names=['Feature 1', 'Feature 2'], response_method='predict')

The plot_comparison_4 function plots the decision boundaries of a set of models. It takes as input:

- models: A list of scikit-learn model objects. Each model object must have a .predict method that takes an array of features and returns predictions based on those features. The function uses these predictions to create the decision boundary of each model and then plots them together in a single figure.
- titles: A list with one element for each element in ‘models’. Each string labels the legend of the corresponding plot, so make sure they are unique!
- X: An array containing the feature values used to train/evaluate the models passed via ‘models’. These feature values should be scaled between 0 and 1 (if not already done). The first two columns should correspond to x coordinates, while the second two columns should correspond to y coordinates (i.e., this is an Nx4 matrix). This argument corresponds directly to what you would pass into plt.scatter(…), so if you have already scaled your data, pass it here without re-scaling it!
- y: An array containing the labels of the observations in X.

Parameters

models – Used to pass a list of models that should be compared.
titles – Used to give a name to each model in the plot.
X – Used to define the data that will be used for training and testing.
y – Used to define the target variable.
prefix=None – Used to define a string that will be added to the name of the file when it is saved.
file_name="SVC_benchmark" – Used to save the plot as an image.
feature_names=["Feature 1", "Feature 2"] – Used to specify the labels of the x and y axes.

Returns

The subplot of the models.

Doc-author

Trelent

chembee.plotting.graphics.save_fig(ax, name: str, save_path=None, dpi=300)

Frequently used snippet to save a plot.

Parameters

ax – Seaborn axis.
name:str – Name of the plot.
save_path=None – Where to save the plot (str, optional). Defaults to None.
dpi=300 – Resolution of the plot in dots per inch (int, optional). Defaults to 300.

Returns

None.

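A minimal sketch of such a snippet, assuming save_path is a directory. Seaborn's axes-level functions return matplotlib Axes, so the owning figure can be reached via ax.get_figure(). Unlike save_fig, this hypothetical save_fig_sketch returns the written path so the result is easy to check.

```python
import os

import matplotlib

matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt


def save_fig_sketch(ax, name: str, save_path=None, dpi=300) -> str:
    """Save the figure that owns ``ax``; defaults to the current working directory."""
    directory = save_path if save_path is not None else os.getcwd()
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, name)
    ax.get_figure().savefig(path, dpi=dpi, bbox_inches="tight")
    return path


fig, ax = plt.subplots()
ax.plot([0, 1], [0, 1])
path = save_fig_sketch(ax, "line.png", save_path="plots")
plt.close(fig)
```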
chembee.plotting.lipinski module

chembee.plotting.lipinski.polar_plot(data_set: DataSet, file_name: str, prefix: str)

The polar_plot function creates a polar plot of the Lipinski parameters. It takes a data set and a file name as input and outputs a png file with the polar plot.

Parameters

data_set:DataSet – Used to set the data that will be plotted.
file_name:str – Used to specify the name of the output file.
prefix:str – Used to specify the prefix of the file name.

Returns

The figure object.

Doc-author

Julian M. Kleber
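A sketch of a polar (radar) plot of Lipinski parameters, with each descriptor divided by its rule-of-five limit (molecular weight 500 Da, logP 5, 5 hydrogen-bond donors, 10 hydrogen-bond acceptors) so that scaled values at or below 1 respect the rule. Which parameters polar_plot draws and how it scales them are assumptions; lipinski_radar_sketch and the descriptor values are illustrative.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np

# Lipinski rule-of-five limits
LIMITS = {"MW": 500.0, "LogP": 5.0, "HBD": 5.0, "HBA": 10.0}


def lipinski_radar_sketch(values: dict, file_name: str):
    """Radar plot of descriptors scaled by their rule-of-five limits."""
    names = list(LIMITS)
    scaled = [values[n] / LIMITS[n] for n in names]
    angles = np.linspace(0, 2 * np.pi, len(names), endpoint=False).tolist()
    # close the polygon by repeating the first point
    scaled_closed = scaled + scaled[:1]
    angles_closed = angles + angles[:1]

    fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
    ax.plot(angles_closed, scaled_closed, marker="o", label="compound")
    ax.fill(angles_closed, scaled_closed, alpha=0.25)
    ax.plot(angles_closed, [1.0] * len(angles_closed), linestyle="--", label="limit")
    ax.set_xticks(angles)
    ax.set_xticklabels(names)
    ax.legend(loc="upper right")
    fig.savefig(file_name)
    plt.close(fig)
    return scaled


scaled = lipinski_radar_sketch({"MW": 250.0, "LogP": 2.5, "HBD": 1, "HBA": 4}, "lipinski_polar.png")
```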

Module contents