Factory API Reference

This document contains the plotting methods that are embedded into scikit-learn objects by the factory functions clustering_factory() and classifier_factory().

Important Note

If you want to use stand-alone functions and not bother with the factory functions, view the Functions API Reference instead.

Classifier Plots

scikitplot.classifier_factory(clf)

DEPRECATED: This will be removed in v0.4.0. The Factory API has been deprecated. Please migrate existing code into the various new modules of the Functions API. Please note that the interface of those functions will likely be different from that of the Factory API.

Embeds scikit-plot instance methods in an sklearn classifier.

Args:

clf: Scikit-learn classifier instance

Returns:

The same scikit-learn classifier instance passed in clf with embedded scikit-plot instance methods.

Raises:

ValueError: If clf does not contain the instance methods necessary for scikit-plot instance methods.
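
The embedding step can be pictured as binding plain functions to an existing instance. The sketch below is not scikit-plot's implementation: embed_methods, describe, and DummyClassifier are hypothetical names, and the required-method check is an assumption about how the ValueError described above might be triggered.

```python
import types


def embed_methods(obj, functions, required=("fit", "predict")):
    """Bind each function in `functions` to `obj` as an instance method.

    Hypothetical helper illustrating the factory pattern; not scikit-plot code.
    """
    for name in required:
        if not callable(getattr(obj, name, None)):
            raise ValueError(f"{name!r} is not in object, or is not callable")
    for name, func in functions.items():
        # MethodType makes `obj` the implicit first argument of `func`.
        setattr(obj, name, types.MethodType(func, obj))
    return obj  # the same instance, now carrying the extra methods


def describe(clf, X):
    """Stand-in for a plotting function whose first argument is the estimator."""
    return f"{type(clf).__name__} predicts {clf.predict(X)}"


class DummyClassifier:
    """Hypothetical estimator exposing the two required methods."""

    def fit(self, X, y):
        return self

    def predict(self, X):
        return [0 for _ in X]


clf = embed_methods(DummyClassifier(), {"describe": describe})
print(clf.describe([[1.0], [2.0]]))
```

Because the function is bound to the instance, callers no longer pass the estimator explicitly, which is why the factory-style examples in this document call methods such as rf.plot_confusion_matrix(X, y) without a clf argument.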

scikitplot.classifiers.plot_learning_curve(clf, X, y, title='Learning Curve', cv=None, train_sizes=None, n_jobs=1, scoring=None, ax=None, figsize=None, title_fontsize='large', text_fontsize='medium')

DEPRECATED: This will be removed in v0.4.0. Please use scikitplot.estimators.plot_learning_curve instead.

Generates a plot of the train and test learning curves for a classifier.

Args:

clf: Classifier instance that implements fit and predict methods.

X (array-like, shape (n_samples, n_features)): Training vector, where n_samples is the number of samples and n_features is the number of features.

y (array-like, shape (n_samples) or (n_samples, n_features)): Target relative to X for classification or regression; None for unsupervised learning.

title (string, optional): Title of the generated plot. Defaults to “Learning Curve”.

cv (int, cross-validation generator, iterable, optional): Determines the cross-validation strategy to be used for splitting.

Possible inputs for cv are:

- None, to use the default 3-fold cross-validation.
- An integer, to specify the number of folds.
- An object to be used as a cross-validation generator.
- An iterable yielding train/test splits.

For integer/None inputs, if y is binary or multiclass, StratifiedKFold is used. If the estimator is not a classifier or if y is neither binary nor multiclass, KFold is used.

train_sizes (iterable, optional): Determines the training sizes used to plot the learning curve. If None, np.linspace(.1, 1.0, 5) is used.

n_jobs (int, optional): Number of jobs to run in parallel. Defaults to 1.

scoring (string, callable or None, optional): A string (see scikit-learn model evaluation documentation) or a scorer callable object / function with signature scorer(estimator, X, y). Defaults to None.

ax (matplotlib.axes.Axes, optional): The axes upon which to plot the curve. If None, the plot is drawn on a new set of axes.

figsize (2-tuple, optional): Tuple denoting figure size of the plot, e.g. (6, 6). Defaults to None.

title_fontsize (string or int, optional): Matplotlib-style fontsizes. Use e.g. “small”, “medium”, “large” or integer-values. Defaults to “large”.

text_fontsize (string or int, optional): Matplotlib-style fontsizes. Use e.g. “small”, “medium”, “large” or integer-values. Defaults to “medium”.

Returns:

ax (matplotlib.axes.Axes): The axes on which the plot was drawn.

Example:
>>> import matplotlib.pyplot as plt
>>> from sklearn.ensemble import RandomForestClassifier
>>> import scikitplot.plotters as skplt
>>> rf = RandomForestClassifier()
>>> skplt.plot_learning_curve(rf, X, y)
<matplotlib.axes._subplots.AxesSubplot object at 0x7fe967d64490>
>>> plt.show()
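
The last cv option listed above, an iterable yielding train/test splits, is simply a sequence of (train indices, test indices) pairs. As a rough illustration in plain Python, the hypothetical helper kfold_splits below builds such pairs from contiguous folds; unlike scikit-learn's KFold and StratifiedKFold it does no shuffling or stratification.

```python
def kfold_splits(n_samples, n_folds=3):
    """Yield (train_indices, test_indices) pairs for contiguous folds."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most 1.
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        stop = start + base + (1 if fold < extra else 0)
        test = indices[start:stop]
        train = indices[:start] + indices[stop:]
        yield train, test
        start = stop


splits = list(kfold_splits(7, n_folds=3))
for train, test in splits:
    print(train, test)
```

Every sample lands in exactly one test fold, which is the property a learning curve relies on when it averages scores across folds.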

scikitplot.classifiers.plot_confusion_matrix_with_cv(clf, X, y, labels=None, true_labels=None, pred_labels=None, title=None, normalize=False, hide_zeros=False, x_tick_rotation=0, do_cv=True, cv=None, shuffle=True, random_state=None, ax=None, figsize=None, cmap='Blues', title_fontsize='large', text_fontsize='medium')

Generates the confusion matrix for a given classifier and dataset.

Parameters:

clf – Classifier instance that implements fit and predict methods.
X (array-like, shape (n_samples, n_features)) – Training vector, where n_samples is the number of samples and n_features is the number of features.
y (array-like, shape (n_samples) or (n_samples, n_features)) – Target relative to X for classification.
labels (array-like, shape (n_classes), optional) – List of labels to index the matrix. This may be used to reorder or select a subset of labels. If none is given, those that appear at least once in y are used in sorted order. (new in v0.2.5)
true_labels (array-like, optional) – The true labels to display. If none is given, then all of the labels are used.
pred_labels (array-like, optional) – The predicted labels to display. If none is given, then all of the labels are used.
title (string, optional) – Title of the generated plot. Defaults to “Confusion Matrix” if normalize is False. Else, defaults to “Normalized Confusion Matrix”.
normalize (bool, optional) – If True, normalizes the confusion matrix before plotting. Defaults to False.
hide_zeros (bool, optional) – If True, does not plot cells containing a value of zero. Defaults to False.
x_tick_rotation (int, optional) – Rotates x-axis tick labels by the specified angle. This is useful in cases where there are numerous categories and the labels overlap each other.
do_cv (bool, optional) – If True, the classifier is cross-validated on the dataset using the cross-validation strategy in cv to generate the confusion matrix. If False, the confusion matrix is generated without training or cross-validating the classifier. This assumes that the classifier has already been called with its fit method beforehand.
cv (int, cross-validation generator, iterable, optional) –
Determines the cross-validation strategy to be used for splitting.
Possible inputs for cv are:
- None, to use the default 3-fold cross-validation,
- integer, to specify the number of folds.
- An object to be used as a cross-validation generator.
- An iterable yielding train/test splits.
For integer/None inputs, if y is binary or multiclass, StratifiedKFold is used. If the estimator is not a classifier or if y is neither binary nor multiclass, KFold is used.
shuffle (bool, optional) – Used when do_cv is set to True. Determines whether to shuffle the training data before splitting using cross-validation. Default set to True.
random_state (int or RandomState, optional) – Pseudo-random number generator state used for random sampling.
ax (matplotlib.axes.Axes, optional) – The axes upon which to plot the confusion matrix. If None, the plot is drawn on a new set of axes.
figsize (2-tuple, optional) – Tuple denoting figure size of the plot e.g. (6, 6). Defaults to None.
cmap (string or matplotlib.colors.Colormap instance, optional) – Colormap used for plotting the confusion matrix. View Matplotlib Colormap documentation for available options. https://matplotlib.org/users/colormaps.html
title_fontsize (string or int, optional) – Matplotlib-style fontsizes. Use e.g. “small”, “medium”, “large” or integer-values. Defaults to “large”.
text_fontsize (string or int, optional) – Matplotlib-style fontsizes. Use e.g. “small”, “medium”, “large” or integer-values. Defaults to “medium”.

Returns:

The axes on which the plot was drawn.

Return type:

ax (matplotlib.axes.Axes)

Example

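>>> import matplotlib.pyplot as plt
>>> from sklearn.ensemble import RandomForestClassifier
>>> from scikitplot import classifier_factory
>>> rf = classifier_factory(RandomForestClassifier())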
>>> rf.plot_confusion_matrix(X, y, normalize=True)
<matplotlib.axes._subplots.AxesSubplot object at 0x7fe967d64490>
>>> plt.show()
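
To make the effect of normalize concrete, here is a minimal plain-Python sketch of a confusion matrix and its normalized form. The helpers confusion_matrix and normalize_rows are hypothetical, the labels are made-up data, and dividing each row by its sum (so every true class sums to 1) is an assumption about the normalization used.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels=None):
    """Return (labels, matrix) with rows = true labels, columns = predictions."""
    if labels is None:
        # Default to every label that occurs, in sorted order.
        labels = sorted(set(y_true) | set(y_pred))
    counts = Counter(zip(y_true, y_pred))
    matrix = [[counts[(t, p)] for p in labels] for t in labels]
    return labels, matrix


def normalize_rows(matrix):
    """Divide each row by its sum so every true class sums to 1."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([cell / total if total else 0.0 for cell in row])
    return normalized


y_true = ["cat", "cat", "dog", "dog", "dog", "bird"]
y_pred = ["cat", "dog", "dog", "dog", "cat", "bird"]
labels, cm = confusion_matrix(y_true, y_pred)
print(labels)
print(cm)
print(normalize_rows(cm))
```

Passing an explicit labels list reorders or subsets the rows and columns, mirroring the labels parameter described above.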

scikitplot.classifiers.plot_roc_curve_with_cv(clf, X, y, title='ROC Curves', do_cv=True, cv=None, shuffle=True, random_state=None, curves=('micro', 'macro', 'each_class'), ax=None, figsize=None, cmap='nipy_spectral', title_fontsize='large', text_fontsize='medium')

Generates the ROC curves for a given classifier and dataset.

Parameters:

clf – Classifier instance that implements fit and predict methods.
X (array-like, shape (n_samples, n_features)) – Training vector, where n_samples is the number of samples and n_features is the number of features.
y (array-like, shape (n_samples) or (n_samples, n_features)) – Target relative to X for classification.
title (string, optional) – Title of the generated plot. Defaults to “ROC Curves”.
do_cv (bool, optional) – If True, the classifier is cross-validated on the dataset using the cross-validation strategy in cv to generate the ROC curves. If False, the ROC curves are generated without training or cross-validating the classifier. This assumes that the classifier has already been fitted by calling its fit method.
cv (int, cross-validation generator, iterable, optional) –
Determines the cross-validation strategy to be used for splitting.
Possible inputs for cv are:
- None, to use the default 3-fold cross-validation,
- integer, to specify the number of folds.
- An object to be used as a cross-validation generator.
- An iterable yielding train/test splits.
For integer/None inputs, if y is binary or multiclass, StratifiedKFold is used. If the estimator is not a classifier or if y is neither binary nor multiclass, KFold is used.
shuffle (bool, optional) – Used when do_cv is set to True. Determines whether to shuffle the training data before splitting using cross-validation. Default set to True.
random_state (int or RandomState, optional) – Pseudo-random number generator state used for random sampling.
curves (array-like) – A listing of which curves should be plotted on the resulting plot. Defaults to (“micro”, “macro”, “each_class”), i.e. “micro” for the micro-averaged curve, “macro” for the macro-averaged curve, and “each_class” for the curve of each individual class.
ax (matplotlib.axes.Axes, optional) – The axes upon which to plot the curves. If None, the plot is drawn on a new set of axes.
figsize (2-tuple, optional) – Tuple denoting figure size of the plot e.g. (6, 6). Defaults to None.
cmap (string or matplotlib.colors.Colormap instance, optional) – Colormap used for plotting the curves. View Matplotlib Colormap documentation for available options. https://matplotlib.org/users/colormaps.html
title_fontsize (string or int, optional) – Matplotlib-style fontsizes. Use e.g. “small”, “medium”, “large” or integer-values. Defaults to “large”.
text_fontsize (string or int, optional) – Matplotlib-style fontsizes. Use e.g. “small”, “medium”, “large” or integer-values. Defaults to “medium”.

Returns:

The axes on which the plot was drawn.

Return type:

ax (matplotlib.axes.Axes)

Example

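>>> import matplotlib.pyplot as plt
>>> from sklearn.naive_bayes import GaussianNB
>>> from scikitplot import classifier_factory
>>> nb = classifier_factory(GaussianNB())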
>>> nb.plot_roc_curve(X, y, random_state=1)
<matplotlib.axes._subplots.AxesSubplot object at 0x7fe967d64490>
>>> plt.show()
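
A per-class ROC curve treats one class as positive and all others as negative. The sketch below covers only that binary building block in plain Python, on made-up scores; roc_points and auc are hypothetical helper names, and the micro- and macro-averaging selected through curves is not shown.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) lists for a binary problem with labels 0 and 1."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    # Visit samples from the highest score down, lowering the threshold each step.
    ranked = sorted(zip(scores, y_true), reverse=True)
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample of a group of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            fpr.append(fp / n_neg)
            tpr.append(tp / n_pos)
    return fpr, tpr


def auc(x, y):
    """Area under the piecewise-linear curve, by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for x0, x1, y0, y1 in zip(x, x[1:], y, y[1:]))


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
fpr, tpr = roc_points(y_true, scores)
print(fpr, tpr, auc(fpr, tpr))
```

The scores here stand in for the positive-class probabilities a fitted classifier would produce.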

scikitplot.classifiers.plot_ks_statistic_with_cv(clf, X, y, title='KS Statistic Plot', do_cv=True, cv=None, shuffle=True, random_state=None, ax=None, figsize=None, title_fontsize='large', text_fontsize='medium')

Generates the KS Statistic plot for a given classifier and dataset.

Parameters:

clf – Classifier instance that implements “fit” and “predict_proba” methods.
X (array-like, shape (n_samples, n_features)) – Training vector, where n_samples is the number of samples and n_features is the number of features.
y (array-like, shape (n_samples) or (n_samples, n_features)) – Target relative to X for classification.
title (string, optional) – Title of the generated plot. Defaults to “KS Statistic Plot”.
do_cv (bool, optional) – If True, the classifier is cross-validated on the dataset using the cross-validation strategy in cv to generate the KS Statistic plot. If False, the plot is generated without training or cross-validating the classifier. This assumes that the classifier has already been fitted by calling its fit method.
cv (int, cross-validation generator, iterable, optional) –
Determines the cross-validation strategy to be used for splitting.
Possible inputs for cv are:
- None, to use the default 3-fold cross-validation,
- integer, to specify the number of folds.
- An object to be used as a cross-validation generator.
- An iterable yielding train/test splits.
For integer/None inputs, if y is binary or multiclass, StratifiedKFold is used. If the estimator is not a classifier or if y is neither binary nor multiclass, KFold is used.
shuffle (bool, optional) – Used when do_cv is set to True. Determines whether to shuffle the training data before splitting using cross-validation. Default set to True.
random_state (int or RandomState, optional) – Pseudo-random number generator state used for random sampling.
ax (matplotlib.axes.Axes, optional) – The axes upon which to plot the curve. If None, the plot is drawn on a new set of axes.
figsize (2-tuple, optional) – Tuple denoting figure size of the plot e.g. (6, 6). Defaults to None.
title_fontsize (string or int, optional) – Matplotlib-style fontsizes. Use e.g. “small”, “medium”, “large” or integer-values. Defaults to “large”.
text_fontsize (string or int, optional) – Matplotlib-style fontsizes. Use e.g. “small”, “medium”, “large” or integer-values. Defaults to “medium”.

Returns:

The axes on which the plot was drawn.

Return type:

ax (matplotlib.axes.Axes)

Example

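>>> import matplotlib.pyplot as plt
>>> from sklearn.linear_model import LogisticRegression
>>> from scikitplot import classifier_factory
>>> lr = classifier_factory(LogisticRegression())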
>>> lr.plot_ks_statistic(X, y, random_state=1)
<matplotlib.axes._subplots.AxesSubplot object at 0x7fe967d64490>
>>> plt.show()
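
For readers unfamiliar with the quantity being plotted, the Kolmogorov-Smirnov statistic for a binary classifier is the largest gap between the cumulative score distributions of the two classes. The following is a minimal plain-Python sketch on made-up scores; ks_statistic is a hypothetical helper, not part of scikit-plot.

```python
def ks_statistic(y_true, scores):
    """Return (ks, threshold): the largest gap between the two classes'
    empirical cumulative distributions of scores, and where it occurs."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    best_gap, best_threshold = 0.0, None
    for threshold in sorted(set(scores)):
        # Fraction of each class scoring at or below the threshold.
        cdf_pos = sum(s <= threshold for s in pos) / len(pos)
        cdf_neg = sum(s <= threshold for s in neg) / len(neg)
        gap = abs(cdf_neg - cdf_pos)
        if gap > best_gap:
            best_gap, best_threshold = gap, threshold
    return best_gap, best_threshold


y_true = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.7, 0.4, 0.6, 0.8, 0.9]
print(ks_statistic(y_true, scores))
```

A larger statistic means the classifier's scores separate the two classes more cleanly.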

scikitplot.classifiers.plot_precision_recall_curve_with_cv(clf, X, y, title='Precision-Recall Curve', do_cv=True, cv=None, shuffle=True, random_state=None, curves=('micro', 'each_class'), ax=None, figsize=None, cmap='nipy_spectral', title_fontsize='large', text_fontsize='medium')

Generates the Precision-Recall curve for a given classifier and dataset.

Parameters:

clf – Classifier instance that implements “fit” and “predict_proba” methods.
X (array-like, shape (n_samples, n_features)) – Training vector, where n_samples is the number of samples and n_features is the number of features.
y (array-like, shape (n_samples) or (n_samples, n_features)) – Target relative to X for classification.
title (string, optional) – Title of the generated plot. Defaults to “Precision-Recall Curve”.
do_cv (bool, optional) – If True, the classifier is cross-validated on the dataset using the cross-validation strategy in cv to generate the Precision-Recall curve. If False, the curve is generated without training or cross-validating the classifier. This assumes that the classifier has already been fitted by calling its fit method.
cv (int, cross-validation generator, iterable, optional) –
Determines the cross-validation strategy to be used for splitting.
Possible inputs for cv are:
- None, to use the default 3-fold cross-validation,
- integer, to specify the number of folds.
- An object to be used as a cross-validation generator.
- An iterable yielding train/test splits.
For integer/None inputs, if y is binary or multiclass, StratifiedKFold is used. If the estimator is not a classifier or if y is neither binary nor multiclass, KFold is used.
shuffle (bool, optional) – Used when do_cv is set to True. Determines whether to shuffle the training data before splitting using cross-validation. Default set to True.
random_state (int or RandomState, optional) – Pseudo-random number generator state used for random sampling.
curves (array-like) – A listing of which curves should be plotted on the resulting plot. Defaults to (“micro”, “each_class”), i.e. “micro” for the micro-averaged curve and “each_class” for the curve of each individual class.
ax (matplotlib.axes.Axes, optional) – The axes upon which to plot the curves. If None, the plot is drawn on a new set of axes.
figsize (2-tuple, optional) – Tuple denoting figure size of the plot e.g. (6, 6). Defaults to None.
cmap (string or matplotlib.colors.Colormap instance, optional) – Colormap used for plotting the curves. View Matplotlib Colormap documentation for available options. https://matplotlib.org/users/colormaps.html
title_fontsize (string or int, optional) – Matplotlib-style fontsizes. Use e.g. “small”, “medium”, “large” or integer-values. Defaults to “large”.
text_fontsize (string or int, optional) – Matplotlib-style fontsizes. Use e.g. “small”, “medium”, “large” or integer-values. Defaults to “medium”.

Returns:

The axes on which the plot was drawn.

Return type:

ax (matplotlib.axes.Axes)

Example

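>>> import matplotlib.pyplot as plt
>>> from sklearn.naive_bayes import GaussianNB
>>> from scikitplot import classifier_factory
>>> nb = classifier_factory(GaussianNB())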
>>> nb.plot_precision_recall_curve(X, y, random_state=1)
<matplotlib.axes._subplots.AxesSubplot object at 0x7fe967d64490>
>>> plt.show()
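
Each point on a precision-recall curve comes from thresholding the predicted probabilities for one class. As a rough plain-Python illustration on made-up scores (precision_recall_points is a hypothetical helper, and the micro-averaged curve is not shown):

```python
def precision_recall_points(y_true, scores):
    """Return (precision, recall) lists, one pair per distinct threshold."""
    n_pos = sum(y_true)
    precision, recall = [], []
    # Lower the threshold step by step so recall never decreases.
    for threshold in sorted(set(scores), reverse=True):
        predicted = [s >= threshold for s in scores]
        tp = sum(p and y == 1 for p, y in zip(predicted, y_true))
        fp = sum(p and y == 0 for p, y in zip(predicted, y_true))
        precision.append(tp / (tp + fp))
        recall.append(tp / n_pos)
    return precision, recall


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
precision, recall = precision_recall_points(y_true, scores)
for p, r in zip(precision, recall):
    print(f"precision={p:.3f} recall={r:.3f}")
```

Because every threshold is taken from the observed scores, at least one sample is always predicted positive and the precision denominator is never zero.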

scikitplot.classifiers.plot_feature_importances(clf, title='Feature Importance', feature_names=None, max_num_features=20, order='descending', x_tick_rotation=0, ax=None, figsize=None, title_fontsize='large', text_fontsize='medium')

DEPRECATED: This will be removed in v0.4.0. Please use scikitplot.estimators.plot_feature_importances instead.

Generates a plot of a classifier’s feature importances.

Args:

clf: Classifier instance that implements fit and predict_proba methods. The classifier must also have a feature_importances_ attribute.

title (string, optional): Title of the generated plot. Defaults to “Feature Importance”.

feature_names (None, list of string, optional): Determines the feature names used to plot the feature importances. If None, feature names will be numbered.

max_num_features (int): Determines the maximum number of features to plot. Defaults to 20.

order (‘ascending’, ‘descending’, or None, optional): Determines the order in which the feature importances are plotted. Defaults to ‘descending’.

x_tick_rotation (int, optional): Rotates x-axis tick labels by the specified angle. This is useful in cases where there are numerous categories and the labels overlap each other.

ax (matplotlib.axes.Axes, optional): The axes upon which to plot the curve. If None, the plot is drawn on a new set of axes.

figsize (2-tuple, optional): Tuple denoting figure size of the plot, e.g. (6, 6). Defaults to None.

title_fontsize (string or int, optional): Matplotlib-style fontsizes. Use e.g. “small”, “medium”, “large” or integer-values. Defaults to “large”.

text_fontsize (string or int, optional): Matplotlib-style fontsizes. Use e.g. “small”, “medium”, “large” or integer-values. Defaults to “medium”.

Returns:

ax (matplotlib.axes.Axes): The axes on which the plot was drawn.

Example:
>>> import matplotlib.pyplot as plt
>>> from sklearn.ensemble import RandomForestClassifier
>>> import scikitplot.plotters as skplt
>>> rf = RandomForestClassifier()
>>> rf.fit(X, y)
>>> skplt.plot_feature_importances(
...     rf, feature_names=['petal length', 'petal width',
...                        'sepal length', 'sepal width'])
<matplotlib.axes._subplots.AxesSubplot object at 0x7fe967d64490>
>>> plt.show()
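
The interaction of feature_names, order, and max_num_features can be sketched without any plotting. The helper top_features below is hypothetical and the importance values are made up; sorting first and truncating afterwards is an assumption about how the parameters combine.

```python
def top_features(importances, feature_names=None, max_num_features=20,
                 order="descending"):
    """Return (name, importance) pairs in the order they would be plotted."""
    if feature_names is None:
        # Fall back to positional names when none are supplied.
        feature_names = [str(i) for i in range(len(importances))]
    pairs = list(zip(feature_names, importances))
    if order == "descending":
        pairs.sort(key=lambda pair: pair[1], reverse=True)
    elif order == "ascending":
        pairs.sort(key=lambda pair: pair[1])
    elif order is not None:
        raise ValueError("order must be 'ascending', 'descending', or None")
    return pairs[:max_num_features]


names = ["petal length", "petal width", "sepal length", "sepal width"]
importances = [0.40, 0.45, 0.05, 0.10]  # made-up values for illustration
print(top_features(importances, names, max_num_features=3))
```

With order=None the features keep their original column order, which can be preferable when the columns have a natural sequence.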

Clustering Plots

scikitplot.clustering_factory(clf)

DEPRECATED: This will be removed in v0.4.0. The Factory API has been deprecated. Please migrate existing code into the various new modules of the Functions API. Please note that the interface of those functions will likely be different from that of the Factory API.

Embeds scikit-plot plotting methods in an sklearn clusterer instance.

Args:

clf: Scikit-learn clusterer instance

Returns:

The same scikit-learn clusterer instance passed in clf with embedded scikit-plot instance methods.

Raises:

ValueError: If clf does not contain the instance methods necessary for scikit-plot instance methods.

scikitplot.clustering.plot_silhouette(clf, X, title='Silhouette Analysis', metric='euclidean', copy=True, ax=None, figsize=None, cmap='nipy_spectral', title_fontsize='large', text_fontsize='medium')

DEPRECATED: This will be removed in v0.4.0. Please use scikitplot.metrics.plot_silhouette instead.

Plots silhouette analysis of clusters using fit_predict.

Args:

clf: Clusterer instance that implements fit and fit_predict methods.

X (array-like, shape (n_samples, n_features)): Data to cluster, where n_samples is the number of samples and n_features is the number of features.

title (string, optional): Title of the generated plot. Defaults to “Silhouette Analysis”.

metric (string or callable, optional): The metric to use when calculating distance between instances in a feature array. If metric is a string, it must be one of the options allowed by sklearn.metrics.pairwise.pairwise_distances. If X is the distance array itself, use “precomputed” as the metric.

copy (boolean, optional): Determines whether fit is used on clf or on a copy of clf.

ax (matplotlib.axes.Axes, optional): The axes upon which to plot the curve. If None, the plot is drawn on a new set of axes.

figsize (2-tuple, optional): Tuple denoting figure size of the plot, e.g. (6, 6). Defaults to None.

cmap (string or matplotlib.colors.Colormap instance, optional): Colormap used for plotting the projection. View Matplotlib Colormap documentation for available options. https://matplotlib.org/users/colormaps.html

title_fontsize (string or int, optional): Matplotlib-style fontsizes. Use e.g. “small”, “medium”, “large” or integer-values. Defaults to “large”.

text_fontsize (string or int, optional): Matplotlib-style fontsizes. Use e.g. “small”, “medium”, “large” or integer-values. Defaults to “medium”.

Returns:

ax (matplotlib.axes.Axes): The axes on which the plot was drawn.

Example:
>>> import matplotlib.pyplot as plt
>>> from sklearn.cluster import KMeans
>>> import scikitplot.plotters as skplt
>>> kmeans = KMeans(n_clusters=4, random_state=1)
>>> skplt.plot_silhouette(kmeans, X)
<matplotlib.axes._subplots.AxesSubplot object at 0x7fe967d64490>
>>> plt.show()
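
The bars in a silhouette plot are per-sample silhouette coefficients, (b - a) / max(a, b), where a is the mean distance to the sample's own cluster and b is the mean distance to the nearest other cluster. Below is a minimal plain-Python sketch with the Euclidean metric; silhouette_values is a hypothetical helper, the data and labels are made up, and it assumes at least two clusters and points that do not all coincide.

```python
import math


def silhouette_values(X, labels):
    """Return one silhouette coefficient per sample, using Euclidean distance."""
    values = []
    for i, (point, own) in enumerate(zip(X, labels)):
        if labels.count(own) == 1:
            values.append(0.0)  # common convention for single-member clusters
            continue
        # Collect distances from this sample to every other sample, per cluster.
        dists = {}
        for j, (other, lab) in enumerate(zip(X, labels)):
            if j != i:
                dists.setdefault(lab, []).append(math.dist(point, other))
        means = {lab: sum(d) / len(d) for lab, d in dists.items()}
        a = means[own]  # mean distance to the sample's own cluster
        b = min(m for lab, m in means.items() if lab != own)  # nearest other cluster
        values.append((b - a) / max(a, b))
    return values


X = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
labels = [0, 0, 1, 1]
scores = silhouette_values(X, labels)
print([round(s, 3) for s in scores])
print(round(sum(scores) / len(scores), 3))  # mean silhouette score
```

Values close to 1 indicate samples far from neighbouring clusters; values near 0 or below suggest overlapping or misassigned samples.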

scikitplot.clustering.plot_elbow_curve(clf, X, title='Elbow Plot', cluster_ranges=None, ax=None, figsize=None, title_fontsize='large', text_fontsize='medium')

DEPRECATED: This will be removed in v0.4.0. Please use scikitplot.cluster.plot_elbow_curve instead.

Plots elbow curve of different values of K for KMeans clustering.

Args:

clf: Clusterer instance that implements fit and fit_predict methods and a score parameter.

X (array-like, shape (n_samples, n_features)): Data to cluster, where n_samples is the number of samples and n_features is the number of features.

title (string, optional): Title of the generated plot. Defaults to “Elbow Plot”.

cluster_ranges (None or list of int, optional): List of n_clusters for which to plot the explained variances. Defaults to range(1, 12, 2).

copy (boolean, optional): Determines whether fit is used on clf or on a copy of clf.

ax (matplotlib.axes.Axes, optional): The axes upon which to plot the curve. If None, the plot is drawn on a new set of axes.

figsize (2-tuple, optional): Tuple denoting figure size of the plot, e.g. (6, 6). Defaults to None.

title_fontsize (string or int, optional): Matplotlib-style fontsizes. Use e.g. “small”, “medium”, “large” or integer-values. Defaults to “large”.

text_fontsize (string or int, optional): Matplotlib-style fontsizes. Use e.g. “small”, “medium”, “large” or integer-values. Defaults to “medium”.

Returns:

ax (matplotlib.axes.Axes): The axes on which the plot was drawn.

Example:
>>> import matplotlib.pyplot as plt
>>> from sklearn.cluster import KMeans
>>> import scikitplot.plotters as skplt
>>> kmeans = KMeans(random_state=1)
>>> skplt.plot_elbow_curve(kmeans, X, cluster_ranges=range(1, 11))
<matplotlib.axes._subplots.AxesSubplot object at 0x7fe967d64490>
>>> plt.show()
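
An elbow plot tracks how a within-cluster error measure falls as the number of clusters grows. The plain-Python sketch below computes one such measure, the sum of squared distances to cluster centroids, for a few clusterings of made-up data. The helper inertia is hypothetical, the labels are assigned by hand rather than fitted with KMeans, and treating this quantity as the plotted value is an assumption.

```python
def inertia(X, labels):
    """Sum of squared distances from each sample to its cluster centroid."""
    total = 0.0
    for c in set(labels):
        members = [x for x, lab in zip(X, labels) if lab == c]
        # Centroid = per-coordinate mean of the cluster's members.
        centroid = [sum(col) / len(members) for col in zip(*members)]
        total += sum((v - m) ** 2 for x in members for v, m in zip(x, centroid))
    return total


X = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
one = inertia(X, [0, 0, 0, 0])   # a single cluster
two = inertia(X, [0, 0, 1, 1])   # left pair vs. right pair
four = inertia(X, [0, 1, 2, 3])  # every sample alone
print(one, two, four)
```

The sharp drop from one to two clusters, followed by a small further gain, is the "elbow" that suggests two clusters for this toy data.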