Factory API Reference
This document contains the plotting methods that are embedded into scikit-learn objects by the factory functions clustering_factory() and classifier_factory().
Important Note
If you want to use stand-alone functions and not bother with the factory functions, view the Functions API Reference instead.
Classifier Plots
scikitplot.classifier_factory(clf)
Takes a scikit-learn classifier instance and embeds scikit-plot instance methods in it.
Parameters: clf – Scikit-learn classifier instance.
Returns: The same scikit-learn classifier instance passed in clf, with embedded scikit-plot instance methods.
Raises: ValueError – If clf does not contain the instance methods necessary for scikit-plot instance methods.
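The embedding step described above can be pictured with a small standard-library sketch. It is not scikit-plot's implementation: embed_methods, ToyClassifier and describe are hypothetical names, and the set of required methods is an assumption chosen for illustration.

```python
import types


def embed_methods(obj, functions, required=("fit", "predict")):
    """Bind each function in `functions` to `obj` as an instance method.

    Hypothetical helper; the `required` tuple is an illustrative assumption.
    """
    # Refuse objects that lack the methods the embedded helpers rely on.
    for name in required:
        if not callable(getattr(obj, name, None)):
            raise ValueError("object has no callable %r method" % name)
    for name, func in functions.items():
        # MethodType makes `obj` the implicit first argument, so a function
        # written as func(clf, X, y) can be called as obj.func(X, y).
        setattr(obj, name, types.MethodType(func, obj))
    return obj


class ToyClassifier:
    """Hypothetical stand-in for a scikit-learn classifier."""

    def fit(self, X, y):
        self.majority_ = max(set(y), key=list(y).count)
        return self

    def predict(self, X):
        return [self.majority_ for _ in X]


def describe(clf, X, y):
    """Hypothetical helper standing in for a plotting function."""
    return "received %d samples" % len(X)


toy = ToyClassifier()
clf = embed_methods(toy, {"describe": describe})
print(clf.describe([[0], [1], [2]], [0, 1, 1]))
```

The same instance comes back with the extra method attached, which mirrors the "Returns" entry above; an object without the required methods is rejected with ValueError before anything is attached.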
scikitplot.classifiers.plot_learning_curve(clf, X, y, title=u'Learning Curve', cv=None, train_sizes=None, n_jobs=1, ax=None, figsize=None, title_fontsize=u'large', text_fontsize=u'medium')
Generates a plot of the train and test learning curves for a given classifier.
Parameters:
- clf – Classifier instance that implements fit and predict methods.
- X (array-like, shape (n_samples, n_features)) – Training vector, where n_samples is the number of samples and n_features is the number of features.
- y (array-like, shape (n_samples) or (n_samples, n_features)) – Target relative to X for classification or regression; None for unsupervised learning.
- title (string, optional) – Title of the generated plot. Defaults to “Learning Curve”
- cv (int, cross-validation generator, iterable, optional) – Determines the cross-validation strategy to be used for splitting. Possible inputs for cv are:
  - None, to use the default 3-fold cross-validation.
  - An integer, to specify the number of folds.
  - An object to be used as a cross-validation generator.
  - An iterable yielding train/test splits.
  For integer/None inputs, if y is binary or multiclass, StratifiedKFold is used. If the estimator is not a classifier or if y is neither binary nor multiclass, KFold is used.
- train_sizes (iterable, optional) – Determines the training sizes used to plot the learning curve. If None, np.linspace(.1, 1.0, 5) is used.
- n_jobs (int, optional) – Number of jobs to run in parallel. Defaults to 1.
- ax (matplotlib.axes.Axes, optional) – The axes upon which to plot the learning curve. If None, the plot is drawn on a new set of axes.
- figsize (2-tuple, optional) – Tuple denoting figure size of the plot, e.g. (6, 6). Defaults to None.
- title_fontsize (string or int, optional) – Matplotlib-style fontsizes. Use e.g. “small”, “medium”, “large” or integer-values. Defaults to “large”.
- text_fontsize (string or int, optional) – Matplotlib-style fontsizes. Use e.g. “small”, “medium”, “large” or integer-values. Defaults to “medium”.
Returns: The axes on which the plot was drawn.
Return type: ax (matplotlib.axes.Axes)
Example
>>> import matplotlib.pyplot as plt
>>> from sklearn.ensemble import RandomForestClassifier
>>> import scikitplot.plotters as skplt
>>> rf = RandomForestClassifier()
>>> skplt.plot_learning_curve(rf, X, y)
<matplotlib.axes._subplots.AxesSubplot object at 0x7fe967d64490>
>>> plt.show()
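To make the default train_sizes concrete, here is a minimal sketch of how five evenly spaced fractions from 0.1 to 1.0 could translate into absolute training-set sizes. It assumes the fractions are taken relative to the training portion of each cross-validation split and truncated to whole samples; default_train_sizes is a hypothetical helper, not part of scikit-plot.

```python
from fractions import Fraction


def default_train_sizes(n_train_samples, n_points=5):
    """Absolute sizes for evenly spaced fractions between 0.1 and 1.0.

    Hypothetical helper; exact fractions avoid floating-point surprises
    when truncating to whole samples.
    """
    start, stop = Fraction(1, 10), Fraction(1)
    step = (stop - start) / (n_points - 1)
    fractions = [start + i * step for i in range(n_points)]
    # Truncate to whole samples, as an integer cast would.
    return [int(f * n_train_samples) for f in fractions]


# Example: 180 samples under 3-fold cross-validation leave 120 for training.
sizes = default_train_sizes(120)
print(sizes)
```

Each size corresponds to one point on the x-axis of the learning curve.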
scikitplot.classifiers.plot_confusion_matrix(clf, X, y, labels=None, title=None, normalize=False, do_cv=True, cv=None, shuffle=True, random_state=None, ax=None, figsize=None, title_fontsize=u'large', text_fontsize=u'medium')
Generates the confusion matrix for a given classifier and dataset.
Parameters:
- clf – Classifier instance that implements fit and predict methods.
- X (array-like, shape (n_samples, n_features)) – Training vector, where n_samples is the number of samples and n_features is the number of features.
- y (array-like, shape (n_samples) or (n_samples, n_features)) – Target relative to X for classification.
- labels (array-like, shape (n_classes), optional) – List of labels to index the matrix. This may be used to reorder or select a subset of labels. If None is given, those that appear at least once in y are used in sorted order. (new in v0.2.5)
- title (string, optional) – Title of the generated plot. Defaults to “Confusion Matrix” if normalize is False. Else, defaults to “Normalized Confusion Matrix”.
- normalize (bool, optional) – If True, normalizes the confusion matrix before plotting. Defaults to False.
- do_cv (bool, optional) – If True, the classifier is cross-validated on the dataset using the cross-validation strategy in cv to generate the confusion matrix. If False, the confusion matrix is generated without training or cross-validating the classifier. This assumes that the classifier has already been called with its fit method beforehand.
- cv (int, cross-validation generator, iterable, optional) – Determines the cross-validation strategy to be used for splitting. Possible inputs for cv are:
  - None, to use the default 3-fold cross-validation.
  - An integer, to specify the number of folds.
  - An object to be used as a cross-validation generator.
  - An iterable yielding train/test splits.
  For integer/None inputs, if y is binary or multiclass, StratifiedKFold is used. If the estimator is not a classifier or if y is neither binary nor multiclass, KFold is used.
- shuffle (bool, optional) – Used when do_cv is set to True. Determines whether to shuffle the training data before splitting using cross-validation. Defaults to True.
- random_state (int or RandomState) – Pseudo-random number generator state used for random sampling.
- ax (matplotlib.axes.Axes, optional) – The axes upon which to plot the confusion matrix. If None, the plot is drawn on a new set of axes.
- figsize (2-tuple, optional) – Tuple denoting figure size of the plot, e.g. (6, 6). Defaults to None.
- title_fontsize (string or int, optional) – Matplotlib-style fontsizes. Use e.g. “small”, “medium”, “large” or integer-values. Defaults to “large”.
- text_fontsize (string or int, optional) – Matplotlib-style fontsizes. Use e.g. “small”, “medium”, “large” or integer-values. Defaults to “medium”.
Returns: The axes on which the plot was drawn.
Return type: ax (matplotlib.axes.Axes)
Example
>>> import matplotlib.pyplot as plt
>>> from sklearn.ensemble import RandomForestClassifier
>>> from scikitplot import classifier_factory
>>> rf = classifier_factory(RandomForestClassifier())
>>> rf.plot_confusion_matrix(X, y, normalize=True)
<matplotlib.axes._subplots.AxesSubplot object at 0x7fe967d64490>
>>> plt.show()
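The effect of the labels and normalize parameters can be illustrated with a small pure-Python sketch. It assumes row-wise normalization (each row divided by the number of true samples of that class) and is an illustration only, not scikit-plot's code; the confusion_matrix function below is a hypothetical helper written for this example.

```python
def confusion_matrix(y_true, y_pred, labels=None, normalize=False):
    """Count (true, predicted) pairs; rows are true labels, columns predictions."""
    if labels is None:
        # Default: every label that occurs, in sorted order.
        labels = sorted(set(y_true) | set(y_pred))
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        # Pairs involving a label outside `labels` are skipped, which is
        # how passing a subset selects part of the matrix.
        if t in index and p in index:
            matrix[index[t]][index[p]] += 1
    if normalize:
        # Assumed convention: divide each row by its total.
        matrix = [
            [count / sum(row) if sum(row) else 0.0 for count in row]
            for row in matrix
        ]
    return labels, matrix


y_true = ["cat", "cat", "dog", "dog", "dog", "bird"]
y_pred = ["cat", "dog", "dog", "dog", "cat", "bird"]

labels, counts = confusion_matrix(y_true, y_pred)
_, normalized = confusion_matrix(y_true, y_pred, normalize=True)
_, reordered = confusion_matrix(y_true, y_pred, labels=["dog", "cat"])
print(labels, counts)
```

Passing labels=["dog", "cat"] both reorders the rows and columns and drops the "bird" class from the matrix.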
scikitplot.classifiers.plot_roc_curve(clf, X, y, title=u'ROC Curves', do_cv=True, cv=None, shuffle=True, random_state=None, curves=(u'micro', u'macro', u'each_class'), ax=None, figsize=None, title_fontsize=u'large', text_fontsize=u'medium')
Generates the ROC curves for a given classifier and dataset.
Parameters:
- clf – Classifier instance that implements “fit” and “predict_proba” methods.
- X (array-like, shape (n_samples, n_features)) – Training vector, where n_samples is the number of samples and n_features is the number of features.
- y (array-like, shape (n_samples) or (n_samples, n_features)) – Target relative to X for classification.
- title (string, optional) – Title of the generated plot. Defaults to “ROC Curves”.
- do_cv (bool, optional) – If True, the classifier is cross-validated on the dataset using the cross-validation strategy in cv to generate the ROC curves. If False, the ROC curves are generated without training or cross-validating the classifier. This assumes that the classifier has already been called with its fit method beforehand.
- cv (int, cross-validation generator, iterable, optional) – Determines the cross-validation strategy to be used for splitting. Possible inputs for cv are:
  - None, to use the default 3-fold cross-validation.
  - An integer, to specify the number of folds.
  - An object to be used as a cross-validation generator.
  - An iterable yielding train/test splits.
  For integer/None inputs, if y is binary or multiclass, StratifiedKFold is used. If the estimator is not a classifier or if y is neither binary nor multiclass, KFold is used.
- shuffle (bool, optional) – Used when do_cv is set to True. Determines whether to shuffle the training data before splitting using cross-validation. Defaults to True.
- random_state (int or RandomState) – Pseudo-random number generator state used for random sampling.
- curves (array-like) – A listing of which curves should be plotted on the resulting plot. Defaults to (“micro”, “macro”, “each_class”), i.e. “micro” for the micro-averaged curve, “macro” for the macro-averaged curve, and “each_class” for one curve per class.
- ax (matplotlib.axes.Axes, optional) – The axes upon which to plot the ROC curves. If None, the plot is drawn on a new set of axes.
- figsize (2-tuple, optional) – Tuple denoting figure size of the plot, e.g. (6, 6). Defaults to None.
- title_fontsize (string or int, optional) – Matplotlib-style fontsizes. Use e.g. “small”, “medium”, “large” or integer-values. Defaults to “large”.
- text_fontsize (string or int, optional) – Matplotlib-style fontsizes. Use e.g. “small”, “medium”, “large” or integer-values. Defaults to “medium”.
Returns: The axes on which the plot was drawn.
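Return type: ax (matplotlib.axes.Axes)
Example
>>> import matplotlib.pyplot as plt
>>> from sklearn.naive_bayes import GaussianNB
>>> from scikitplot import classifier_factory
>>> nb = classifier_factory(GaussianNB())
>>> nb.plot_roc_curve(X, y, random_state=1)
<matplotlib.axes._subplots.AxesSubplot object at 0x7fe967d64490>
>>> plt.show()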
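For readers unfamiliar with what a single ROC curve contains, the following sketch computes the (false positive rate, true positive rate) points and the area under the curve for one binary problem, which is roughly what a per-class curve shows when one class is treated as positive and the rest as negative. It is a simplified illustration that assumes all scores are distinct; roc_points and area_under_curve are hypothetical helpers, not scikit-plot functions.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) points for a binary problem, assuming distinct scores."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    # Lower the decision threshold one sample at a time, highest score first.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    for i in order:
        if y_true[i] == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points):
    """Trapezoidal area under a curve given as (x, y) points sorted by x."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]  # predicted probability of class 1

points = roc_points(y_true, scores)
auc = area_under_curve(points)
print(round(auc, 3))
```

Here 8 of the 9 positive/negative pairs are ranked correctly, so the area is 8/9.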
scikitplot.classifiers.plot_ks_statistic(clf, X, y, title=u'KS Statistic Plot', do_cv=True, cv=None, shuffle=True, random_state=None, ax=None, figsize=None, title_fontsize=u'large', text_fontsize=u'medium')
Generates the KS Statistic plot for a given classifier and dataset.
Parameters:
- clf – Classifier instance that implements “fit” and “predict_proba” methods.
- X (array-like, shape (n_samples, n_features)) – Training vector, where n_samples is the number of samples and n_features is the number of features.
- y (array-like, shape (n_samples) or (n_samples, n_features)) – Target relative to X for classification.
- title (string, optional) – Title of the generated plot. Defaults to “KS Statistic Plot”.
- do_cv (bool, optional) – If True, the classifier is cross-validated on the dataset using the cross-validation strategy in cv to generate the KS Statistic plot. If False, the plot is generated without training or cross-validating the classifier. This assumes that the classifier has already been called with its fit method beforehand.
- cv (int, cross-validation generator, iterable, optional) – Determines the cross-validation strategy to be used for splitting. Possible inputs for cv are:
  - None, to use the default 3-fold cross-validation.
  - An integer, to specify the number of folds.
  - An object to be used as a cross-validation generator.
  - An iterable yielding train/test splits.
  For integer/None inputs, if y is binary or multiclass, StratifiedKFold is used. If the estimator is not a classifier or if y is neither binary nor multiclass, KFold is used.
- shuffle (bool, optional) – Used when do_cv is set to True. Determines whether to shuffle the training data before splitting using cross-validation. Defaults to True.
- random_state (int or RandomState) – Pseudo-random number generator state used for random sampling.
- ax (matplotlib.axes.Axes, optional) – The axes upon which to plot the KS Statistic plot. If None, the plot is drawn on a new set of axes.
- figsize (2-tuple, optional) – Tuple denoting figure size of the plot, e.g. (6, 6). Defaults to None.
- title_fontsize (string or int, optional) – Matplotlib-style fontsizes. Use e.g. “small”, “medium”, “large” or integer-values. Defaults to “large”.
- text_fontsize (string or int, optional) – Matplotlib-style fontsizes. Use e.g. “small”, “medium”, “large” or integer-values. Defaults to “medium”.
Returns: The axes on which the plot was drawn.
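Return type: ax (matplotlib.axes.Axes)
Example
>>> import matplotlib.pyplot as plt
>>> from sklearn.linear_model import LogisticRegression
>>> from scikitplot import classifier_factory
>>> lr = classifier_factory(LogisticRegression())
>>> lr.plot_ks_statistic(X, y, random_state=1)
<matplotlib.axes._subplots.AxesSubplot object at 0x7fe967d64490>
>>> plt.show()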
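As background, the Kolmogorov-Smirnov statistic for a binary classifier is the largest gap between the cumulative score distributions of the two classes. The sketch below computes it with the standard library; it illustrates the quantity itself under that definition and is not scikit-plot's implementation. ks_statistic is a hypothetical helper name.

```python
from fractions import Fraction


def ks_statistic(y_true, scores):
    """Largest gap between the two classes' cumulative score distributions.

    Returns (gap, threshold); the first threshold reaching the maximum wins.
    """
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    best_gap, best_threshold = Fraction(0), None
    for threshold in sorted(set(scores)):
        # Exact fractions keep ties between thresholds exact.
        cdf_pos = Fraction(sum(s <= threshold for s in pos), len(pos))
        cdf_neg = Fraction(sum(s <= threshold for s in neg), len(neg))
        gap = abs(cdf_pos - cdf_neg)
        if gap > best_gap:
            best_gap, best_threshold = gap, threshold
    return float(best_gap), best_threshold


y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]  # predicted probability of class 1

ks, threshold = ks_statistic(y_true, scores)
print(round(ks, 3), threshold)
```

A larger gap means the scores separate the two classes more cleanly.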
scikitplot.classifiers.plot_precision_recall_curve(clf, X, y, title=u'Precision-Recall Curve', do_cv=True, cv=None, shuffle=True, random_state=None, curves=(u'micro', u'each_class'), ax=None, figsize=None, title_fontsize=u'large', text_fontsize=u'medium')
Generates the Precision-Recall curve for a given classifier and dataset.
Parameters:
- clf – Classifier instance that implements “fit” and “predict_proba” methods.
- X (array-like, shape (n_samples, n_features)) – Training vector, where n_samples is the number of samples and n_features is the number of features.
- y (array-like, shape (n_samples) or (n_samples, n_features)) – Target relative to X for classification.
- title (string, optional) – Title of the generated plot. Defaults to “Precision-Recall Curve”.
- do_cv (bool, optional) – If True, the classifier is cross-validated on the dataset using the cross-validation strategy in cv to generate the Precision-Recall curve. If False, the curve is generated without training or cross-validating the classifier. This assumes that the classifier has already been called with its fit method beforehand.
- cv (int, cross-validation generator, iterable, optional) – Determines the cross-validation strategy to be used for splitting. Possible inputs for cv are:
  - None, to use the default 3-fold cross-validation.
  - An integer, to specify the number of folds.
  - An object to be used as a cross-validation generator.
  - An iterable yielding train/test splits.
  For integer/None inputs, if y is binary or multiclass, StratifiedKFold is used. If the estimator is not a classifier or if y is neither binary nor multiclass, KFold is used.
- shuffle (bool, optional) – Used when do_cv is set to True. Determines whether to shuffle the training data before splitting using cross-validation. Defaults to True.
- random_state (int or RandomState) – Pseudo-random number generator state used for random sampling.
- curves (array-like) – A listing of which curves should be plotted on the resulting plot. Defaults to (“micro”, “each_class”), i.e. “micro” for the micro-averaged curve and “each_class” for one curve per class.
- ax (matplotlib.axes.Axes, optional) – The axes upon which to plot the Precision-Recall curve. If None, the plot is drawn on a new set of axes.
- figsize (2-tuple, optional) – Tuple denoting figure size of the plot, e.g. (6, 6). Defaults to None.
- title_fontsize (string or int, optional) – Matplotlib-style fontsizes. Use e.g. “small”, “medium”, “large” or integer-values. Defaults to “large”.
- text_fontsize (string or int, optional) – Matplotlib-style fontsizes. Use e.g. “small”, “medium”, “large” or integer-values. Defaults to “medium”.
Returns: The axes on which the plot was drawn.
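Return type: ax (matplotlib.axes.Axes)
Example
>>> import matplotlib.pyplot as plt
>>> from sklearn.naive_bayes import GaussianNB
>>> from scikitplot import classifier_factory
>>> nb = classifier_factory(GaussianNB())
>>> nb.plot_precision_recall_curve(X, y, random_state=1)
<matplotlib.axes._subplots.AxesSubplot object at 0x7fe967d64490>
>>> plt.show()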
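The points on a single precision-recall curve can be sketched as follows for one binary problem: samples are ranked by score, and precision and recall are recomputed as the threshold is lowered past each sample. This is a simplified illustration that assumes distinct scores; precision_recall_points is a hypothetical helper rather than a scikit-plot function.

```python
def precision_recall_points(y_true, scores):
    """(precision, recall) after admitting each sample, highest score first."""
    n_pos = sum(y_true)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = 0
    points = []
    for rank, i in enumerate(order, start=1):
        tp += y_true[i]
        # `rank` samples are predicted positive at this threshold.
        points.append((tp / rank, tp / n_pos))
    return points


y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]  # predicted probability of class 1

pr_points = precision_recall_points(y_true, scores)
for precision, recall in pr_points:
    print(round(precision, 2), round(recall, 2))
```

Recall never decreases as the threshold drops, while precision falls whenever a negative sample is admitted.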
scikitplot.classifiers.plot_feature_importances(clf, title=u'Feature Importance', feature_names=None, max_num_features=20, order=u'descending', ax=None, figsize=None, title_fontsize=u'large', text_fontsize=u'medium')
Generates a plot of a classifier’s feature importances.
Parameters:
- clf – Classifier instance that implements fit and predict_proba methods. The classifier must also have a feature_importances_ attribute.
- title (string, optional) – Title of the generated plot. Defaults to “Feature Importance”.
- feature_names (None, list of string, optional) – Determines the feature names used to plot the feature importances. If None, feature names will be numbered.
- max_num_features (int) – Determines the maximum number of features to plot. Defaults to 20.
- order ('ascending', 'descending', or None, optional) – Determines the order in which the feature importances are plotted. Defaults to ‘descending’.
- ax (matplotlib.axes.Axes, optional) – The axes upon which to plot the feature importances. If None, the plot is drawn on a new set of axes.
- figsize (2-tuple, optional) – Tuple denoting figure size of the plot, e.g. (6, 6). Defaults to None.
- title_fontsize (string or int, optional) – Matplotlib-style fontsizes. Use e.g. “small”, “medium”, “large” or integer-values. Defaults to “large”.
- text_fontsize (string or int, optional) – Matplotlib-style fontsizes. Use e.g. “small”, “medium”, “large” or integer-values. Defaults to “medium”.
Returns: The axes on which the plot was drawn.
Return type: ax (matplotlib.axes.Axes)
Example
>>> import matplotlib.pyplot as plt
>>> from sklearn.ensemble import RandomForestClassifier
>>> import scikitplot.plotters as skplt
>>> rf = RandomForestClassifier()
>>> rf.fit(X, y)
>>> skplt.plot_feature_importances(rf, feature_names=['petal length', 'petal width',
...                                                   'sepal length', 'sepal width'])
<matplotlib.axes._subplots.AxesSubplot object at 0x7fe967d64490>
>>> plt.show()
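The interplay of feature_names, order and max_num_features can be pictured with a short sketch. It assumes that sorting happens before truncation and that unnamed features are labelled by their index; both are assumptions for illustration, and top_features is a hypothetical helper, not scikit-plot's code.

```python
def top_features(importances, feature_names=None, max_num_features=20,
                 order="descending"):
    """Return (name, importance) pairs in plotting order."""
    if feature_names is None:
        # Assumed fallback: label features by their column index.
        feature_names = [str(i) for i in range(len(importances))]
    pairs = list(zip(feature_names, importances))
    if order == "descending":
        pairs.sort(key=lambda pair: pair[1], reverse=True)
    elif order == "ascending":
        pairs.sort(key=lambda pair: pair[1])
    elif order is not None:
        raise ValueError("order must be 'ascending', 'descending' or None")
    # Assumed: keep only the first max_num_features entries after sorting.
    return pairs[:max_num_features]


names = ["petal length", "petal width", "sepal length", "sepal width"]
importances = [0.1, 0.6, 0.05, 0.25]  # made-up values for illustration

top_two = top_features(importances, names, max_num_features=2)
print(top_two)
```

With order=None the original column order is preserved, so the same call would return the first two features instead of the two most important ones.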
Clustering Plots
scikitplot.clustering_factory(clf)
Takes a scikit-learn clusterer and embeds scikit-plot plotting methods in it.
Parameters: clf – Scikit-learn clusterer instance.
Returns: The same scikit-learn clusterer instance passed in clf, with embedded scikit-plot instance methods.
Raises: ValueError – If clf does not contain the instance methods necessary for scikit-plot instance methods.
scikitplot.clustering.plot_silhouette(clf, X, title=u'Silhouette Analysis', metric=u'euclidean', copy=True, ax=None, figsize=None, title_fontsize=u'large', text_fontsize=u'medium')
Plots silhouette analysis of clusters using fit_predict.
Parameters:
- clf – Clusterer instance that implements fit and fit_predict methods.
- X (array-like, shape (n_samples, n_features)) – Data to cluster, where n_samples is the number of samples and n_features is the number of features.
- title (string, optional) – Title of the generated plot. Defaults to “Silhouette Analysis”
- metric (string or callable, optional) – The metric to use when calculating distance between instances in a feature array. If metric is a string, it must be one of the options allowed by sklearn.metrics.pairwise.pairwise_distances. If X is the distance array itself, use “precomputed” as the metric.
- copy (boolean, optional) – Determines whether fit is used on clf or on a copy of clf.
- ax (matplotlib.axes.Axes, optional) – The axes upon which to plot the silhouette analysis. If None, the plot is drawn on a new set of axes.
- figsize (2-tuple, optional) – Tuple denoting figure size of the plot, e.g. (6, 6). Defaults to None.
- title_fontsize (string or int, optional) – Matplotlib-style fontsizes. Use e.g. “small”, “medium”, “large” or integer-values. Defaults to “large”.
- text_fontsize (string or int, optional) – Matplotlib-style fontsizes. Use e.g. “small”, “medium”, “large” or integer-values. Defaults to “medium”.
Returns: The axes on which the plot was drawn.
Return type: ax (matplotlib.axes.Axes)
Example
>>> import matplotlib.pyplot as plt
>>> from sklearn.cluster import KMeans
>>> import scikitplot.plotters as skplt
>>> kmeans = KMeans(n_clusters=4, random_state=1)
>>> skplt.plot_silhouette(kmeans, X)
<matplotlib.axes._subplots.AxesSubplot object at 0x7fe967d64490>
>>> plt.show()
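For context, the silhouette value of a sample compares its mean distance to its own cluster (a) with its mean distance to the nearest other cluster (b), as (b - a) / max(a, b). The sketch below computes these values with Euclidean distance using only the standard library (math.dist needs Python 3.8 or later). It illustrates the quantity being plotted and is not scikit-plot's implementation; silhouette_values is a hypothetical helper.

```python
import math


def silhouette_values(X, labels):
    """Per-sample silhouette values using Euclidean distance."""
    values = []
    for i, (x, label) in enumerate(zip(X, labels)):
        # Group distances from sample i to every other sample by cluster.
        dists = {}
        for j, (other, other_label) in enumerate(zip(X, labels)):
            if i != j:
                dists.setdefault(other_label, []).append(math.dist(x, other))
        own = dists.pop(label, None)
        if not own or not dists:
            # Singleton cluster, or only one cluster overall: use 0.
            values.append(0.0)
            continue
        a = sum(own) / len(own)                              # cohesion
        b = min(sum(d) / len(d) for d in dists.values())     # separation
        denom = max(a, b)
        values.append((b - a) / denom if denom else 0.0)
    return values


X = [(0.0,), (1.0,), (10.0,), (11.0,)]
labels = [0, 0, 1, 1]

values = silhouette_values(X, labels)
print([round(v, 3) for v in values])
```

Values close to 1 indicate samples that sit well inside their cluster, which is the case for these two well-separated groups.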
scikitplot.clustering.plot_elbow_curve(clf, X, title=u'Elbow Plot', cluster_ranges=None, ax=None, figsize=None, title_fontsize=u'large', text_fontsize=u'medium')
Plots elbow curve of different values of K for KMeans clustering.
Parameters:
- clf – Clusterer instance that implements fit and fit_predict methods and a score method.
- X (array-like, shape (n_samples, n_features)) – Data to cluster, where n_samples is the number of samples and n_features is the number of features.
- title (string, optional) – Title of the generated plot. Defaults to “Elbow Plot”
- cluster_ranges (None or list of int, optional) – List of n_clusters for which to plot the explained variances. Defaults to range(1, 12, 2).
- copy (boolean, optional) – Determines whether fit is used on clf or on a copy of clf.
- ax (matplotlib.axes.Axes, optional) – The axes upon which to plot the elbow curve. If None, the plot is drawn on a new set of axes.
- figsize (2-tuple, optional) – Tuple denoting figure size of the plot, e.g. (6, 6). Defaults to None.
- title_fontsize (string or int, optional) – Matplotlib-style fontsizes. Use e.g. “small”, “medium”, “large” or integer-values. Defaults to “large”.
- text_fontsize (string or int, optional) – Matplotlib-style fontsizes. Use e.g. “small”, “medium”, “large” or integer-values. Defaults to “medium”.
Returns: The axes on which the plot was drawn.
Return type: ax (matplotlib.axes.Axes)
Example
>>> import matplotlib.pyplot as plt
>>> from sklearn.cluster import KMeans
>>> import scikitplot.plotters as skplt
>>> kmeans = KMeans(random_state=1)
>>> skplt.plot_elbow_curve(kmeans, X, cluster_ranges=range(1, 11))
<matplotlib.axes._subplots.AxesSubplot object at 0x7fe967d64490>
>>> plt.show()
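The idea behind an elbow curve can be sketched without any plotting: fit the clusterer once per value in cluster_ranges and record how tightly the clusters fit the data. The toy example below uses a deliberately simple one-dimensional k-means with a deterministic initialization and reports the within-cluster sum of squared distances. Both the algorithm and the helper name kmeans_1d_sse are assumptions made for illustration; scikit-plot relies on the clusterer passed in as clf instead.

```python
def kmeans_1d_sse(values, k, n_iter=20):
    """Within-cluster sum of squared distances after a toy 1-D k-means."""
    data = sorted(values)
    n = len(data)
    # Deterministic start: centers spread evenly over the sorted data.
    centers = [data[int((i + 0.5) * n / k)] for i in range(k)]
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda c: abs(x - centers[c]))
            clusters[nearest].append(x)
        # Move each center to its cluster mean; keep it if the cluster is empty.
        centers = [
            sum(members) / len(members) if members else centers[i]
            for i, members in enumerate(clusters)
        ]
    return sum(min((x - c) ** 2 for c in centers) for x in data)


values = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.2, 8.8]  # three obvious groups
cluster_ranges = range(1, 4)

sse = {k: kmeans_1d_sse(values, k) for k in cluster_ranges}
for k in cluster_ranges:
    print(k, round(sse[k], 2))
```

The error drops sharply until k reaches the true number of groups, and that bend is the "elbow" one looks for in the plot.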