sklearn_extra.cluster.CLARA

class sklearn_extra.cluster.CLARA(n_clusters=8, metric='euclidean', init='build', max_iter=300, n_sampling=None, n_sampling_iter=5, random_state=None)

CLARA clustering.

Read more in the User Guide. CLARA (Clustering for Large Applications) extends the k-medoids approach to a large number of objects. Instead of clustering the whole dataset at once, the algorithm clusters random sub-samples of it.

Parameters:
n_clusters : int, optional, default: 8

The number of clusters to form as well as the number of medoids to generate.

metric : string, or callable, optional, default: ‘euclidean’

What distance metric to use. See sklearn.metrics.pairwise_distances.

max_iter : int, optional, default: 300

Specify the maximum number of iterations when fitting PAM. It can be zero, in which case only the initialization is computed.

n_sampling : int or None, optional, default: None

Size of the sampled dataset at each iteration. The sampling size is a trade-off between complexity and efficiency. If None, the sampling size is set to min(sample_size, 40 + 2 * self.n_clusters), as suggested by the authors of the algorithm, where sample_size is the number of samples in the dataset. It must be smaller than sample_size.

n_sampling_iter : int, optional, default: 5

Number of different samples to draw, i.e. the number of sampling iterations.

random_state : int, RandomState instance or None, optional

Specify random state for the random number generator. Used to initialise medoids when init=’random’.
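The default sampling size described for n_sampling can be written out as a one-line rule. The snippet below is only an illustration of the documented formula; the helper name default_n_sampling is hypothetical and is not part of sklearn_extra.

```python
def default_n_sampling(n_samples, n_clusters):
    """Hypothetical helper mirroring the documented default for n_sampling."""
    # Documented rule: min(sample_size, 40 + 2 * n_clusters), where
    # sample_size is taken to be the number of rows in the dataset.
    return min(n_samples, 40 + 2 * n_clusters)


# Large dataset: the sub-sample stays small regardless of n_samples.
print(default_n_sampling(n_samples=10_000, n_clusters=8))  # 56
# Tiny dataset: the sub-sample is capped at the dataset size.
print(default_n_sampling(n_samples=30, n_clusters=8))      # 30
```

With the default n_clusters=8, each sub-sample therefore holds at most 56 points.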

See also

KMedoids

CLARA is a variant of KMedoids that uses a sub-sampling scheme. As such, if the dataset is sufficiently small, KMedoids is preferable.

Notes

Contrary to KMedoids, CLARA is linear in the sample size N for both space and time complexity. On the other hand, it scales quadratically with n_sampling.
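The scaling behaviour can be seen in a minimal pure-Python sketch of the sampling idea. This is not the sklearn_extra implementation: it assumes Euclidean distance, random initialization, and a simple alternating k-medoids step as a stand-in for PAM, and the names assign, k_medoids_on_sample and clara_sketch are hypothetical. The medoid search only ever touches n_sampling points (the quadratic part), while the full dataset is visited once per sampling iteration (the linear part).

```python
import math
import random


def assign(points, medoids):
    """Return (labels, total distance) of points to their closest medoid."""
    labels, total = [], 0.0
    for p in points:
        dists = [math.dist(p, m) for m in medoids]
        best = dists.index(min(dists))
        labels.append(best)
        total += dists[best]
    return labels, total


def k_medoids_on_sample(sample, n_clusters, rng, max_iter=300):
    """Alternating k-medoids on a small sample (assumed stand-in for PAM)."""
    medoids = rng.sample(sample, n_clusters)
    for _ in range(max_iter):
        labels, _ = assign(sample, medoids)
        new_medoids = []
        for k in range(n_clusters):
            members = [p for p, lab in zip(sample, labels) if lab == k]
            if not members:
                # Keep the old medoid if its cluster ended up empty.
                new_medoids.append(medoids[k])
                continue
            # Quadratic in the cluster size: try every member as the medoid.
            new_medoids.append(
                min(members, key=lambda c: sum(math.dist(c, q) for q in members))
            )
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids


def clara_sketch(points, n_clusters, n_sampling, n_sampling_iter=5, seed=0):
    """Keep the medoids from the sub-sample that give the lowest total distance."""
    rng = random.Random(seed)
    best_medoids, best_inertia = None, math.inf
    for _ in range(n_sampling_iter):
        sample = rng.sample(points, n_sampling)   # only n_sampling points
        medoids = k_medoids_on_sample(sample, n_clusters, rng)
        _, inertia = assign(points, medoids)      # one pass over all N points
        if inertia < best_inertia:
            best_medoids, best_inertia = medoids, inertia
    return best_medoids, best_inertia


# Two synthetic blobs of 500 points each (made-up demonstration data).
data_rng = random.Random(42)
points = [(data_rng.gauss(0, 0.5), data_rng.gauss(0, 0.5)) for _ in range(500)]
points += [(data_rng.gauss(5, 0.5), data_rng.gauss(5, 0.5)) for _ in range(500)]

medoids, inertia = clara_sketch(points, n_clusters=2, n_sampling=44)
print(medoids, inertia)
```

Doubling the number of points roughly doubles the work in assign, whereas doubling n_sampling roughly quadruples the work in the medoid update.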

Examples

>>> from sklearn_extra.cluster import CLARA
>>> import numpy as np
>>> from sklearn.datasets import make_blobs
>>> X, _ = make_blobs(centers=[[0, 0], [1, 1]], n_features=2, random_state=0)
>>> clara = CLARA(n_clusters=2, random_state=0).fit(X)
>>> clara.predict([[0,0], [4,4]])
array([0, 1])
>>> clara.inertia_
122.44919397611667
Attributes:
cluster_centers_ : array, shape = (n_clusters, n_features) or None if metric == ‘precomputed’

Cluster centers, i.e. medoids (elements from the original dataset).

medoid_indices_ : array, shape = (n_clusters,)

The indices of the medoid rows in X.

labels_ : array, shape = (n_samples,)

Labels of each point.

inertia_ : float

Sum of distances of samples to their closest cluster center.
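How the four attributes relate to one another can be illustrated with a small hand-made example. The data and the choice of medoid rows below are made up for illustration, Euclidean distance is assumed, and the variable names only mirror the attribute names; nothing here calls sklearn_extra.

```python
import math

# Toy dataset; rows 0 and 2 are picked by hand as the medoids.
X = [(0.0, 0.0), (0.0, 1.0), (4.0, 4.0), (4.0, 5.0), (5.0, 4.0)]
medoid_indices = [0, 2]

# Cluster centers are rows of X selected by the medoid indices.
cluster_centers = [X[i] for i in medoid_indices]

labels = []
inertia = 0.0
for x in X:
    dists = [math.dist(x, c) for c in cluster_centers]
    label = dists.index(min(dists))  # index of the closest medoid
    labels.append(label)
    inertia += dists[label]          # distance to that closest medoid

print(labels)   # [0, 0, 1, 1, 1]
print(inertia)  # 3.0
```

Because every center is an actual data point, each medoid contributes a distance of zero to the inertia.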

__init__(n_clusters=8, metric='euclidean', init='build', max_iter=300, n_sampling=None, n_sampling_iter=5, random_state=None)

Examples using sklearn_extra.cluster.CLARA

A demo of K-Medoids vs CLARA clustering on the handwritten digits data