`sklearn_extra.cluster`.KMedoids

class sklearn_extra.cluster.KMedoids(n_clusters=8, metric='euclidean', method='alternate', init='heuristic', max_iter=300, random_state=None)

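k-medoids clustering. The samples are partitioned into n_clusters groups. Each group is represented by a medoid, which is an actual sample from the dataset rather than a computed mean.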
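The following NumPy sketch illustrates the general idea of k-medoids. It uses a k-means-style alternating scheme: assign every sample to its nearest medoid, then move each medoid to the cluster member with the smallest total distance to the other members. The method='alternate' option is assumed to refer to a scheme of this kind. `kmedoids_alternate` is a hypothetical helper, not the sklearn_extra implementation. It also uses random initialization, whereas KMedoids defaults to init='heuristic'.

```python
import numpy as np


def kmedoids_alternate(X, n_clusters, max_iter=300, seed=0):
    """Hypothetical sketch of alternating k-medoids (not the sklearn_extra code)."""
    X = np.asarray(X, dtype=float)
    # Dense matrix of all pairwise Euclidean distances, shape (n_samples, n_samples).
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    rng = np.random.default_rng(seed)
    # Random initial medoids (an assumption; KMedoids defaults to init='heuristic').
    medoids = rng.choice(len(X), size=n_clusters, replace=False)
    for _ in range(max_iter):
        # Assignment step: each sample joins its nearest medoid.
        labels = np.argmin(D[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for k in range(n_clusters):
            members = np.flatnonzero(labels == k)
            if members.size == 0:
                continue  # keep the old medoid if a cluster ends up empty
            # Update step: the member with the smallest summed distance to its cluster.
            costs = D[np.ix_(members, members)].sum(axis=1)
            new_medoids[k] = members[np.argmin(costs)]
        if np.array_equal(new_medoids, medoids):
            break  # medoids stopped moving: converged
        medoids = new_medoids
    labels = np.argmin(D[:, medoids], axis=1)
    inertia = D[np.arange(len(X)), medoids[labels]].sum()
    return medoids, labels, inertia


# Usage: two well-separated 1-D groups.
X1d = np.array([[0.0], [1.0], [2.0], [100.0], [101.0], [102.0]])
medoids, labels, inertia = kmedoids_alternate(X1d, n_clusters=2)
```

Like k-means, an alternating scheme can stop at a local optimum when the starting medoids are poor. This is one reason the choice of init matters.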

See also

KMeans: The KMeans algorithm minimizes the within-cluster sum-of-squares criterion. It scales well to a large number of samples.
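A KMeans center is the mean of its cluster, so it need not be a sample and can be pulled away by outliers. A medoid is always one of the samples. The small NumPy comparison below uses one made-up cluster with a single outlier. It is an illustration only and does not call either estimator.

```python
import numpy as np

# One cluster with an outlier (illustrative data, not from the original docs).
points = np.array([[0.0], [1.0], [2.0], [3.0], [100.0]])

# KMeans-style center: the arithmetic mean, dragged toward the outlier.
mean_center = points.mean(axis=0)

# k-medoids-style center: the sample with the smallest total distance to the others.
D = np.abs(points - points.T)  # pairwise distances for 1-D data
medoid_center = points[np.argmin(D.sum(axis=1))]

print(mean_center, medoid_center)
```

The outlier pulls the mean to 21.2. The medoid stays at the sample 2.0.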

Notes

Since all pairwise distances are calculated and stored in memory for the duration of fit, the space complexity is O(n_samples ** 2).
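To make the O(n_samples ** 2) space cost concrete, here is a back-of-the-envelope sketch. It assumes the distances are stored as a dense float64 matrix (8 bytes per entry). The actual dtype used internally by sklearn_extra is an assumption here. `pairwise_matrix_bytes` is a hypothetical helper.

```python
import numpy as np


def pairwise_matrix_bytes(n_samples, itemsize=8):
    """Bytes for a dense (n_samples, n_samples) distance matrix (float64 by default)."""
    return n_samples * n_samples * itemsize


# Check the formula against a real float64 distance matrix for a small dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(1_000, 2))
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
assert D.nbytes == pairwise_matrix_bytes(1_000)

# Extrapolate: the quadratic term quickly dominates memory.
for n in (10_000, 50_000, 100_000):
    print(f"{n:>7} samples -> {pairwise_matrix_bytes(n) / 1e9:.1f} GB")
```

Under these assumptions, 10,000 samples already need about 0.8 GB for the matrix alone. 100,000 samples would need about 80 GB.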

References

Maranzana, F.E., 1963. On the location of supply points to minimize transportation costs. IBM Systems Journal, 2(2), pp.129-135.
Park, H.S. and Jun, C.H., 2009. A simple and fast algorithm for K-medoids clustering. Expert Systems with Applications, 36(2), pp.3336-3341.

Examples

>>> from sklearn_extra.cluster import KMedoids
>>> import numpy as np

>>> X = np.asarray([[1, 2], [1, 4], [1, 0],
...                 [4, 2], [4, 4], [4, 0]])
>>> kmedoids = KMedoids(n_clusters=2, random_state=0).fit(X)
>>> kmedoids.labels_
array([0, 0, 0, 1, 1, 1])
>>> kmedoids.predict([[0, 0], [4, 4]])
array([0, 1])
>>> kmedoids.cluster_centers_
array([[1., 2.],
       [4., 2.]])
>>> kmedoids.inertia_
8.0
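The labels_, predict and inertia_ values in the doctest are consistent with a plain nearest-medoid assignment. The NumPy sketch below recomputes them from the printed cluster_centers_, assuming the default Euclidean metric. `nearest_medoid` is a hypothetical helper, not part of sklearn_extra.

```python
import numpy as np

# Training data and medoids copied from the doctest.
X = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]], dtype=float)
medoids = np.array([[1.0, 2.0], [4.0, 2.0]])


def nearest_medoid(points, medoids):
    """Label and Euclidean distance of the closest medoid for each point."""
    points = np.asarray(points, dtype=float)
    dist = np.linalg.norm(points[:, None, :] - medoids[None, :, :], axis=-1)
    return dist.argmin(axis=1), dist.min(axis=1)


labels, dists = nearest_medoid(X, medoids)                # mirrors labels_
new_labels, _ = nearest_medoid([[0, 0], [4, 4]], medoids)  # mirrors predict
print(labels, new_labels, dists.sum())                     # sum mirrors inertia_
```

Each non-medoid sample lies at distance 2 from its medoid. This is why inertia_ is 8.0.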

See scikit-learn-extra/examples/plot_kmedoids_digits.py for examples of KMedoids with various distance metrics.

Attributes

cluster_centers_ : array, shape = (n_clusters, n_features), or None if metric == 'precomputed'
    Cluster centers, i.e. medoids (elements from the original dataset).
medoid_indices_ : array, shape = (n_clusters,)
    The indices of the medoid rows in X.
labels_ : array, shape = (n_samples,)
    Labels of each point.
inertia_ : float
    Sum of distances of samples to their closest cluster center.
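The attributes above are linked through medoid_indices_. The medoids are the rows of X at those indices. labels_ gives the closest medoid for each sample, and inertia_ sums each sample's distance to its own medoid. With metric='precomputed', the estimator only sees distances, so cluster_centers_ is None. You can still index the original X yourself if you kept it. The sketch below assumes a precomputed Manhattan matrix D. The medoid_indices array is hard-coded for illustration and is not output from KMedoids.

```python
import numpy as np

X = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]], dtype=float)
# Precomputed pairwise distances (Manhattan here), as metric='precomputed' would take.
D = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=-1)
# Hypothetical fitted medoid indices (hard-coded, not produced by KMedoids).
medoid_indices = np.array([0, 3])

# Medoid coordinates: rows of the original X.
cluster_centers = X[medoid_indices]
# Closest medoid for every sample, using only D.
labels = D[:, medoid_indices].argmin(axis=1)
# Sum of each sample's distance to its own medoid.
inertia = D[np.arange(len(X)), medoid_indices[labels]].sum()
```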