2. Clustering with KMedoids, CLARA and Common-nearest-neighbors¶
KMedoids is related to the
KMeans algorithm. While
KMeans tries to minimize the within cluster sum-of-squares,
KMedoids tries to minimize the sum of distances between each point and
the medoid of its cluster. The medoid is a data point (unlike the centroid)
which has the least total distance to the other members of its cluster. The use of
a data point to represent each cluster’s center allows the use of any distance
metric for clustering. It may also be a practical advantage, for instance K-Medoids
algorithms have been used for facial recognition for which the medoid is a
typical photo of the person to recognize while K-Means would have obtained a blurry
image that mixed several pictures of the person to recognize.
KMedoids can be more robust to noise and outliers than
as it will choose one of the cluster members as the medoid while
KMeans will move the center of the cluster towards the outlier which
might in turn move other points away from the cluster centre.
KMedoids is also different from K-Medians, which is analogous to
except that the Manhattan Median is used for each cluster center instead of
the centroid. K-Medians is robust to outliers, but it is limited to the
Manhattan Distance metric and, similar to
KMeans, it does not guarantee
that the center of each cluster will be a member of the original dataset.
The complexity of K-Medoids is \(O(N^2 K T)\) where \(N\) is the number
of samples, \(T\) is the number of iterations and \(K\) is the number of
clusters. This makes it more suitable for smaller datasets in comparison to
KMeans which is \(O(N K T)\).
sphx_glr_auto_examples_plot_kmedoids_digits.py: Applying K-Medoids on digits with various distance metrics.
There are several algorithms to compute K-Medoids, though
currently only supports K-Medoids solver analogous to K-Means called alternate
and the algorithm PAM (partitioning around medoids). Alternate algorithm is used
when speed is an issue.
Alternate method works as follows:
n_clustersfrom the dataset as the medoids using a heuristic, random, or k-medoids++ approach (configurable using the
Assignment step: assign each element from the dataset to the closest medoid.
Update step: Identify the new medoid of each cluster.
Repeat the assignment and update step while the medoids keep changing or maximum number of iterations
PAM method works as follows:
Initialize: Greedy initialization of
n_clusters. First select the point in the dataset that minimizes the sum of distances to a point. Then, add one point that minimizes the cost and loop until
n_clusterspoints are selected. This is the
Swap Step: for all medoids already selected, compute the cost of swapping this medoid with any non-medoid point. Then, make the swap that decreases the cost the most. Loop and stop when there is no change anymore.
Maranzana, F.E., 1963. On the location of supply points to minimize transportation costs. IBM Systems Journal, 2(2), pp.129-135.
Park, H.S. and Jun, C.H., 2009. A simple and fast algorithm for K-medoids clustering. Expert systems with applications, 36(2), pp.3336-3341.
Kaufman, L. and Rousseeuw, P.J. (2008). Partitioning Around Medoids (Program PAM). In Finding Groups in Data (eds L. Kaufman and P.J. Rousseeuw). doi:10.1002/9780470316801.ch2
Bhat, Aruna (2014).K-medoids clustering using partitioning around medoids for performing face recognition. International Journal of Soft Computing, Mathematics and Control, 3(3), pp 1-12.
Algorithm description: CLARA uses random samples of the dataset, each of size sampling_size The algorith is iterative, first we select one sub-sample, then CLARA applies KMedoids on this sub-sample to obtain n_clusters medoids. At the next step, CLARA sample sampling_size-n_clusters from the dataset and the next sub-sample is composed of the best medoids found until now (with respect to inertia in the whole dataset, not the inertia only on the sub-sample) to which we add the new samples just drawn. Then, K-Medoids is applied to this new sub-sample, and loop back until sample sub-samples have been used.
2.3. Common-nearest-neighbors clustering¶
provides an interface to density-based
common-nearest-neighbors clustering. Density-based clustering identifies
clusters as dense regions of high point density, separated by sparse
regions of lower density. Common-nearest-neighbors clustering
approximates local density as the number of shared (common) neighbors
between two points with respect to a neighbor search radius. A density
threshold (density criterion) is used – defined by the cluster
min_samples (number of common neighbors) and
radius) – to distinguish high from low density. A high value of
min_samples and a low value of
eps corresponds to high density.
As such the method is related to other density-based cluster algorithms
DBSCAN or Jarvis-Patrick. DBSCAN
approximates local density as the number of points in the neighborhood
of a single point. The Jarvis-Patrick algorithm uses the number of
common neighbors shared by two points among the \(k\) nearest neighbors.
As these approaches each provide a different notion of how density is
estimated from point samples, they can be used complementarily. Their
relative suitability for a classification problem depends on the nature
of the clustered data. Common-nearest-neighbors clustering (as
density-based clustering in general) has the following advantages over
other clustering techniques:
The cluster result is deterministic. The same set of cluster parameters always leads to the same classification for a data set. A different ordering of the data set leads to a different ordering of the cluster assignment, but does not change the assignment qualitatively.
Little prior knowledge about the data is required, e.g. the number of resulting clusters does not need to be known beforehand (although cluster parameters need to be tuned to obtain a desired result).
Identified clusters are not restricted in their shape or size.
Points can be considered noise (outliers) if they do not fullfil the density criterion.
The common-nearest-neighbors algorithm tests the density criterion for
pairs of neighbors (do they have at least
min_samples points in the
intersection of their neighborhoods at a radius
eps). Two points that
fullfil this criterion are directly part of the same dense data region,
i.e. they are density reachable. A density connected network of
density reachable points (a connected component if density reachability
is viewed as a graph structure) constitutes a separated dense region and
therefore a cluster. Note, that for example in contrast to
DBSCAN there is no differentiation in
core (dense points) and edge points (points that are not dense
themselves but neighbors of dense points). The assignment of points on
the cluster rims to a cluster is possible, but can be ambiguous. The
cluster result is returned as a 1D container of labels, i.e. a sequence
of integers (zero-based) of length \(n\) for a data set of \(n\)
denoting the assignment of points to a specific cluster. Noise is
-1. Valid clusters have at least two members. The
clusters are not sorted by cluster member count. In same cases the
algorithm tends to identify small clusters that can be filtered out
The present implementation of the common-nearest-neighbors algorithm in
commonalities with the current
scikit-learn implementation of
It computes neighborhoods from points in bulk with
the actual clustering. Consequently, to store the neighborhoods
it requires memory on the order of
\(O(n ⋅ n_n)\) for \(n\) points in the data set where \(n_n\)
average number of neighbors (which is proportional to
eps), that is at
worst \(O(n^2)\). Depending on the input structure (dense or sparse
points or similarity matrix) the additional memory demand varies.
The clustering itself follows a
breadth-first-search scheme, checking the density criterion at every
node expansion. The linear time complexity is roughly proportional to
the number of data points \(n\), the total number of neighbors \(N\)
and the value of
min_samples. For density-based clustering
schemes with lower memory demand, also consider:
B. Keller, X. Daura, W. F. van Gunsteren “Comparing Geometric and Kinetic Cluster Algorithms for Molecular Simulation Data” J. Chem. Phys., 2010, 132, 074110.
O. Lemke, B.G. Keller “Density-based Cluster Algorithms for the Identification of Core Sets” J. Chem. Phys., 2016, 145, 164104.
O. Lemke, B.G. Keller “Common nearest neighbor clustering - a benchmark” Algorithms, 2018, 11, 19.