K-Medoids Clustering Algorithm

K-medoids is a clustering which is a variant of a k-means algorithm for portioning data in a set of k groups or clusters. In K-medoids instead of using centroids of clusters, actual data points are use to represent clusters. K-medoids or Partitioning Around Medoid (PAM) method was propose by Kaufman and Rousseeuw, so as a better alternative to K-means algorithm.

however, K-medoid is a strong alternative to k-means clustering because the algorithm is less sensitive to noise and outliers, compared to its variant k-means, because it uses medoids as cluster centres instead of means of the cluster objects (used in the k-means algorithm).

Working of K-Medoids

In this method, before calculating the distance of a data object to a clustering medoid (centroid), k clustering medoids (centroids) randomly selects from n data objects such that initial partition is complete on the basis of the closeness of each object to the clustering medoids (centroid) to begin the partitioning of data. Then, the iteration method is applies continuously till the most appropriate partition value is obtain.

In this method, the object from each clustering samples is thus chosen based on the improvement of clustering quality after every iteration. however, The most centrally located object in a cluster is taken as a reference point, which is actually a medoid of the elements present in the cluster and not a mean value of elements in a cluster like in the k-means method.

though The basic principle of the K-medoids method is to minimize the total sum of the distance of dissimilar points from a reference point for partitioning. By empirically taking  representative data from each cluster, total k clusters are clasp in such a way that each of the remaining data points is cluster with medoid.

PAM

In PAM Clustering Algorithm, the word PAM initials stand for “partition around medoids”. The algorithm is intended purpose is to find a sequence of objects called medoids that are centrally located objects in clusters. These medoids are placed into a set S of selected objects. If O is the set of objects then the set of unselected objects is given by U=O−S.

thus, The goal of the PAM algorithm is to minimize the average dissimilarity of objects present in the cluster to their closest selected object which helps us to minimize the sum of the dissimilarities between object and their closest selected object of the cluster.

K-means vs K-medoids

                  K-Means                      K-Medoids
Attempts to minimize the total squared error.             d Minimizes the sum of dissimilarities between points labelled to be in a cluster and their closest selected object.
      Takes means of elements in a dataset      Takes medoids of elements in a dataset
      Time complexity is O(ndk+1)      Time complexity is O(k*(n-k)2)
      More sensitive to outliners      Less sensitive to outliners
      Uses Euclidean Distance      Uses Partitioning Around Medoids (PAM)

Advantages

  1. K-Medoid algorithm is simple and easy to implement
  2. K-Medoids Algorithm is fast and converges in a fixed number of steps.
  3. PAM is less sensitive to outliners than other partitioning algorithms

Disadvantages

  1. The main disadvantages of K-Medoids algorithm are that it is not suitable for clustering a non-spherical group of objects, because it relies on minimizing the distances between the non-medoids objects and the medoids briefly. It uses compactness as clustering criteria instead of connectivity between the non-medoids and medoids.
  2. It may obtain different results for different runs on the same dataset because the first K medoids are chosen randomly.

Applications:-

1. Academic Performance

Based on the scores obtained by the student, they are classified into grades like A, B, C, D etc.

2. Diagnostic System 

In the medical profession, it helps in creating smarter medical decision support systems, especially in the treatment of liver ailments. 

3. Document Classification 

Cluster documents in multiple categories based on their tags, topics, and the content of the document. 

4. Customer Segmentation 

It helps marketers to improve their customer base, work on target areas and segment customer based on purchase history, interest. The classification would help the company target specific clusters of customers for specific campaigns and sell their products according to their interest. 

Summary

K-medoids is a robust alternative to the K-Means algorithm. In the k-medoids method, each cluster is represented by a selected object within the cluster and the selected objects are named medoids which corresponds to the most centrally located points within the cluster. K-Medoid is more efficient as it is less sensitive to outliners which makes it better than the K-Means method and it is applicable in various fields and in future, its usage can rise.

Written By: Prateek Sharma

Reviewed By: Savya Sachi

If you are Interested In Machine Learning You Can Check Machine Learning Internship Program
Also Check Other Technical And Non Technical Internship Programs

Leave a Comment

Your email address will not be published. Required fields are marked *