Intuition Behind K-Means

K-Means finds similarities between data points and groups them into clusters. The figure above shows two clusters, grouped by similarity. K-Means is an unsupervised machine learning algorithm.

Why do we need K-Means?

In unsupervised learning, we have data without labels. To draw useful insights from such data, we need to know what pattern the data forms. Because there are no labels, one way to find that pattern is to group the points into clusters based on similarities in their features, and K-Means is one method for doing this.

Intuition behind K-Means

Let’s take a step-by-step look at how K-Means works.

Assume the graph above displays the points scattered in our dataset.

STEP 1:

The K in K-Means denotes the number of clusters to form from the scattered points, so the first step is to choose a value for K.

STEP 2:

Initialize K centroids randomly in the plane.

Suppose K = 2; then 2 centroids will be selected randomly.

STEP 3:

Find the centroid closest to a given point and assign that point to that centroid’s cluster.

L1 – the distance between centroid 1 and the marked point

L2 – the distance between centroid 2 and the marked point

In the above figure, L1 < L2, which means the marked point is closer to centroid 1, so it is assigned to the cluster of centroid 1.

Likewise, the distance from every other point to each centroid is calculated using the Euclidean distance, and each point is assigned to the cluster of its nearest centroid.

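The Euclidean distance between two 2D points $(x_1, y_1)$ and $(x_2, y_2)$ is given by:

$$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$

The Euclidean distance between two 3D points $(x_1, y_1, z_1)$ and $(x_2, y_2, z_2)$ is given by:

$$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2}$$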
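The distance calculation and the assignment in Step 3 can be sketched in Python roughly as follows. This is a minimal illustration, assuming NumPy is available; the function names `euclidean_distance` and `nearest_centroid` and the coordinates are hypothetical and chosen only for this example.

```python
import numpy as np

def euclidean_distance(p, q):
    """Straight-line distance between two points of the same dimension."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sqrt(np.sum((p - q) ** 2)))

def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to the given point."""
    distances = [euclidean_distance(point, c) for c in centroids]
    return int(np.argmin(distances))

# Hypothetical coordinates, used only for illustration.
point = (1.0, 2.0)
centroids = [(0.0, 0.0), (5.0, 5.0)]

l1 = euclidean_distance(point, centroids[0])  # distance to centroid 1
l2 = euclidean_distance(point, centroids[1])  # distance to centroid 2
cluster = nearest_centroid(point, centroids)  # index 0 stands for centroid 1

print(l1, l2, cluster)
```

Because the function sums over all coordinates, the same code covers the 2D and the 3D case.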

The figure below illustrates how Euclidean distance works.

STEP 4:

For each cluster, compute the mean of the points assigned to it, and make that mean the cluster’s new centroid.
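As a rough sketch of this update step, the new centroid of each cluster is simply the coordinate-wise mean of its members. The helper name `update_centroids` and the sample points are hypothetical, and the sketch assumes that every cluster has at least one point.

```python
import numpy as np

def update_centroids(points, labels, k):
    """Return the mean of the points assigned to each of the k clusters.

    Assumes every cluster index 0..k-1 has at least one member.
    """
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    return np.array([points[labels == j].mean(axis=0) for j in range(k)])

# Hypothetical points and cluster assignments, used only for illustration.
points = [[1.0, 1.0], [2.0, 3.0], [8.0, 8.0], [10.0, 6.0]]
labels = [0, 0, 1, 1]

new_centroids = update_centroids(points, labels, k=2)
print(new_centroids)
```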

STEP 5:

Reassign each data point by calculating its distance to the newly computed centroids and assigning it to the cluster of the nearest one.

Now, the question is: how long do we have to repeat this process?

We repeat steps 4 and 5 until no point moves to a different cluster, at which point the groups are fixed. In short, the process continues until the K clusters we asked for stop changing.

But how do we choose the K value? For this, we use the Elbow method, which is based on the WCSS (Within-Cluster Sum of Squares):

$$\mathrm{WCSS} = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2$$

where $C_k$ is the set of points in cluster $k$ and $\mu_k$ is the centroid of that cluster.
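Putting the steps together, the whole loop and the within-cluster sum of squares can be sketched as below. This is an illustrative sketch, not a reference implementation: the function name `k_means` is hypothetical, the starting centroids are drawn from the data points themselves (one common way to do the random initialization), and an empty cluster simply keeps its previous centroid.

```python
import numpy as np

def k_means(points, k, max_iter=100, seed=0):
    """Plain K-Means; returns (centroids, labels, wcss)."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Step 2: pick k distinct data points as the starting centroids (assumption).
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    labels = None
    for _ in range(max_iter):
        # Steps 3 and 5: assign every point to its nearest centroid.
        distances = np.linalg.norm(
            points[:, None, :] - centroids[None, :, :], axis=2
        )
        new_labels = distances.argmin(axis=1)
        # Stop once no point changes cluster.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 4: move each centroid to the mean of its cluster.
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # an empty cluster keeps its old centroid
                centroids[j] = members.mean(axis=0)
    # Sum of squared distances from each point to its own centroid.
    wcss = float(np.sum((points - centroids[labels]) ** 2))
    return centroids, labels, wcss

# Hypothetical data: two small, well-separated groups.
data = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centroids, labels, wcss = k_means(data, k=2)
print(labels, wcss)
```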

WCSS measures how close the points in each cluster are to their centroid, so it tells us how good the clustering that resulted from the randomly initialized centroids is. The higher the WCSS, the less compact the clusters formed.

To apply the Elbow method, we compute the WCSS for a range of K values and plot WCSS against K. On the resulting curve there is a point after which the WCSS decreases only gradually, and that point is a suitable value for K.

This is referred to as the Elbow method because the curve it produces has a shape similar to that of an elbow.
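A minimal sketch of the Elbow method with scikit-learn might look like the following. It assumes scikit-learn is installed and uses synthetic data from `make_blobs` purely as a stand-in, since the real dataset is not shown here; the fitted model's `inertia_` attribute holds the within-cluster sum of squares.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in data with 4 groups (an assumption for illustration).
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=42)

k_values = range(1, 11)
wcss = []
for k in k_values:
    model = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=42)
    model.fit(X)
    wcss.append(model.inertia_)  # within-cluster sum of squares for this K

for k, value in zip(k_values, wcss):
    print(k, round(value, 1))
```

Plotting `wcss` against `k_values` (for example with matplotlib) gives the elbow-shaped curve; the K at the bend is the candidate to pick.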

Random Initialization Trap

Let’s understand this concept with the help of an example.

Suppose these are our points plotted on the graph.

Now, if our K=3, picture the cluster in your mind. How should it look?

Yes, we can pick out the clusters visually, as grouped in the figure above.

But what if the randomly selected centroids are not the same as the centroids marked in yellow in the diagram above?

This is what one case of random initialization could look like. Although 3 clusters are formed, they will not give us proper insight into the actual pattern in our data. To avoid this, the ‘k-means++’ initialization method is used.
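One way to see the effect of initialization is to compare single runs started from purely random centroids with a run that uses k-means++. The sketch below assumes scikit-learn and, again, synthetic `make_blobs` data as a stand-in; on well-separated data the random runs may all land on the same result, so the gap is not guaranteed to show up.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in data with 3 groups (an assumption for illustration).
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=7)

# One run per seed, each starting from purely random centroids.
random_inertias = []
for seed in range(10):
    model = KMeans(n_clusters=3, init="random", n_init=1, random_state=seed)
    model.fit(X)
    random_inertias.append(model.inertia_)

# k-means++ spreads the starting centroids out; keep the best of 10 runs.
kpp = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=0).fit(X)

print("random init, WCSS per seed:", np.round(random_inertias, 1))
print("k-means++ init, WCSS:", round(kpp.inertia_, 1))
```

A random run that ends with a noticeably higher WCSS than the others has fallen into the trap described above.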

For detailed insights on sklearn K-Means, refer to: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html

Let’s now look at an application of K-Means in a real-life scenario.

Problem Statement:

The dataset contains details of customers visiting a mall. The picture below gives a short introduction to what our dataset looks like.

Here, we need to find patterns among the people visiting the mall: which groups of customers should be targeted and which should not.

Written By: Ketki Kinkar

Reviewed By: Savya Sachi
