K-means clustering is a method for finding clusters and cluster centers in a set of unlabelled data. Intuitively, we might think of a cluster as a group of data points whose inter-point distances are small compared with the distances to points outside the cluster. Given an initial set of K centers, the K-means algorithm iterates the following two steps:

1. For each center, the subset of training points (its cluster) that is closer to it than to any other center is identified.

2. The mean of each feature for the data points in each cluster is computed, and this mean vector becomes the new center for that cluster.
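The two steps can also be written compactly. The notation here is an assumption chosen for illustration: x_i is the i-th data point, c_j is the j-th center, C(i) is the index of the cluster that point i is assigned to, and N_j is the number of points in cluster j.

```latex
% Assignment step: each point joins the cluster of its nearest center
C(i) = \arg\min_{1 \le j \le K} \lVert x_i - c_j \rVert^2, \qquad i = 1, \dots, n

% Update step: each center becomes the mean of the points assigned to it
c_j = \frac{1}{N_j} \sum_{i \,:\, C(i) = j} x_i, \qquad N_j = \lvert \{\, i : C(i) = j \,\} \rvert
```

Neither step can increase the within-cluster sum of squared distances, \sum_i \lVert x_i - c_{C(i)} \rVert^2, which is why alternating them eventually settles on a stable (though not necessarily globally optimal) clustering.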

In more detail, suppose we have input data points x1, x2, x3, ..., xn and a chosen number of clusters K. The algorithm then proceeds as follows.

Step-1:

Select K random points as the initial cluster centers, called centroids. Suppose these are c1, c2, ..., ck; they can be written as follows:

C = {c1, c2, ..., ck}

C is the set of all centroids.

Step-2:

Assign each input point xi to its nearest centroid by calculating the Euclidean (L2) distance between the point and each centroid.
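As a small worked example of this assignment step, the sketch below uses NumPy with two made-up points and two made-up centroids (the values and the variable names `points`, `centroids`, `dists`, and `labels` are assumptions for illustration, not part of the original tutorial).

```python
import numpy as np

points = np.array([[1.0, 1.0], [4.0, 5.0]])      # hypothetical inputs x1, x2
centroids = np.array([[0.0, 0.0], [4.0, 4.0]])   # hypothetical centroids c1, c2

# dists[i, j] is the Euclidean distance from point i to centroid j
dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)

# Each point is assigned to the index of its closest centroid
labels = dists.argmin(axis=1)
print(labels)  # → [0 1]
```

The first point lies about 1.41 from c1 and 4.24 from c2, so it joins cluster 0; the second lies about 6.40 from c1 and 1.0 from c2, so it joins cluster 1.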

Step-3:

In this step, we compute the new centroid of each cluster as the average of all the points assigned to that cluster.

Step-4:

We repeat steps 2 and 3 until the clusters are stable, that is, until no point changes its assigned cluster.
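The four steps can be put together in a few lines of NumPy. This is only a minimal sketch under stated assumptions: the function name `kmeans_sketch`, the `max_iter` cap, the fixed `seed`, and the choice to keep an old centroid when a cluster ends up empty are all illustrative decisions, not something the steps above prescribe.

```python
import numpy as np

def kmeans_sketch(x, k, max_iter=100, seed=0):
    """Plain K-means following steps 1 to 4 (illustrative only)."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    # Step 1: pick K distinct data points as the initial centroids.
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    labels = None
    for _ in range(max_iter):
        # Step 2: Euclidean distance from every point to every centroid,
        # then assign each point to its nearest centroid.
        dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Step 4: stop once no point changes cluster.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 3: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # assumption: keep old centroid if cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids, labels

# Two small, clearly separated groups of made-up points
points = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                   [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centroids, labels = kmeans_sketch(points, k=2)
```

On this toy input the first three points should end up in one cluster and the last three in the other; which cluster receives index 0 depends on the random initialisation.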

Import K-Means

We will see the implementation and usage of each imported function.

from scipy.cluster.vq import kmeans, vq, whiten

# whitening of data
data = whiten(data)
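To show how the three imported functions fit together, here is a short end-to-end sketch. The data are synthetic (two made-up 2-D blobs generated with NumPy), and the variable names are assumptions for illustration. `whiten` rescales each feature to unit variance, `kmeans` returns the centroids (the "code book") together with the mean distortion, and `vq` assigns each observation to its nearest centroid.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq, whiten

# Synthetic data for illustration only: two well-separated 2-D blobs
rng = np.random.default_rng(0)
data = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(50, 2)),
    rng.normal(loc=5.0, scale=0.5, size=(50, 2)),
])

# Rescale every feature (column) to unit variance
data = whiten(data)

# Run K-means with K = 2; returns the centroids and the mean distortion
centroids, distortion = kmeans(data, 2)

# Assign each observation to its nearest centroid
labels, dists = vq(data, centroids)

print(centroids.shape)  # one row per centroid, one column per feature
print(labels[:5], labels[-5:])
```

Because `kmeans` starts from randomly chosen observations, the order of the centroids (and therefore which label is 0 or 1) can differ between runs, although the grouping itself should be the same for data this well separated.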
