Mean Shift

In this blog we will see another unsupervised clustering algorithm called Mean Shift.

Mean shift builds upon the concept of kernel density estimation (KDE). KDE is a method to estimate the underlying distribution (also called the probability density function) for a set of data. It works by placing a kernel on each point in the data set. A kernel is a fancy mathematical word for a weighting function.

There are many different types of kernels, but the most popular one is the Gaussian kernel. Adding all of the individual kernels up generates a probability surface (e.g., density function). Depending on the kernel bandwidth parameter used, the resultant density function will vary.

The first step when applying mean shift (and all clustering algorithms) is representing your data in a mathematical manner. For mean shift, this means representing your data as points, such as the set below

 KDE surface for our points above using a Gaussian kernel with a kernel bandwidth of 2. The first image is a surface plot, and the second image is a contour plot of the surface.

Mean shift exploits this KDE idea by imagining what the points would do if they all climbed up hill to the nearest peak on the KDE surface. It does so by iteratively shifting each point uphill until it reaches a peak.

If we take small kernel bandwidth then we have many peaks and many clusters can be formed, large kernel bandwidth will have only one peak and all points will climb up this resulting in one cluster. Kernels in between these two bandwidths result in good clustering

The top animation results in three KDE surface peaks, and thus three clusters

The second animation uses a smaller kernel bandwidth, and results in more than three clusters.

Advantages of Mean Shift

  • Mean Shift is relatively simple as compared to K means
  • Entire end result is controlled by one parameter—the kernel bandwidth value whereas in K means requires number of clusters to be specified as input.
  • Mean shift cleverly exploits the density of the points in an attempt to generate a reasonable number of clusters

Disadvantages of Mean Shift

  • It is slow. For problems with many points, it can take a long time to execute
  • Mean Shift is that it is computationally expensive — O(n²).

Applications

Used in Image Processing, computer visions, search engine, academic ranking, medicines.

Leave a Reply