
What are practical differences between kernel k-means and spectral clustering?

Data Science Asked by Kuba_ on August 17, 2020

I’ve lately been wondering about the kernel k-means and spectral clustering algorithms and their differences.

  • I know that spectral clustering is a broader term and different settings can affect the way it works, but one popular variant applies k-means clustering to the spectral embedding of the affinity matrix.

  • On the other hand, kernel k-means applies k-means clustering directly in the feature space induced by the affinity (kernel) matrix. One immediate, theoretical difference is therefore that it omits the spectral-embedding step, i.e. it does not look for a lower-dimensional representation of the data via eigenvectors.

I think this could be beneficial in large-sample settings (with many observations to cluster), but does it provide any boost with a small sample size, say 10 to 20 observations?

What are other practical implications of using one of these algorithms over the other (e.g. which one is more sensitive to changes in the affinity matrix)?
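To make the two pipelines concrete, here is a minimal sketch of both on toy data. The RBF affinity, the dataset, and all parameter values are assumptions for illustration; any positive semi-definite kernel would do. The spectral pipeline runs k-means on eigenvectors of the normalized Laplacian, while the kernel k-means pipeline runs Lloyd-style iterations using kernel distances only, with no eigendecomposition:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_moons
from sklearn.metrics.pairwise import rbf_kernel

# Toy data and an RBF affinity matrix (both chosen for illustration only).
X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)
K = rbf_kernel(X, gamma=15.0)
k = 2
n = len(X)

# --- Spectral clustering: k-means on the spectral embedding ---
d = K.sum(axis=1)
L_sym = np.eye(n) - K / np.sqrt(np.outer(d, d))  # symmetric normalized Laplacian
vals, vecs = np.linalg.eigh(L_sym)               # eigenvalues in ascending order
U = vecs[:, :k]                                  # k smallest eigenvectors
U /= np.linalg.norm(U, axis=1, keepdims=True)    # row-normalize (Ng-Jordan-Weiss style)
spectral_labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(U)

# --- Kernel k-means: Lloyd iterations using the kernel matrix only ---
rng = np.random.default_rng(0)
labels = rng.integers(0, k, size=n)
for _ in range(50):
    dist = np.full((n, k), np.inf)
    for c in range(k):
        mask = labels == c
        n_c = mask.sum()
        if n_c == 0:
            continue
        # ||phi(x_i) - m_c||^2 = K_ii - (2/n_c) sum_j K_ij + (1/n_c^2) sum_{j,l} K_jl
        dist[:, c] = (np.diag(K)
                      - 2.0 * K[:, mask].sum(axis=1) / n_c
                      + K[np.ix_(mask, mask)].sum() / n_c**2)
    new_labels = dist.argmin(axis=1)
    if np.array_equal(new_labels, labels):
        break
    labels = new_labels
```

Note that the kernel k-means loop never forms the embedding `U`; it only needs row and block sums of `K`, which is exactly the "no eigenvectors required" point.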

One Answer

The differences are indeed not too large. There is a paper called Kernel k-means, Spectral Clustering and Normalized Cuts by Inderjit S. Dhillon, Yuqiang Guan, and Brian Kulis from KDD 2004 that discusses this relationship. The authors show that spectral clustering via normalized cuts is a special case of weighted kernel k-means, the reason being that the "graph cut problem and weighted kernel k-means problem can both be written as trace maximization problems".

They write that

this has important implications: a) eigenvector-based algorithms, which can be computationally prohibitive, are not essential for minimizing normalized cuts, b) various techniques, such as local search and acceleration schemes, may be used to improve the quality as well as speed of kernel k-means.
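A rough sketch of the trace forms in question (my paraphrase of the Dhillon et al. result, so the notation here is an assumption): let $Y$ be an orthonormal cluster-indicator matrix, $W$ a diagonal weight matrix, $K$ the kernel matrix, $A$ the affinity matrix, and $D$ its degree matrix. Then

$$\max_Y \operatorname{tr}\!\left(Y^\top W^{1/2} K W^{1/2} Y\right) \quad \text{(weighted kernel k-means)}, \qquad \max_Y \operatorname{tr}\!\left(Y^\top D^{-1/2} A D^{-1/2} Y\right) \quad \text{(normalized cut)},$$

so choosing $W = D$ and $K = D^{-1} A D^{-1}$ makes the two objectives coincide, which is the sense in which normalized-cut spectral clustering is a special case of weighted kernel k-means.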

Further, they combine ideas from both to improve the overall results and

show that using eigenvectors to initialize kernel k-means gives better initial and final objective function values and better clustering results.

Regarding robustness in spectral clustering you might want to look at Robust Spectral Clustering for Noisy Data by Aleksandar Bojchevski, Yves Matkovic, Stephan Günnemann, published at the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2017: https://www.kdd.org/kdd2017/papers/view/robust-spectral-clustering-for-noisy-data


Inderjit S. Dhillon, Yuqiang Guan, Brian Kulis, Kernel k-means: spectral clustering and normalized cuts, KDD '04: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, August 2004, Pages 551–556, https://doi.org/10.1145/1014052.1014118

Answered by Make42 on August 17, 2020
