TransWikia.com

Clustering Weekday Weekend Data and Multicollinearity

Data Science Asked by TYL on October 27, 2020

Hi, I have weekday and weekend step-count data from which I extracted metrics such as the weekday (wd) step count, the weekend (we) step count, the standard deviation of wd steps, the standard deviation of we steps, and so on:

   wd_count  we_count  wd_sd_count  we_sd_count  ...
1  5000      3000      300          500
2  7000      2000      400          100

If I do clustering on this data, the weekday and weekend variables are going to be highly correlated and I will have to remove them before clustering. Is there any way around this problem for this kind of analysis?

One Answer

Yes, it's called correlation clustering.

Correlation can cause problems with many clustering algorithms because correlated attributes effectively receive extra weight in the distance computation. The usual remedy is to drop highly correlated variables or to decorrelate them, for example with PCA.
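As a rough illustration of the PCA route, here is a minimal NumPy-only sketch. The data is synthetic and made up for the example (it only borrows the column names from the question), and the helper name `pca_decorrelate` and the 95% variance threshold are assumptions, not anything from the original post. It standardizes the columns, projects onto the leading principal components, and checks that the resulting scores are uncorrelated.

```python
import numpy as np

def pca_decorrelate(X, var_to_keep=0.95):
    """Standardize columns, then project onto the leading principal components.

    Hypothetical helper for illustration; var_to_keep is an arbitrary choice.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # SVD of the standardized data: rows of Vt are the principal directions.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    # Smallest number of components whose cumulative variance reaches the threshold.
    k = int(np.searchsorted(np.cumsum(explained), var_to_keep) + 1)
    k = min(k, len(s))
    return Z @ Vt[:k].T, explained[:k]

# Synthetic data (assumption): weekday and weekend counts share one
# underlying "activity level", so they end up strongly correlated.
rng = np.random.default_rng(0)
activity = rng.normal(size=200)
wd_count = 6000 + 1500 * activity + rng.normal(scale=300, size=200)
we_count = 3000 + 1000 * activity + rng.normal(scale=300, size=200)
wd_sd_count = rng.normal(400, 50, size=200)
we_sd_count = rng.normal(300, 80, size=200)
X = np.column_stack([wd_count, we_count, wd_sd_count, we_sd_count])

scores, explained = pca_decorrelate(X)

raw_corr = np.corrcoef(X, rowvar=False)         # correlations between original columns
score_corr = np.corrcoef(scores, rowvar=False)  # correlations between PCA scores

print("wd/we correlation in raw data:", round(raw_corr[0, 1], 3))
print("components kept:", scores.shape[1])
print("largest off-diagonal correlation after PCA:",
      np.abs(score_corr - np.eye(scores.shape[1])).max())
```

The `scores` array can then be passed to whichever clustering algorithm you prefer in place of the raw columns. Note that the components are mixtures of the original variables, so cluster centers are harder to interpret in terms of weekday versus weekend behavior.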

However, there are correlation clustering algorithms designed for data containing multiple correlations. They cluster objects based on the correlations they exhibit, turning exactly your problem into an advantage.

Answered by Noah Weber on October 27, 2020
