Data Science Asked by TYL on October 27, 2020
Hi, I have weekday (wd) and weekend (we) step-count data from which I extracted metrics such as the wd steps, the we steps, the standard deviation of wd steps, the standard deviation of we steps, and so on:
   wd_count  we_count  wd_sd_count  we_sd_count  ...
1      5000      3000          300          500
2      7000      2000          400          100
If I cluster this data, the weekday and weekend variables will be highly correlated, and I would have to remove them before clustering. Is there any way around this problem for this kind of analysis?
Yes, it's called correlation clustering.
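Correlation can cause problems with many clustering algorithms because correlated attributes effectively receive extra weight. The usual remedy is to remove that redundancy before clustering, for example by dropping highly correlated variables or by using PCA to replace them with uncorrelated components.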
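As a rough sketch of the PCA route, the snippet below standardizes four step-count features, runs PCA through NumPy's SVD, and keeps enough components to explain 90% of the variance. The data is synthetic and made up for illustration (it is not the asker's data); the column names follow the question, while the `activity` variable and the 90% threshold are assumptions.

```python
import numpy as np

# Synthetic data, made up for illustration only (not the asker's data).
rng = np.random.default_rng(0)
n = 200
activity = rng.normal(size=n)  # hypothetical shared activity level per person
wd_count = 6000 + 1500 * activity + rng.normal(scale=300, size=n)
we_count = 3000 + 1000 * activity + rng.normal(scale=300, size=n)
wd_sd_count = rng.normal(350, 50, size=n)
we_sd_count = rng.normal(300, 80, size=n)
X = np.column_stack([wd_count, we_count, wd_sd_count, we_sd_count])

# Standardize so every feature has mean 0 and unit variance.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Correlation matrix: wd_count and we_count are strongly correlated by construction.
corr = np.corrcoef(Z, rowvar=False)
print(corr.round(2))

# PCA via SVD of the standardized data; rows of Vt are the principal axes.
U, S, Vt = np.linalg.svd(Z, full_matrices=False)
explained = S**2 / np.sum(S**2)

# Keep the smallest number of components explaining at least 90% of the variance
# (the threshold is an arbitrary choice for this sketch).
k = int(np.searchsorted(np.cumsum(explained), 0.90)) + 1
scores = Z @ Vt[:k].T  # uncorrelated features to pass to a clustering algorithm

print(explained.round(3), k)
```

The columns of `scores` are mutually uncorrelated, so a distance-based method such as k-means no longer counts the shared weekday/weekend signal twice. The trade-off is that clusters are then described in terms of components rather than the original variables.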
However, there are correlation clustering algorithms designed for data containing multiple correlations. They cluster objects based on the correlations they exhibit, turning exactly this problem into an advantage.
Answered by Noah Weber on October 27, 2020