Data Science Asked on May 13, 2021
I’ve used k-means to cluster my data. Before running k-means, I standardized the data with StandardScaler. Now I’m wondering how I can show the clusters on the original data. Scikit-learn gives the labels for the standardized data, but I want the labels for the original data so that I can plot the clusters on the original scale.
Option 1:
Keep and access the original data (e.g. by index) - recompute the means.
Option 2:
Apply the inverse transformation. StandardScaler is a linear transformation, so it's reversible up to some loss of floating-point precision.
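Both options can be sketched with NumPy alone. The array X and the labels below are made-up illustration data, not taken from the question; the labels stand in for what k-means would return for the standardized rows. The standardization is done by hand here (subtract the mean, divide by the population standard deviation), which is assumed to match what StandardScaler does by default.

```python
import numpy as np

# Hypothetical original data: 6 samples, 2 features on very different scales.
X = np.array([
    [1.0, 1000.0],
    [1.2, 1100.0],
    [0.8,  900.0],
    [5.0, 5000.0],
    [5.2, 5100.0],
    [4.8, 4900.0],
])

# Hypothetical labels; row i of `labels` belongs to row i of X,
# because scaling does not reorder the rows.
labels = np.array([0, 0, 0, 1, 1, 1])
cluster_ids = np.unique(labels)

# Option 1: recompute the cluster means directly on the original data.
centers_original = np.array([X[labels == k].mean(axis=0) for k in cluster_ids])

# Option 2: compute the means in standardized space, then invert the scaling.
mu = X.mean(axis=0)
sigma = X.std(axis=0)  # population standard deviation (ddof=0)
X_std = (X - mu) / sigma
centers_std = np.array([X_std[labels == k].mean(axis=0) for k in cluster_ids])
centers_back = centers_std * sigma + mu

# Both routes agree up to floating-point error.
print(np.allclose(centers_original, centers_back))
```

Because the scaling only shifts and stretches each feature, the mean of a cluster in standardized space maps back to the mean of the same rows in the original space.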
Answered by Has QUIT--Anony-Mousse on May 13, 2021
StandardScaler subtracts the mean from each variable and then divides by its standard deviation. It's a common preprocessing step, especially for k-means, because the algorithm depends heavily on the scale of the data.
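If I understand correctly, you want to visualize the original data and use the labels from k-means while doing so. The labels are returned in the same order as the input rows, so label i belongs to row i of the original data. You could either add the labels to the original data as an extra column (assuming the order of the records did not change):
import numpy
original_with_label = numpy.concatenate((original, labels.reshape(-1, 1)), axis=1)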
Or you could transform the data back to its original scale:
# scalar_fit is the fitted StandardScaler instance
transformed_back_to_original = scalar_fit.inverse_transform(transformed_data)
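Put together, the whole round trip might look like the following sketch. The two synthetic blobs, the variable names (scaler, scaled, kmeans) and the KMeans settings are assumptions chosen for illustration; only StandardScaler, KMeans and inverse_transform come from the discussion above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical data: two well-separated blobs, features on different scales.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=[1.0, 1000.0], scale=[0.1, 50.0], size=(50, 2))
blob_b = rng.normal(loc=[5.0, 5000.0], scale=[0.1, 50.0], size=(50, 2))
original = np.vstack([blob_a, blob_b])

# Standardize, then cluster the standardized data.
scaler = StandardScaler()
scaled = scaler.fit_transform(original)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(scaled)

# Variant 1: attach the labels to the untouched original rows.
original_with_label = np.column_stack((original, labels))

# Variant 2: map standardized values back to the original scale.
restored = scaler.inverse_transform(scaled)
centers_original = scaler.inverse_transform(kmeans.cluster_centers_)

print(original_with_label[:3])
print(centers_original)
```

For the plot itself, one possibility is matplotlib's plt.scatter(original[:, 0], original[:, 1], c=labels), with centers_original drawn on top as markers. Note that the labels column becomes float when stacked with float features.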
Answered by Pieter on May 13, 2021
I think this is a really good tutorial for you to consider.
Towards the end, the author shows you how to map the index back to the cluster IDs.
details = [(name, cluster) for name, cluster in zip(returns.index, idx)]
for detail in details:
    print(detail)
Answered by ASH on May 13, 2021