TransWikia.com

Clustering a variable based on another variable or set of variables

Data Science Asked by Chinti on March 11, 2021

df11[['COMPONENT_ID','FIRMWARE','SERIAL','CRP0_VDDN']].head()


Suppose I have these four columns to analyse. I want to form, say, 3-5 clusters of COMPONENT_IDs with similar characteristics, based either on all the remaining features or on CRP0_VDDN alone. How can I do this?

One Answer

First of all, most common clustering algorithms, such as k-means, are designed for numeric values, especially continuous ones. What you are trying to do here is cluster a categorical variable, and an ID column at that. I'm unsure about the goal, but clustering the ID values themselves is not a good technique.

That being said, I'm not sure how many unique entries you have in the ID column. You would have to convert it into numeric category codes before clustering. Then take only the ID column and the CRP0_VDDN column and cluster them with k-means, which lets you set the number of clusters directly.
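As a rough sketch of one way to put this into practice, the snippet below groups the rows by COMPONENT_ID, takes the mean CRP0_VDDN per component, and runs k-means on that summary so each ID ends up with a single cluster label. This is a variant of the suggestion above rather than a literal implementation of it: the ID is used as the grouping key instead of being encoded as a numeric feature, since distances between arbitrary ID codes carry no meaning. The data is made up to stand in for df11, and the choice of 3 clusters and of the mean as the only summary feature are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Made-up data standing in for df11; IDs and values are illustrative only.
rng = np.random.default_rng(0)
ids = [f"C{i:02d}" for i in range(12)]
levels = np.repeat([0.8, 1.0, 1.2], 4)  # three hypothetical voltage levels
df11 = pd.DataFrame({
    "COMPONENT_ID": np.repeat(ids, 50),
    "CRP0_VDDN": np.concatenate([rng.normal(v, 0.01, 50) for v in levels]),
})

# One row per COMPONENT_ID: summarise its CRP0_VDDN readings by their mean.
# Other numeric summaries could be added as extra columns (and scaled).
summary = df11.groupby("COMPONENT_ID")["CRP0_VDDN"].mean().to_frame("mean_vddn")

# Cluster the components on the summary feature; n_clusters is a free choice.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
summary["cluster"] = kmeans.fit_predict(summary[["mean_vddn"]])

print(summary)
```

The cluster numbers themselves are arbitrary labels; what matters is which COMPONENT_IDs share a label. To map the result back onto the original rows, something like df11.merge(summary["cluster"], left_on="COMPONENT_ID", right_index=True) should work.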

Answered by Senthamizhan on March 11, 2021
