Data Science Asked on June 16, 2021
I have pandas table which contains data about different observations, each one was measured in different wavlength. These observsations are different than each other in the treatment they have gotten.
The table looks something like this:
>>>name treatment 410.1 423.2 445.6 477.1 485.2 ....
0 A1 0 0.01 0.02 0.04 0.05 0.87
1 A2 1 0.04 0.05 0.05 0.06 0.04
2 A3 2 0.03 0.02 0.03 0.01 0.03
3 A4 0 0.02 0.02 0.04 0.05 0.91
4 A5 1 0.05 0.06 0.04 0.05 0.02
...
I would like to classify the different observations based on their spectrum (the numerical columns).
I have tried to run PCA and to paint it according to the treatment the observations got, and to compare it to the results of classifications like k-means and Spectral clustering, but i’m not sure that I choose the right methods because is seems all the time like the clusters are too much like euclidean distance and i’m not sure that they take into account the spectrum (I have used all the numerical columns for the prediction).
This is for exampel the comparison between the PCA+Colors compared to the Spectral cllasification:
PCA:
classification( the points located according to PCA1 PCA2 but the colores are according the the classification:
as you can see here, it seems like the classification is based on real distance and I would like something that take into account all the numerical values.
So, i’m looking for any insights regard other methods of classifications that could give me better results or maybe other ideas how I can check if there are clusters inside my data based on the measurments in different columns, like if I could predict the treatment from the clusters
This sounds like a normal supervised classification task.
Have you tried other standard methods like Support Vector Machines, RandomForests, Gradient Boosting, kNN, Neural Networks etc. as well or is there a particular reason why you only tried clustering methods.
Clustering methods like kmeans or spectral clustering are usually used in an unsupervised setting where class memberships are not available. Often they make certain assumptions about the data which might be violated, e.g. kmeans assumes spherical clusters, which is clearly not the case for your data.
Correct answer by Tinu on June 16, 2021
Get help from others!
Recent Questions
Recent Answers
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP