Data Science Asked on October 2, 2021
So, let’s say I have a set of 3D points that lie more or less on a plane embedded in 3D space. Then I can use PCA to ‘compress’ these 3D points to 2D coordinates on that plane, such that they still approximate the original data well.
Now let’s say half of the 3D points don’t lie close to that plane, but close to some other plane instead.
If I just do PCA and reduce to 2 dimensions, I won’t get a good approximation.
If, however, the algorithm could ‘see’ that some of the 3D points compress well onto one plane and others compress well onto another plane, it could label each point and do PCA separately for each set (compressing each point to 2 coordinates plus one bit that says which set it belongs to). That would approximate the original data much better.
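To make the claim concrete, here is a minimal NumPy sketch that compares the reconstruction error of one 2D PCA over all points with separate 2D PCAs per set. It assumes synthetic data (two noisy planes, z = 0 and x = 0) and that the set labels are already known; the function name `pca_reconstruction_error` is illustrative, not from any library.

```python
import numpy as np

def pca_reconstruction_error(points, n_components):
    """Mean squared distance between each point and its projection onto the
    best-fitting n_components-dimensional affine subspace (ordinary PCA)."""
    centered = points - points.mean(axis=0)
    # Rows of vt are the principal directions, sorted by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]
    projected = centered @ basis.T @ basis
    return float(np.mean(np.sum((centered - projected) ** 2, axis=1)))

# Synthetic data (assumption): 200 noisy points near z = 0 and 200 near x = 0.
rng = np.random.default_rng(0)
coeffs = rng.uniform(-1, 1, size=(400, 2))
plane_a = np.column_stack([coeffs[:200, 0], coeffs[:200, 1], np.zeros(200)])
plane_b = np.column_stack([np.zeros(200), coeffs[200:, 0], coeffs[200:, 1]])
data = np.vstack([plane_a, plane_b]) + 0.01 * rng.standard_normal((400, 3))
labels = np.repeat([0, 1], 200)  # the "one bit" per point (known here, not learned)

one_plane = pca_reconstruction_error(data, 2)
# Sets are equal-sized, so the mean of the per-set errors is the overall mean error.
per_set = np.mean([pca_reconstruction_error(data[labels == k], 2) for k in (0, 1)])
print(f"single 2D PCA: {one_plane:.4f}   per-set 2D PCA: {per_set:.6f}")
```

The single-plane error stays near the variance along the discarded direction, while the per-set error drops to roughly the noise variance.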
What’s the name for such a PCA algorithm, one that can also split the input data into at most N sets (probably with some penalty on the number of sets), such that dimensionality reduction within each set yields a much better fit than reducing all data points together?
// Edit:
Adding an example. If one only clustered by distance in the high-dimensional space, one would arrive at the bad clustering, where there are more clusters and each cluster has a higher error when projected down.
The good example uses fewer clusters, and they project better onto their 2-dimensional subspaces (the green cluster can even be compressed to a 1D space).
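The remark that one cluster fits in 1D suggests giving each cluster its own dimension. One hedged way to do that, once the clusters are known, is to keep the smallest number of principal components whose cumulative explained variance reaches a threshold. The 0.99 threshold, the function name `intrinsic_dimension`, and the synthetic line/plane data below are illustrative assumptions.

```python
import numpy as np

def intrinsic_dimension(points, var_threshold=0.99):
    """Smallest number of principal components whose cumulative explained
    variance reaches var_threshold (the threshold is an arbitrary choice)."""
    centered = points - points.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)  # singular values, descending
    ratios = np.cumsum(s ** 2) / np.sum(s ** 2)
    # First index where the cumulative ratio reaches the threshold, as a count.
    return int(np.searchsorted(ratios, var_threshold) + 1)

rng = np.random.default_rng(1)
t = rng.uniform(-1, 1, size=(300, 1))
# A noisy 1D line and a noisy 2D plane, both embedded in 3D.
line = t * np.array([1.0, 2.0, 0.5]) + 0.001 * rng.standard_normal((300, 3))
plane = np.column_stack([rng.uniform(-1, 1, 300), rng.uniform(-1, 1, 300),
                         0.001 * rng.standard_normal(300)])
print(intrinsic_dimension(line), intrinsic_dimension(plane))
```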
What you are describing is known as subspace clustering.
Answered by Graph4Me Consultant on October 2, 2021
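The answer does not name a specific algorithm. One of the simplest subspace clustering methods is K-subspaces (also called k-planes), which works like k-means but with affine subspaces instead of centroids: assign each point to the subspace with the smallest residual, then refit each subspace by PCA. Below is a minimal NumPy sketch, assuming a known number of clusters that all share the same dimension; all function names are illustrative. Like k-means, it can get stuck in local minima, so it uses random restarts.

```python
import numpy as np

def fit_affine_subspace(points, dim):
    """PCA fit: mean and orthonormal basis (dim x D) of the best affine subspace."""
    mean = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - mean, full_matrices=False)
    return mean, vt[:dim]

def residuals(points, mean, basis):
    """Squared distance from each point to the affine subspace (mean, basis)."""
    centered = points - mean
    return np.sum((centered - centered @ basis.T @ basis) ** 2, axis=1)

def k_subspaces(points, n_clusters, dim, n_restarts=20, n_iter=100, seed=0):
    """Minimal K-subspaces: alternate between assigning each point to the
    subspace with the smallest residual and refitting each subspace by PCA.
    Returns the labels and mean residual of the best of several restarts."""
    rng = np.random.default_rng(seed)
    n = len(points)
    best_labels, best_cost = None, np.inf
    for _ in range(n_restarts):
        # Initialise each subspace through dim + 1 randomly chosen points.
        models = [fit_affine_subspace(points[rng.choice(n, dim + 1, replace=False)], dim)
                  for _ in range(n_clusters)]
        labels = None
        for _ in range(n_iter):
            dists = np.column_stack([residuals(points, m, b) for m, b in models])
            new_labels = dists.argmin(axis=1)
            if labels is not None and np.array_equal(new_labels, labels):
                break  # assignments stopped changing
            labels = new_labels
            models = []
            for k in range(n_clusters):
                members = points[labels == k]
                if len(members) <= dim:  # (near-)empty cluster: reseed it randomly
                    members = points[rng.choice(n, dim + 1, replace=False)]
                models.append(fit_affine_subspace(members, dim))
        cost = float(dists.min(axis=1).mean())
        if cost < best_cost:
            best_labels, best_cost = new_labels, cost
    return best_labels, best_cost

# Two intersecting noisy planes (z = 0 and x = 0), 200 points each.
rng = np.random.default_rng(42)
coeffs = rng.uniform(-1, 1, size=(400, 2))
plane_a = np.column_stack([coeffs[:200, 0], coeffs[:200, 1], np.zeros(200)])
plane_b = np.column_stack([np.zeros(200), coeffs[200:, 0], coeffs[200:, 1]])
data = np.vstack([plane_a, plane_b]) + 0.01 * rng.standard_normal((400, 3))
labels, cost = k_subspaces(data, n_clusters=2, dim=2)
print(cost)
```

Because the two planes intersect, plain k-means on distance would not separate them, whereas the residual-based assignment can. Choosing the number of clusters (the penalty on the number of sets asked about above) would need an extra model-selection step that this sketch does not attempt.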