Data Science Asked on July 7, 2021
Reducing dimensionality via PCA before training is a common practice, but PCA cannot make use of nonlinear relations between features.
I read about UMAP (e.g. https://adanayak.medium.com/dimensionality-reduction-using-uniform-manifold-approximation-and-projection-umap-4aa4cef43fed), a dimensionality-reduction technique that can capture nonlinear relations between features.
However, I only saw its use in data presentation and exploration.
Would it make sense to use UMAP as a form of feature engineering/dimensionality reduction when creating input for downstream model training?
Yes, it makes sense, and that is one of the advantages UMAP has over t-SNE. While t-SNE has no ability to operate on out-of-sample data, UMAP learns a map to the lower-dimensional space that can be applied to out-of-sample data, just as the PCA projection matrix would be applied to out-of-sample data.
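For example, here is a minimal sketch of that workflow, assuming the umap-learn and scikit-learn packages; the dataset, hyperparameters, and classifier are illustrative choices, not part of the original answer:

```python
# Sketch: fit UMAP on training data only, then transform out-of-sample data,
# assuming the umap-learn and scikit-learn packages are installed.
import umap
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit the reducer on the training split only, just as you would fit PCA.
reducer = umap.UMAP(n_components=10, random_state=0)
X_train_emb = reducer.fit_transform(X_train)

# Apply the learned map to unseen data via transform().
X_test_emb = reducer.transform(X_test)

# Train a downstream model on the embedded features and evaluate it.
clf = LogisticRegression(max_iter=1000).fit(X_train_emb, y_train)
print(accuracy_score(y_test, clf.predict(X_test_emb)))
```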
(Certainly we could run everything through the t-SNE algorithm and then do the train/test split, but that is majorly cheating. What happens when we get new observations that didn't exist when we built the model, like how Siri is supposed to understand speech from people who haven't even been born yet, once they can talk in a few years?)
Correct answer by Dave on July 7, 2021