Does it make sense to use UMAP for dimensionality reduction for modeling (rather than presentation/exploration)?

Question

Reducing dimensionality via PCA before training is a common practice, but PCA is a linear method and cannot make use of nonlinear relations between features.
I read about UMAP (e.g. https://adanayak.medium.com/dimensionality-reduction-using-uniform-manifold-approximation-and-projection-umap-4aa4cef43fed), a dimensionality-reduction technique that can capture nonlinear relations between features.
However, I have only seen it used for data presentation and exploration.
Would it make sense to use UMAP as a form of feature engineering/dimensionality reduction when creating input for downstream model training?

Dave · Accepted Answer

Yes, it makes sense, and that is one of the advantages UMAP has over t-SNE. While t-SNE cannot operate on out-of-sample data, UMAP learns a mapping to the lower-dimensional space that can be applied to out-of-sample data, just as a fitted PCA matrix is applied to out-of-sample data.
(Certainly we could run everything through t-SNE and then do the data split, but that is major cheating. What happens when we get new observations that did not exist when we built the model, the way Siri is supposed to understand the speech of people who have not yet been born but will be talking in a few years?)
