Data Science Asked by dtrinh on December 20, 2020
Disclaimer: I’m a machine learning beginner.
I’m working on visualizing high-dimensional data (text as tf-idf vectors) in 2D space. My goal is to label/modify those data points, recompute their positions after the modification, and update the 2D plot. The logic already works, but each iterative visualization looks very different from the previous one, even though only 1 out of 28,000 features in a single data point changed.
(The original question included further project details and two before/after plots illustrating how drastically the 2D layout changes between iterations.)
I have tried several dimensionality reduction algorithms, including MDS, PCA, t-SNE, UMAP, LSI and an autoencoder. The best results in terms of computing time and visual representation came from UMAP, so I stuck with it for the most part.
Skimming some research papers, I found this one, which addresses a similar problem (a small change in high-dimensional space resulting in a big change in 2D):
https://ieeexplore.ieee.org/document/7539329
In summary, they use t-SNE and initialize each iterative step with the result of the first step.
First: how would I go about achieving this in actual code? Is this related to t-SNE’s random_state parameter?
Second: is it possible to apply that strategy to other algorithms like UMAP? t-SNE takes much longer and wouldn’t really fit the interactive use case.
Or is there some better solution I haven’t thought of for this problem?
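As a rough illustration of the paper’s strategy (not the authors’ own code), here is a minimal sketch using scikit-learn’s TSNE. The array names and the random stand-in for the tf-idf matrix are placeholders; the key point is that random_state only fixes the seed, while the init parameter is what carries the previous layout into the next fit:

```python
import numpy as np
from sklearn.manifold import TSNE

# Stand-in for the tf-idf matrix; in the real project this would be the
# (densified) output of a TfidfVectorizer.
rng = np.random.default_rng(0)
X_old = rng.random((200, 50))
X_new = X_old.copy()
X_new[0, 3] += 0.1  # simulate changing one feature of one data point

# Step 1: compute the initial embedding as usual.
emb_old = TSNE(n_components=2, init="pca", random_state=42).fit_transform(X_old)

# Step 2: seed the optimization with the previous 2D layout instead of a
# fresh PCA/random initialization, so points only have to move slightly.
emb_new = TSNE(n_components=2, init=emb_old, random_state=42).fit_transform(X_new)
```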
You can initialize a UMAP embedding with a custom set of initial positions, so potentially you can initialize step 2 with the embedding from step 1 (with random positions for the new points).
Answered by Leland McInnes on December 20, 2020
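For completeness, a minimal sketch of that suggestion with umap-learn, under the same assumptions as above (placeholder array names, random data standing in for the tf-idf matrix):

```python
import numpy as np
import umap

rng = np.random.default_rng(0)
X_old = rng.random((200, 50))   # stand-in for the tf-idf matrix
X_new = X_old.copy()
X_new[0, 3] += 0.1              # one small feature change in one point

# Step 1: ordinary UMAP fit (spectral initialization by default).
emb_old = umap.UMAP(n_components=2, random_state=42).fit_transform(X_old)

# Step 2: reuse the previous 2D positions as the initial layout via `init`;
# any genuinely new points would need extra rows (e.g. random positions).
emb_new = umap.UMAP(n_components=2, init=emb_old, random_state=42).fit_transform(X_new)
```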