Data Science Asked on February 21, 2021
One thing I see people trying is using GANs to generate synthetic tabular data for supervised learning, including as a way to oversample the minority class in binary classification.
To me, creating synthetic data this way seems a bit dangerous.
In practice, all the experiments that I have seen to generate new training data using GANs have failed.
Is there a theoretical reason behind this?
GANs have many known problems. The main ones are:
GANs for image generation have been studied extensively. Other domains, like speech filtering, have also been studied, but less extensively. In some domains, like text generation, GANs are not very successful. For tabular data generation via GANs, published work is scarce: medGAN, VeeGAN, ehrGAN, TableGAN, CTGAN.
I think that one of the main problems preventing us from devising better GANs in non-image domains is the evaluation. With images, you can eyeball the results and quickly determine if they are of good quality and diverse. However, with other domains, it is not easy to evaluate both the quality and diversity of the generated data.
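To make the evaluation problem concrete, here is a minimal sketch of one possible proxy for quality and diversity on numeric tabular data. It is an illustration, not an established benchmark: the function names, the `radius` threshold, and the toy two-cluster data are all assumptions, and it presumes every column is numeric and on a comparable scale.

```python
import math
import random


def nearest_distance(row, table):
    """Euclidean distance from `row` to its closest row in `table`."""
    return min(math.dist(row, other) for other in table)


def quality_and_coverage(real, synthetic, radius):
    """Return two rough proxies in [0, 1] (hypothetical helper).

    quality:  share of synthetic rows lying within `radius` of some real row
    coverage: share of real rows that have a synthetic row within `radius`
    """
    quality = sum(nearest_distance(s, real) <= radius for s in synthetic) / len(synthetic)
    coverage = sum(nearest_distance(r, synthetic) <= radius for r in real) / len(real)
    return quality, coverage


random.seed(0)
# Toy "real" table: two well-separated clusters of numeric rows.
real = [(random.gauss(0, 0.3), random.gauss(0, 0.3)) for _ in range(100)]
real += [(random.gauss(5, 0.3), random.gauss(5, 0.3)) for _ in range(100)]
# Toy "generator" output that only reproduces the first cluster.
collapsed = [(random.gauss(0, 0.3), random.gauss(0, 0.3)) for _ in range(200)]

quality, coverage = quality_and_coverage(real, collapsed, radius=0.5)
print(f"quality={quality:.2f} coverage={coverage:.2f}")
```

In this toy case the synthetic rows look realistic (high quality) yet miss half of the real data (low coverage), which is the kind of failure that is easy to spot in images but easy to miss in a table. With mixed categorical and numeric columns, the distance function itself becomes a modelling decision, which is part of why evaluation is hard.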
I think most people nowadays stick to classical oversampling methods to generate tabular data.
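The answer does not name a specific method, so as an assumption, here is a rough sketch of one classical approach: SMOTE-style interpolation, which creates new minority rows on the line segment between a minority row and one of its nearest minority neighbours. The function name `smote_like` and its parameters are hypothetical, and it handles numeric features only; in practice a maintained library implementation would be preferable.

```python
import math
import random


def smote_like(minority, n_new, k=3, seed=0):
    """Create `n_new` synthetic rows by interpolating between a minority row
    and one of its `k` nearest minority neighbours (numeric features only)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest neighbours of `base` among the other minority rows.
        neighbours = sorted(
            (row for row in minority if row is not base),
            key=lambda row: math.dist(base, row),
        )[:k]
        neighbour = rng.choice(neighbours)
        t = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + t * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Toy minority class with two numeric features.
minority = [(1.0, 2.0), (1.2, 1.9), (0.9, 2.2), (1.1, 2.1)]
new_rows = smote_like(minority, n_new=6)
print(new_rows)
```

Because every new row is an interpolation between two existing minority rows, it stays inside the region the minority class already occupies. That makes the method simple and predictable, with no training procedure to tune, though it cannot produce anything beyond what the existing rows span.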
Answered by noe on February 21, 2021