
How to measure augmented data quality

Data Science Asked by Mikhail_Sam on September 6, 2020

I'm working on an NLP binary classification task (though the question applies to any ML task that uses augmentation) and use augmentation to create additional data.
I already have a trained model and a new, small dataset (the model was trained on a different one!).
So I create an additional dataset from the small one using augmentation.
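For concreteness, here is a minimal sketch of one simple way to augment a small labeled text dataset. It assumes random word swap as the augmentation (one of the simple operations used in EDA, "Easy Data Augmentation"); the actual functions/libraries mentioned in the question may work quite differently. `random_swap` and `augment_dataset` are hypothetical helpers, not part of any library.

```python
import random
from typing import List, Optional, Tuple

Example = Tuple[str, int]  # (text, label); hypothetical representation


def random_swap(text: str, n_swaps: int = 1, seed: Optional[int] = None) -> str:
    """Return a copy of `text` with `n_swaps` random pairs of words swapped."""
    rng = random.Random(seed)
    words = text.split()
    if len(words) < 2:
        return text  # nothing to swap in a one-word text
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(words)), 2)  # two distinct positions
        words[i], words[j] = words[j], words[i]
    return " ".join(words)


def augment_dataset(examples: List[Example], n_copies: int = 2,
                    n_swaps: int = 1, seed: int = 0) -> List[Example]:
    """Create `n_copies` augmented variants per example; each keeps its source label."""
    rng = random.Random(seed)
    augmented: List[Example] = []
    for text, label in examples:
        for _ in range(n_copies):
            augmented.append((random_swap(text, n_swaps, rng.randrange(2**32)), label))
    return augmented
```

Copying the source label onto every variant assumes the augmentation is label-preserving, which is exactly the property whose quality the question is asking how to check.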

I have several functions/libraries/approaches for augmenting the data.
My question is: how can I tell whether the new (augmented) data is good or not at the stage of creating it, i.e. without retraining the model?

My current idea is the following:

Take the examples from my small dataset on which the model fails (false negatives/false positives) ->
augment them -> feed the augmented examples to the model -> look at the scores.
If the model still fails on them, the augmented data is good enough.
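The procedure above could be sketched roughly as follows. This assumes the trained model is wrapped in a hypothetical `predict(text) -> label` function and that each augmentation approach is a hypothetical `augment(text) -> text` function; none of these names come from a specific library. For each augmenter, it reports the fraction of augmented copies of the originally misclassified examples that the model still gets wrong.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, int]  # (text, gold label), binary labels 0/1


def misclassified(examples: List[Example],
                  predict: Callable[[str], int]) -> List[Example]:
    """Keep only the examples the trained model gets wrong (FP or FN)."""
    return [(t, y) for t, y in examples if predict(t) != y]


def still_fails_rate(hard: List[Example], augment: Callable[[str], str],
                     predict: Callable[[str], int]) -> float:
    """Fraction of augmented copies of `hard` that the model still misclassifies.

    Each augmented copy inherits the gold label of its source example.
    """
    if not hard:
        return float("nan")  # the model made no mistakes, nothing to test
    augmented = [(augment(t), y) for t, y in hard]
    fails = sum(predict(t) != y for t, y in augmented)
    return fails / len(augmented)


def compare_augmenters(examples: List[Example],
                       augmenters: Dict[str, Callable[[str], str]],
                       predict: Callable[[str], int]) -> Dict[str, float]:
    """Score every augmentation approach on the same set of hard examples."""
    hard = misclassified(examples, predict)
    return {name: still_fails_rate(hard, fn, predict) for name, fn in augmenters.items()}
```

Because the rate is computed against the source label, it is only meaningful if the augmentation preserves the label; a rate of 1.0 is also what a no-op augmenter that returns the text unchanged would get.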

But overall, I'm not sure this is a reliable approach.

Are there any established metrics for this?
Or can someone suggest how to do this properly?
