Data Science Asked by Basti on January 13, 2021
How would you create a large, production-ready image training dataset from scratch, including annotations, for an image classification task?
We will capture a large number of images (~1 million) with industrial cameras and save them in an S3 bucket. Do you think a data lake infrastructure is necessary?
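To make the setup concrete, here is a rough sketch of how we currently plan to index the bucket into a simple CSV manifest with boto3 (the bucket name, prefix, and output path are placeholders, not our actual values):

```python
# Rough sketch: list the raw image objects in S3 and write a CSV manifest
# (object key, size, timestamp) that later annotation/label columns can join on.
# Bucket name, prefix, and output path below are placeholders.
import csv
import boto3

def build_manifest(bucket="my-raw-images", prefix="camera-01/", out_path="manifest.csv"):
    """List every object under `prefix` and record key/size/last-modified in a CSV."""
    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["key", "size_bytes", "last_modified"])
        for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
            for obj in page.get("Contents", []):
                writer.writerow([obj["Key"], obj["Size"], obj["LastModified"].isoformat()])

if __name__ == "__main__":
    build_manifest()
```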
In your opinion, what are the most suitable methods for annotating the images in the shortest possible time (bounding boxes are not needed)?
Solutions that I have been able to find so far are the following:
Are there any options I have missed? In principle, it would be possible to pay for the annotation, but this should be avoided or kept to a minimum.
Are there things that should be considered architecturally with such a large dataset?
I'm not an expert in image classification, so I'm just going to give some general advice here.
The strategy should be progressive, for instance:
It is only around this stage that the final annotation process can be fully designed. Depending on the approach, you could also set up an iterative manual annotation loop: some classes will be learned quickly by the model, so it can make sense to use the model to propose, for the next round, the images on which it currently fails. Be careful to avoid bias, and keep evaluating and refining the model at every round of annotation.
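One common way to instantiate "propose the images the model fails on" is uncertainty sampling: score the unlabeled pool with the current model and send the lowest-confidence images to annotators. A minimal sketch, assuming you already have an `image_ids` list and an `(N, num_classes)` array of softmax scores `probs` from your model (both names are hypothetical placeholders):

```python
# Minimal uncertainty-sampling sketch for one annotation round.
# `image_ids` and `probs` are placeholders for your own pool and model outputs.
import numpy as np

def select_for_annotation(image_ids, probs, budget=1000):
    """Return the `budget` images the current model is least confident about."""
    probs = np.asarray(probs)
    confidence = probs.max(axis=1)             # top-1 probability per image
    hardest = np.argsort(confidence)[:budget]  # lowest confidence first
    return [image_ids[i] for i in hardest]
```

At each round you would label the selected images, retrain, rescore the remaining pool, and repeat, while watching a fixed held-out evaluation set to catch the bias mentioned above.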
Answered by Erwan on January 13, 2021