Data Science Asked on January 16, 2021
For my problem, every image that is not a picture of a question on paper (either from a textbook or handwritten) is spam. In other words, every image in the world is spam for my use case except pictures of textbook or handwritten questions. I have used ResNet50 with 47000 question images and 43000 spam images. For the spam class, I have used the COCO 2017 test set with 42K images. My model gave excellent train and validation metrics: precision, recall, accuracy and F1 were all greater than 0.99. But on new images it performed so poorly that it only reached an F1 score of 0.6.
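For reference, the four metrics quoted above all follow from the confusion counts on a labelled set. The sketch below assumes "question" is the positive class (label 1) and "spam" is the negative class (label 0); the function name binary_metrics is hypothetical and not taken from any library.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Precision, recall, accuracy and F1 for a binary task.

    Assumption: label 1 = "question" (positive), label 0 = "spam" (negative).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Confusion counts.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(y_true) if len(y_true) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "accuracy": accuracy, "f1": f1}


# Toy example: tp=2, fn=2, fp=1, tn=3.
m = binary_metrics([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 0, 0, 1, 0, 0, 0])
print({k: round(v, 3) for k, v in m.items()})
# {'precision': 0.667, 'recall': 0.5, 'accuracy': 0.625, 'f1': 0.571}
```

Running the same function on the validation split and on the new images makes the two F1 scores directly comparable; a gap as large as 0.99 versus 0.6 suggests the validation set does not resemble the new images.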
What should I do to make the model generalize, apart from using data augmentation and more data? And if I have to use more data, how much would be sufficient? I resized my images to 224×224. Do I need to use larger images? Data collection is not a problem, as I have millions of images, and for the spams I can use the Google Public dataset of more than 70 million images. But that would cost me a huge amount of computational power and load.
What other methods can I try?
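If more spam data is needed but processing all 70 million images is too expensive, one option (not something the question describes) is to draw a fixed-size, uniformly random subset of file paths in a single pass with reservoir sampling (Algorithm R), which only ever holds k paths in memory. The sketch below assumes the candidate images are available as an iterable of paths; the function name reservoir_sample and the example paths are hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(items: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Uniformly sample k items from a stream of unknown length (Algorithm R).

    Only k items are kept in memory, so `items` can be a lazy iterator
    over millions of file paths.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(items):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep item i with probability k / (i + 1), replacing a random slot.
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


# Hypothetical stream of spam image paths; the generator is never materialized.
paths = (f"spam/{n:08d}.jpg" for n in range(200_000))
subset = reservoir_sample(paths, k=10_000, seed=42)
print(len(subset))  # 10000
```

Drawing subsets of increasing size with a fixed seed lets you grow the spam set in steps and check whether F1 on the new images improves, rather than committing to the full dataset up front.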