Data Science Asked by Marco Ramos on June 10, 2021
So I have about 3000 images across 6 classes, and this is what I did:
1 – split the data into training and test sets (20% test size) before doing anything else
2 – performed data augmentation on the underrepresented classes in the training set, ending up with 2700 training and 640 test images
3 – applied feature extraction (Haralick texture, dominant color, average color, histogram, etc.) to both sets
4 – normalized the features with StandardScaler (fit_transform on the training set, then only transform on the test set)
5 – ran a grid search with 5-fold CV on the training set only to find the best parameters, and got 91% average accuracy
6 – used the best estimator to predict on the test set and got 94% accuracy
7 – pickled the model and the scaler, then loaded them in a new file
8 – created a predict function that applies all the transformations and fed it a random image from the dataset. In theory this is not new data, so it should give the same results, yet it fails miserably every time
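For reference, steps 4 to 8 can be kept consistent by bundling the scaler and the classifier into one scikit-learn `Pipeline`, pickling that single object, and sending both training and prediction through the same feature function. This is a minimal sketch under assumptions: `extract_features` is a hypothetical stand-in for the Haralick/color/histogram extractors, the images are synthetic random arrays, and the SVC with its tiny `C` grid is an illustrative choice, not the model from the question.

```python
import pickle

import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def extract_features(image):
    """Hypothetical stand-in for the real feature extractors.

    Takes an HxWx3 uint8 array and returns a 1-D vector: the average value of
    each channel followed by a normalized 8-bin histogram per channel.
    """
    image = np.asarray(image, dtype=np.float64)
    avg_color = image.reshape(-1, 3).mean(axis=0)
    hists = [
        np.histogram(image[..., c], bins=8, range=(0, 256))[0] / image[..., c].size
        for c in range(3)
    ]
    return np.concatenate([avg_color, *hists])


# Synthetic data (assumption): 6 classes of 16x16 images that differ in brightness.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(6), 20)
images = [
    np.clip(rng.normal(40 * y + 20, 15, size=(16, 16, 3)), 0, 255).astype(np.uint8)
    for y in labels
]
X = np.array([extract_features(img) for img in images])

X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.2, stratify=labels, random_state=0
)

# Scaler and classifier in one object: GridSearchCV refits the scaler on each
# training fold, and the best estimator carries its own fitted scaler.
pipe = Pipeline([("scaler", StandardScaler()), ("clf", SVC())])
search = GridSearchCV(pipe, {"clf__C": [0.1, 1, 10]}, cv=5)
search.fit(X_train, y_train)
test_acc = search.score(X_test, y_test)

# Persist one object instead of separate model and scaler pickles.
blob = pickle.dumps(search.best_estimator_)

# "New file": load the pipeline once, then reuse it for every prediction.
model = pickle.loads(blob)


def predict_image(image, model):
    """Classify one image with exactly the training-time transformations."""
    features = extract_features(image).reshape(1, -1)  # shape (1, n_features)
    return int(model.predict(features)[0])


print(f"test accuracy: {test_acc:.2f}")
print("predicted:", predict_image(images[0], model), "true:", labels[0])
```

A side effect of putting the scaler inside the pipeline is that `GridSearchCV` refits it on each training fold, so the 5-fold CV score is no longer computed with a scaler that has already seen the validation folds. In a real project, `pickle.dumps`/`pickle.loads` would become `pickle.dump`/`pickle.load` on a file opened in binary mode.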
What am I doing wrong?
I don’t think it’s overfitting; otherwise my test accuracy would also be poor.
I presume it has something to do with the scaler?
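If the scaler is the culprit, two common mistakes in a single-image predict function are refitting a new scaler instead of reusing the pickled one, and passing a 1-D feature vector. The sketch below uses random numbers as stand-in features (an assumption; no real images are involved) to show both, plus a sanity check that compares the predict-path features with the scaled training row for the same image. It illustrates common pitfalls and is not a diagnosis of the original code, which isn't shown.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Stand-in feature matrix (assumption): 200 training images x 3 features on
# very different scales, like a color mean, a histogram count and a texture stat.
rng = np.random.default_rng(42)
X_train = rng.normal(loc=[5.0, 100.0, 0.3], scale=[1.0, 20.0, 0.05], size=(200, 3))
scaler = StandardScaler().fit(X_train)  # this is the object that gets pickled

# Features for one training image as the predict function computes them (1-D).
# In the real code this vector would be recomputed from the image file.
one_image = X_train[0]

# Correct: reuse the fitted scaler and pass a 2-D (1, n_features) array.
good = scaler.transform(one_image.reshape(1, -1))

# Pitfall 1: refitting on the single sample. Its mean is the sample itself and
# its variance is 0, so every feature becomes 0 for every image.
bad = StandardScaler().fit_transform(one_image.reshape(1, -1))

# Pitfall 2: a 1-D vector is rejected by scikit-learn with a ValueError.
try:
    scaler.transform(one_image)
    raised = False
except ValueError:
    raised = True

# Sanity check: the predict path should reproduce the scaled training row for
# the same image; a mismatch points at feature extraction or image loading.
matches_training_row = np.allclose(good[0], scaler.transform(X_train)[0])

print("correct:", good)
print("refit on one sample:", bad)
print("1-D input raised ValueError:", raised)
print("matches training row:", matches_training_row)
```

The same row comparison also catches differences in how the image is read before feature extraction: for example, OpenCV's `cv2.imread` returns channels in BGR order while PIL returns RGB, which silently changes color-based features.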