Why does my model fail to predict on the whole dataset?

Data Science: asked by Marco Ramos on June 10, 2021

I have about 3000 images across 6 classes, and this is what I did:

1 – Split the data into a training set and a test set (20% test size) before doing anything else.

2 – Performed data augmentation on the underrepresented classes in the training set, ending up with 2700 training images and 640 test images.

3 – Extracted features (Haralick texture, dominant color, average color, histogram, etc.) from both sets.

4 – Normalized the features with a StandardScaler: fit_transform on the training set, then only transform on the test set.

5 – Ran a grid search with 5-fold CV on the training set only to find the best parameters, getting 91% mean accuracy across the folds.

6 – Used the best estimator to predict on the test set and got 94% accuracy.

7 – Pickled the model and the scaler, then loaded them in a new file.

8 – Created a predict function that applies all the transformations and fed it a random image from the dataset. In theory this is not new data, so it should give the same results, yet it fails miserably every time.
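For reference, steps 1, 4, 5, 6, 7 and 8 can be sketched with scikit-learn as below. This is a minimal sketch under stated assumptions, not the original code: synthetic features from `make_classification` stand in for the Haralick/color features, `SVC` and its small parameter grid are assumed (the question does not name the model), augmentation and feature extraction (steps 2 and 3) are omitted, the "new file" is simulated with `pickle.dumps`/`pickle.loads` in one script, and `predict_one` is a hypothetical helper. It shows that when the reloaded scaler is applied with `transform` to a single row reshaped to 2-D, the reloaded path reproduces the in-memory predictions exactly.

```python
import pickle

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the extracted image features: 3000 samples, 6 classes.
X, y = make_classification(
    n_samples=3000, n_features=30, n_informative=12,
    n_classes=6, random_state=0,
)

# Step 1: hold out 20% as a test set before anything else.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0,
)

# Step 4: fit the scaler on the training set only; reuse it on the test set.
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)

# Step 5: 5-fold grid search on the training set (SVC is an assumed model).
grid = GridSearchCV(SVC(), {"C": [1, 10], "gamma": ["scale"]}, cv=5)
grid.fit(X_train_s, y_train)
model = grid.best_estimator_

# Step 7: persist both fitted objects together.
blob = pickle.dumps({"scaler": scaler, "model": model})

# Step 8: simulate the "new file" by reloading from the pickled bytes.
loaded = pickle.loads(blob)


def predict_one(features):
    """Hypothetical helper: scale one raw feature vector and classify it."""
    row = np.asarray(features, dtype=float).reshape(1, -1)  # 2-D: one sample
    row_s = loaded["scaler"].transform(row)  # transform only, never refit
    return loaded["model"].predict(row_s)[0]


# Step 6 (in memory) versus the reloaded single-sample path.
in_memory = model.predict(X_test_s)
reloaded = np.array([predict_one(x) for x in X_test])
print("test accuracy:", (in_memory == y_test).mean())
print("reloaded path matches:", np.array_equal(in_memory, reloaded))
```

If a check like the last line passes on precomputed features but single images still fail, the discrepancy would have to come from how the predict function computes features, not from pickling itself.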

What am I doing wrong?
I don’t think it’s overfitting; otherwise my test accuracy would be poor too.
I presume it has something to do with the scaler?
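On the scaler hypothesis, one pitfall that would produce exactly this symptom is refitting the scaler inside the predict function on the single incoming image. Since that function is not shown, this is a hypothesis, not a diagnosis. The sketch below uses a synthetic feature matrix (an assumption, not the real features). With one row, `StandardScaler` learns that row as the mean and a variance of 0 (it then uses a scale of 1), so `fit_transform` maps every feature to 0 whatever the image is. The fitted training scaler's `transform` keeps the information.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-in for the training feature matrix (not the real features).
X_train = rng.normal(loc=5.0, scale=2.0, size=(2700, 10))
one_image = X_train[:1]  # a single "image" as a (1, n_features) row

# Expected usage: the scaler fitted on the training set, applied with transform().
scaler = StandardScaler().fit(X_train)
good = scaler.transform(one_image)

# Suspect usage: refitting on just this one row inside the predict function.
# The fitted mean equals the row and the variance is 0 (scale_ falls back
# to 1), so every feature becomes exactly 0 regardless of the input.
bad = StandardScaler().fit_transform(one_image)

print("transform():    ", np.round(good[0, :3], 3))
print("fit_transform():", bad[0, :3])
```

An all-zero input like this would make the model return the same class for every image, which would look like failing on the whole dataset.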

Tags: accuracy, image classification, prediction, predictive modeling
