Data Science Asked on September 25, 2021
Stats newbie here. I have a small dataset of 646 samples that I’ve trained a reasonably performant model on (~99% test and val accuracy). To complicate things a little bit, the classes are somewhat unbalanced. It’s a binary classification problem.
Here is my confusion matrix on the training data:
[[387   1]
 [  1  73]]
On the testing data:
[[74  1]
 [ 0 10]]
On the validation data:
[[85  1]
 [ 0 13]]
My reading is that testing and validation show lower specificity than training: in each case a single false positive, but out of a much smaller pool of negatives. However, given that only one sample is missed in each of the testing and validation datasets, what is my real-world specificity? Is there a better measure of generalizability? Is there something akin to a p-value that quantifies how reliable the specificity estimate is, given the size of the negative class?
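For reference, here is how I'm computing specificity (a minimal sketch; I'm assuming sklearn's confusion-matrix convention of rows = actual, columns = predicted, with class 0 as the negative class):

import numpy as np

# The confusion matrices reported above, assuming sklearn's convention:
# rows = actual class, columns = predicted class, class 0 = negative.
matrices = {
    "train": np.array([[387, 1], [1, 73]]),
    "test":  np.array([[74, 1], [0, 10]]),
    "val":   np.array([[85, 1], [0, 13]]),
}

for name, cm in matrices.items():
    tn, fp = cm[0, 0], cm[0, 1]  # negatives correctly vs. wrongly flagged
    specificity = tn / (tn + fp)
    print(f"{name}: specificity = {specificity:.3f} ({tn}/{tn + fp})")

That gives roughly 0.997 on training versus 0.987 and 0.988 on testing and validation.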
Thanks!
The test dataset is your stand-in for real-world data. Split the data so that the model sees the training and validation sets repeatedly (during fitting and model selection), while the test set is evaluated only once, at the very end. If the model is robust, it will perform well even on the test set; the underlying assumption is that the test data is as close as possible to real-world data.
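As a minimal sketch of that splitting scheme (with stand-in data, so substitute your real X and y; the stratify argument keeps the class ratio similar across splits, which matters with a minority class this small):

import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data with roughly the question's shape (hypothetical;
# substitute the real X and y): 646 samples, ~13% minority class.
rng = np.random.default_rng(0)
X = rng.normal(size=(646, 10))
y = (rng.random(646) < 0.13).astype(int)

# Hold out the test set once; it stands in for real-world data.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.15, stratify=y, random_state=42
)

# Carve validation out of the remainder; unlike the test set, it may
# be revisited during model selection.
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.15, stratify=y_trainval, random_state=42
)

print(len(X_train), len(X_val), len(X_test))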
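As for quantifying how reliable the specificity estimate is: specificity is a binomial proportion (true negatives out of actual negatives), so one standard option is a binomial confidence interval such as Clopper-Pearson. A minimal sketch using statsmodels' proportion_confint, with the 74-out-of-75 figure taken from the test matrix above:

from statsmodels.stats.proportion import proportion_confint

# Test set above: 74 true negatives out of 75 actual negatives.
# method="beta" gives the Clopper-Pearson (exact binomial) interval.
low, high = proportion_confint(count=74, nobs=75, alpha=0.05, method="beta")
print(f"test specificity = {74/75:.3f}, 95% CI [{low:.3f}, {high:.3f}]")

With only 75 negatives, a single false positive leaves a fairly wide interval, so the point estimate alone overstates how precisely the real-world specificity is known.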
Answered by Chaitanya Bapat on September 25, 2021