Data Science Asked on June 9, 2021
Generally we calculate specific metrics for ML models on a test set (and we try to make that test set representative). I'm not clear on how to make inferences about the same metrics for the population that the test set represents – i.e., say I want to answer: if the model were run on the whole population, what would be the (e.g.) 95% confidence interval for the metric in question?
Now for a simple case I can try to use my basic stats knowledge: suppose I have a binary classification model and I’m interested in reporting its precision.
For example, I could calculate the interval as (assuming test set size $m$)
$$\hat p \pm t_{m,95\%}\sqrt{\frac{\hat p(1-\hat p)}{m}}$$
where $t_{m,95\%}$ is the t-distribution critical value corresponding to a 95% confidence level and sample size $m$.
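As a sketch of that calculation in code (the toy labels and predictions are made up for illustration; note that for precision the relevant "sample" is the set of predicted positives, so $m$ is taken as that count rather than the full test set size):

```python
import numpy as np
from scipy import stats

# Hypothetical test-set labels and predictions for a binary classifier.
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1])

# Precision = TP / (TP + FP); the proportion is estimated over the
# predicted positives, so use that count as the sample size m.
predicted_pos = y_pred == 1
m = int(predicted_pos.sum())
p_hat = (y_true[predicted_pos] == 1).mean()

# Two-sided 95% interval using the t distribution with m - 1 df.
t_crit = stats.t.ppf(0.975, df=m - 1)
half_width = t_crit * np.sqrt(p_hat * (1 - p_hat) / m)
print(f"precision = {p_hat:.3f}, "
      f"95% CI = ({p_hat - half_width:.3f}, {p_hat + half_width:.3f})")
```

This is the normal-approximation (Wald) interval for a proportion; it becomes unreliable when $m$ is small or $\hat p$ is near 0 or 1.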
But what about other metrics, like precision and recall together, mean absolute percentage error, mean absolute error, RMSE, and so on? Obviously I'm not expecting a recipe for each metric, just a general idea of how to get interval estimates for arbitrary metrics. Also, does the methodology described above seem correct?
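One general-purpose approach often suggested for arbitrary metrics is the nonparametric bootstrap: resample the test set with replacement, recompute the metric on each resample, and take percentiles of the resulting distribution. A minimal sketch (the metric and toy data here are placeholders, not part of the original question):

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_ci(y_true, y_pred, metric, n_boot=2000, alpha=0.05):
    """Percentile-bootstrap CI for an arbitrary metric on a test set."""
    n = len(y_true)
    scores = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        scores[b] = metric(y_true[idx], y_pred[idx])
    lo, hi = np.percentile(scores, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi

# Example: 95% CI for mean absolute error on a toy regression test set.
y_true = rng.normal(size=200)
y_pred = y_true + rng.normal(scale=0.5, size=200)
mae = lambda a, b: np.abs(a - b).mean()
lo, hi = bootstrap_ci(y_true, y_pred, mae)
print(f"MAE 95% CI = ({lo:.3f}, {hi:.3f})")
```

The same function works for any metric that is a function of the test rows, which is what makes the bootstrap a reasonable default when no closed-form interval is known.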