Cross Validated Asked by xeon123 on November 28, 2020
I am trying to understand the concept of the confidence interval, but I get confused by t-tests, p-values, standard deviations, and quantiles. My problem is the following:
I created a machine learning model that predicts a dependent variable. For each prediction, I calculate the Relative Prediction Error: (prediction - true value) / true value.
I want to calculate a confidence interval so that I can say, for example, that 95% of the relative errors lie in the interval [-1, 1] (let's assume that the errors are normally distributed around 0). How can I do this?
Is it possible for the distribution of the Relative Prediction Errors to have positive or negative skewness? If so, will the interval that contains 95% of the relative errors be symmetrical or asymmetrical (e.g., [-2, 1] or [-1, 2])?
It sounds like you are looking to produce an interval that captures some proportion of relative prediction errors, and NOT a confidence interval. For clarity, a confidence interval should be understood as a means to quantify uncertainty about the value of a parameter in a statistical model.
To provide an interval that captures $P$% of your relative prediction errors, you could simply use the $\left(\frac{100 - P}{2}\right)$th and $\left(100 - \frac{100 - P}{2}\right)$th sample percentiles of your relative prediction errors as estimates of the lower and upper boundaries of the interval. For example, if you wanted to capture 90% of the errors, you would use the 5th and 95th percentiles.
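As a rough illustration of the percentile approach, here is a minimal Python sketch using only the standard library. The helper names (`percentile`, `central_interval`) and the simulated predictions are assumptions made up for this example, not part of the question. Linear interpolation between order statistics is only one of several common percentile conventions.

```python
import random

def percentile(sorted_values, q):
    """q-th percentile (0-100) of pre-sorted data, with linear interpolation."""
    if not sorted_values:
        raise ValueError("need at least one value")
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac

def central_interval(errors, coverage=95.0):
    """Interval between the (100-P)/2 th and 100-(100-P)/2 th sample percentiles."""
    tail = (100.0 - coverage) / 2.0
    s = sorted(errors)
    return percentile(s, tail), percentile(s, 100.0 - tail)

# Hypothetical data: true values and predictions with roughly 10% relative noise.
rng = random.Random(0)
true_values = [rng.uniform(50.0, 150.0) for _ in range(2000)]
predictions = [t * (1.0 + rng.gauss(0.0, 0.1)) for t in true_values]

# Relative prediction error: (prediction - true value) / true value.
relative_errors = [(p - t) / t for p, t in zip(predictions, true_values)]

lower, upper = central_interval(relative_errors, coverage=90.0)
print(f"About 90% of relative errors fall in [{lower:.3f}, {upper:.3f}]")
```

With your own data, you would replace the simulated `relative_errors` with the errors computed on a held-out set.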
I should note that the sample percentiles are estimates of the population percentiles, and so you could create confidence intervals around both bounds of your desired interval to further quantify your uncertainty. I should also note that my proposed method assumes that your relative prediction errors are independent and identically distributed.
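One way to put confidence intervals around the two bounds is a percentile bootstrap. That is a technique I am choosing for illustration, and it relies on the same i.i.d. assumption. The sketch below uses only the Python standard library. The function names, the simulated errors, and the choice of 1000 resamples are all assumptions for this example.

```python
import random

def percentile(sorted_values, q):
    """q-th percentile (0-100) of pre-sorted data, with linear interpolation."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac

def bootstrap_bound_cis(errors, coverage=90.0, conf=95.0, n_boot=1000, seed=0):
    """Percentile-bootstrap confidence intervals for both interval bounds."""
    rng = random.Random(seed)
    tail = (100.0 - coverage) / 2.0
    n = len(errors)
    lowers, uppers = [], []
    for _ in range(n_boot):
        # Resample the errors with replacement and recompute both bounds.
        resample = sorted(rng.choices(errors, k=n))
        lowers.append(percentile(resample, tail))
        uppers.append(percentile(resample, 100.0 - tail))
    lowers.sort()
    uppers.sort()
    a = (100.0 - conf) / 2.0
    lower_ci = (percentile(lowers, a), percentile(lowers, 100.0 - a))
    upper_ci = (percentile(uppers, a), percentile(uppers, 100.0 - a))
    return lower_ci, upper_ci

# Hypothetical relative prediction errors (simulated, not real data).
data_rng = random.Random(1)
relative_errors = [data_rng.gauss(0.0, 0.1) for _ in range(500)]

lower_ci, upper_ci = bootstrap_bound_cis(relative_errors)
print(f"95% CI for the 5th percentile:  [{lower_ci[0]:.3f}, {lower_ci[1]:.3f}]")
print(f"95% CI for the 95th percentile: [{upper_ci[0]:.3f}, {upper_ci[1]:.3f}]")
```

The bootstrap can behave poorly for extreme percentiles in small samples, so treat these intervals as approximate.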
There are other methods of estimation besides taking the sample quantiles directly, e.g., fitting a model to your relative prediction errors and using the modeled distribution's percentiles. There are also other ways to construct an interval. For instance, my proposal is centered on the median, leaving equal proportions of errors in each tail, whereas other methods might find the interval with the highest density (referred to as a highest density interval, or HDI).
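To make the difference between the two constructions concrete, here is a small Python sketch that compares an equal-tailed interval with the shortest interval containing the same number of points, which is a simple sample-based stand-in for an HDI. The function names and the right-skewed simulated errors are assumptions for this example.

```python
import math
import random

def equal_tailed_interval(errors, coverage=0.90):
    """Interval leaving (about) the same number of points in each tail."""
    s = sorted(errors)
    n = len(s)
    k = math.ceil(coverage * n)  # number of points the interval must contain
    start = (n - k) // 2
    return s[start], s[start + k - 1]

def shortest_interval(errors, coverage=0.90):
    """Shortest window of sorted points containing the required fraction."""
    s = sorted(errors)
    n = len(s)
    k = math.ceil(coverage * n)
    # Slide a window of k consecutive order statistics and keep the narrowest.
    best = min(range(n - k + 1), key=lambda i: s[i + k - 1] - s[i])
    return s[best], s[best + k - 1]

# Hypothetical right-skewed relative errors: prediction = true value * factor,
# with a lognormal factor, so the relative error is (factor - 1) > -1.
rng = random.Random(0)
relative_errors = [rng.lognormvariate(0.0, 0.5) - 1.0 for _ in range(5000)]

et_lo, et_hi = equal_tailed_interval(relative_errors)
hd_lo, hd_hi = shortest_interval(relative_errors)
print(f"Equal-tailed 90% interval: [{et_lo:.3f}, {et_hi:.3f}]")
print(f"Shortest 90% interval:     [{hd_lo:.3f}, {hd_hi:.3f}]")
```

For a skewed sample like this one, both intervals are asymmetric around 0, and the shortest interval shifts toward the region where the errors are most concentrated.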
With respect to your second question, there is no guarantee that the distribution of your relative prediction errors will be symmetric. Thus, you should be prepared to see asymmetrical intervals.
Answered by David Telson on November 28, 2020