Data Science Asked on February 27, 2021
I’m dealing with the modeling of small experimental data sets. As most experimental work does not generate thousands of samples, but rather a handful, I need to be inventive in how to deal with this small number of data points (say 10-20). I’ve been building a nice framework to do just this, and at this point I am interested in generating error bars for the predicted values.
In rough outline, this is what happens in the framework (e.g. when applying a multi-linear model):
So take for example the following multiple linear regression model:
$$
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 \tag{1}
$$
and I’m looking for an algebraic equation to calculate (numerically) the prediction interval (PI) for a new prediction $y_0$ (a confidence interval (CI) would be OK as well, as it is related to the PI).
So far, my searches have only been able to provide me with answers which deal with the statistical nature of the data set (the $x_i$'s). These provide me with an error component:
$$
\hat{V}_f = s^2\cdot\mathbf{x_0}\cdot\mathbf{(X^TX)^{-1}}\cdot\mathbf{x_0^T} + s^2 \tag{2}
$$
which can be used to calculate the PI, via:
$$
y = y_0 \pm t_{\alpha/2,n-k}\cdot\sqrt{\hat{V}_f} \tag{3}
$$
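As a minimal sketch of how equations (2) and (3) could be evaluated numerically, assuming NumPy and SciPy are available: the data are synthetic, the function name `prediction_interval` is hypothetical, and $k$ is taken to be the number of fitted coefficients including the intercept.

```python
import numpy as np
from scipy import stats


def prediction_interval(X, y, x0, alpha=0.05):
    """Classical OLS prediction interval for one new point x0 (eqs. 2 and 3).

    X  : (n, k) design matrix whose first column is all ones (intercept)
    y  : (n,) observed responses
    x0 : (k,) new point, including the leading 1
    """
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y               # least-squares coefficients
    resid = y - X @ beta_hat
    s2 = resid @ resid / (n - k)               # residual variance s^2
    v_f = s2 * (x0 @ XtX_inv @ x0) + s2        # eq. (2)
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - k)
    y0 = x0 @ beta_hat
    half_width = t_crit * np.sqrt(v_f)         # eq. (3)
    return y0, y0 - half_width, y0 + half_width


# Synthetic data: 15 samples, two predictors (illustrative only)
rng = np.random.default_rng(0)
n = 15
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n), rng.uniform(0, 5, n)])
y = 1.0 + 2.0 * X[:, 1] - 0.5 * X[:, 2] + rng.normal(0, 0.3, n)

x0 = np.array([1.0, 4.0, 2.0])
y0, lower, upper = prediction_interval(X, y, x0)
print(f"y0 = {y0:.3f}, 95% PI = [{lower:.3f}, {upper:.3f}]")
```

Dropping the trailing `+ s2` term in `v_f` would give the narrower confidence interval for the mean response instead of the prediction interval.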
In contrast to those examples, each of the model coefficients ($\beta_0$, $\beta_1$ and $\beta_2$) in this case has an error bar (extracted via bootstrapping from a distribution; the distributions are numerical rather than analytic, and are specific to each of the three coefficients).
Is there a way to incorporate the uncertainty of the $\beta_i$'s (i.e. the “error bars”) in the calculation of the PI (and CI)?
Note
I know one could create an ensemble of model instances with the $\beta_i$ drawn from their respective distributions and, based on the resulting distribution of $y_0$, calculate the CI of $y_0$, but this is not really computationally efficient and brings a lot of other issues which I would like to avoid.
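For reference, the ensemble approach described in this note can be sketched as follows. This is only an illustration under assumptions: the data are synthetic, and the coefficient distributions are generated here with a simple pairs bootstrap (resampling rows and refitting), which may differ from the bootstrapping used in the actual framework.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data standing in for the experiment (illustrative only)
n = 15
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n), rng.uniform(0, 5, n)])
y = 1.0 + 2.0 * X[:, 1] - 0.5 * X[:, 2] + rng.normal(0, 0.3, n)

# Bootstrap the coefficients: resample rows with replacement and refit
n_boot = 2000
beta_samples = np.empty((n_boot, X.shape[1]))
for b in range(n_boot):
    idx = rng.integers(0, n, size=n)
    beta_samples[b] = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]

# Each row of beta_samples is one model instance. Evaluating all of them at
# x0 is a single matrix-vector product, and it keeps the coefficients of a
# replicate together, so their correlation is preserved.
x0 = np.array([1.0, 4.0, 2.0])
y0_samples = beta_samples @ x0

# Percentile interval from the empirical distribution of y0
lower, upper = np.percentile(y0_samples, [2.5, 97.5])
print(f"median y0 = {np.median(y0_samples):.3f}, "
      f"95% CI = [{lower:.3f}, {upper:.3f}]")
```

Once the coefficient draws are stored as an array, the ensemble prediction itself is cheap. Note that drawing each $\beta_i$ independently from its own marginal distribution, rather than using whole rows as above, would ignore the correlation between the coefficients.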
One possible solution is Bayesian linear regression. Bayesian linear regression estimates a posterior distribution for each coefficient. From that posterior distribution, a credible interval can be calculated.
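A minimal sketch of what this could look like, using the closed-form conjugate case rather than any particular library. The assumptions are loud ones: the noise standard deviation `sigma` is treated as known, the prior is a zero-mean Gaussian with standard deviation `tau` on every coefficient, and the data are synthetic.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(2)

# Synthetic data (illustrative only)
n = 15
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n), rng.uniform(0, 5, n)])
y = 1.0 + 2.0 * X[:, 1] - 0.5 * X[:, 2] + rng.normal(0, 0.3, n)

sigma = 0.3   # assumed known noise standard deviation
tau = 10.0    # assumed prior standard deviation: beta ~ N(0, tau^2 I)

# Gaussian posterior over the coefficients: beta | data ~ N(m, S)
S = np.linalg.inv(X.T @ X / sigma**2 + np.eye(X.shape[1]) / tau**2)
m = S @ X.T @ y / sigma**2

z = NormalDist().inv_cdf(0.975)   # two-sided 95% normal quantile

# 95% credible interval for each coefficient
coef_sd = np.sqrt(np.diag(S))
coef_lower, coef_upper = m - z * coef_sd, m + z * coef_sd

# Posterior predictive at a new point: coefficient uncertainty plus noise
x0 = np.array([1.0, 4.0, 2.0])
pred_mean = x0 @ m
pred_sd = np.sqrt(x0 @ S @ x0 + sigma**2)
print(f"y0 = {pred_mean:.3f}, 95% predictive interval = "
      f"[{pred_mean - z * pred_sd:.3f}, {pred_mean + z * pred_sd:.3f}]")
```

In this setting the predictive variance `x0 @ S @ x0 + sigma**2` has the same two-part structure as equation (2): one term from the uncertainty in the coefficients and one from the noise.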
Answered by Brian Spiering on February 27, 2021