TransWikia.com

How to incorporate the uncertainty of the model coefficients in the prediction interval of a multiple linear regression

Data Science Asked on February 27, 2021

I’m modeling small experimental data sets. Most experimental work generates only a handful of samples rather than thousands, so I need to be inventive in dealing with so few data points (say 10-20). I’ve been building a framework to do just this, and at this point I am interested in generating error bars for the predicted values.

In rough outline, this is what happens in the framework (e.g. when applying a multi-linear model):

  1. Create an ensemble of N data sets.
  2. Fit a regression on each data set, which yields a (linear) model of the form of Eq. 1 below. This produces N values for each of the coefficients $\beta_i$.
  3. Calculate the mean of each of the three sets of $\beta$'s. (Another summary function could be used, but for now assume it’s the mean.)
  4. Use these three mean $\beta$'s as the coefficients of the final model (again Eq. 1).
  5. The goal: find the prediction interval (PI) for the model in Eq. 1, taking into account that the coefficients $\beta$ are calculated from numerical distributions.
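
Steps 1 to 4 can be sketched roughly as follows, assuming the ensemble is built by resampling the rows of the data with replacement (bootstrapping). The function name `bootstrap_coefficients`, the synthetic data and the choice of 1000 resamples are illustrative assumptions, not part of the original framework.

```python
import numpy as np

def bootstrap_coefficients(X, y, n_boot=1000, seed=0):
    """Refit Eq. 1 on n_boot resampled copies of (X, y).

    Returns an array of shape (n_boot, 3), one row of
    (beta_0, beta_1, beta_2) per resampled data set.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    A = np.column_stack([np.ones(n), X])   # design matrix with intercept column
    betas = np.empty((n_boot, A.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)   # step 1: resample rows with replacement
        betas[b] = np.linalg.lstsq(A[idx], y[idx], rcond=None)[0]  # step 2: fit
    return betas

# Synthetic stand-in for a small experiment (assumed values, not from the question).
rng = np.random.default_rng(42)
n = 15
X = rng.uniform(0, 10, size=(n, 2))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(0, 0.5, size=n)

betas = bootstrap_coefficients(X, y)
beta_mean = betas.mean(axis=0)         # steps 3-4: coefficients of the final model
beta_std = betas.std(axis=0, ddof=1)   # spread of each coefficient (its "error bar")
print("mean:", beta_mean)
print("std: ", beta_std)
```

Keeping the full `betas` array, rather than only the three means and spreads, preserves the correlations between the coefficients, which matter when their uncertainty is propagated to a prediction.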

So take for example the following multiple linear regression model:
$$
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 \tag{1}
$$

and I’m looking for an algebraic equation to calculate (numerically) the prediction interval (PI) for a new prediction $y_0$. (A confidence interval would be fine as well, since it is related to the PI.)

So far, my searches have only turned up answers that deal with the statistical nature of the data set (the $x_i$'s). These provide an error component:
$$
\hat{V}_f = s^2 \cdot \mathbf{x_0} \cdot \mathbf{(X^T X)^{-1}} \cdot \mathbf{x_0^T} + s^2 \tag{2}
$$

which can be used to calculate the PI, via:
$$
y = y_0 \pm t_{\alpha/2,\,n-k} \cdot \sqrt{\hat{V}_f} \tag{3}
$$
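
As a point of reference, Eqs. 2 and 3 can be evaluated numerically along these lines. This is a minimal sketch under stated assumptions: the function name `prediction_interval` and the synthetic data are hypothetical, $s^2$ is taken to be the residual variance with $n-k$ degrees of freedom, and the critical value $t_{\alpha/2,n-k}$ is passed in by hand (2.365 is the two-sided 95% value for 7 degrees of freedom).

```python
import numpy as np

def prediction_interval(X, y, x0, t_crit):
    """Classical OLS prediction interval for a new point x0 (Eqs. 2 and 3)."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])     # design matrix with intercept column
    k = A.shape[1]                           # number of fitted coefficients
    beta = np.linalg.lstsq(A, y, rcond=None)[0]
    resid = y - A @ beta
    s2 = resid @ resid / (n - k)             # residual variance s^2
    a0 = np.concatenate([[1.0], x0])         # new point, with 1 for the intercept
    XtX_inv = np.linalg.inv(A.T @ A)
    v_f = s2 * (a0 @ XtX_inv @ a0) + s2      # Eq. 2
    y0 = a0 @ beta
    half = t_crit * np.sqrt(v_f)             # half-width from Eq. 3
    return y0, y0 - half, y0 + half

# Synthetic stand-in data (assumed values): n = 10 points, k = 3 coefficients.
rng = np.random.default_rng(1)
n = 10
X = rng.uniform(0, 10, size=(n, 2))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(0, 0.5, size=n)

y0, lo, hi = prediction_interval(X, y, np.array([5.0, 5.0]), t_crit=2.365)
print(y0, lo, hi)
```

The first term of Eq. 2 is the part that reflects coefficient uncertainty, here expressed through $s^2 (X^T X)^{-1}$, while the trailing $s^2$ accounts for the scatter of a single new observation.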

In contrast to those examples, each of the model coefficients ($\beta_0$, $\beta_1$ and $\beta_2$) in this case has an error bar. These are extracted via bootstrapping from distributions that are numerical rather than analytic, and that are specific to each of the three coefficients.
Is there a way to incorporate the uncertainty of the $\beta_i$'s (i.e. the “error bars”) in the calculation of the PI (and CI)?


Note
I know one could create an ensemble of model instances with the $\beta_i$ drawn from their respective distributions and calculate the CI of $y_0$ from the resulting distribution of $y_0$ values, but this is not really computationally efficient and brings a lot of other issues which I would like to avoid.
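
For comparison, the ensemble approach described in this note might look like the sketch below. The function name `ensemble_interval` and the synthetic data are assumptions made for illustration. Because Eq. 1 is linear in the coefficients, the variance of the ensemble predictions equals $\mathbf{a_0^T} \Sigma \, \mathbf{a_0}$, where $\Sigma$ is the sample covariance of the bootstrapped coefficient rows and $\mathbf{a_0} = (1, x_1, x_2)$, so the same number can be obtained without looping over model instances. Note that this reflects only the coefficient uncertainty (closer to a CI for the mean response); the scatter of a new observation is not included.

```python
import numpy as np

def ensemble_interval(betas, x0, level=0.95):
    """Percentile interval of the predictions over an ensemble of coefficient rows."""
    a0 = np.concatenate([[1.0], x0])       # prepend 1 for the intercept
    y0 = betas @ a0                        # one predicted value per ensemble member
    tail = 100 * (1 - level) / 2
    lo, hi = np.percentile(y0, [tail, 100 - tail])
    return y0, lo, hi

# Synthetic stand-in data and a bootstrap ensemble (assumed, for illustration only).
rng = np.random.default_rng(0)
n = 15
X = rng.uniform(0, 10, size=(n, 2))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(0, 0.5, size=n)
A = np.column_stack([np.ones(n), X])
betas = np.array([
    np.linalg.lstsq(A[idx], y[idx], rcond=None)[0]
    for idx in rng.integers(0, n, size=(1000, n))   # 1000 resamples of n row indices
])

x0 = np.array([5.0, 5.0])
y0, lo, hi = ensemble_interval(betas, x0)

# Closed form for a linear model: Var(y0) = a0^T Sigma a0.
a0 = np.concatenate([[1.0], x0])
Sigma = np.cov(betas, rowvar=False)        # 3x3 covariance of the coefficient rows
var_closed_form = a0 @ Sigma @ a0
print(lo, hi, y0.var(ddof=1), var_closed_form)
```

Using the joint rows of `betas` keeps the correlations between the coefficients; drawing each $\beta_i$ independently from its own marginal distribution would discard them.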

One Answer

One possible solution is Bayesian linear regression. Bayesian linear regression estimates a posterior distribution for each coefficient. From that posterior distribution, a credible interval can be calculated.
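
A minimal sketch of what this could look like, assuming the simplest conjugate setting: a zero-mean Gaussian prior on the coefficients and a known noise variance. The names `bayes_linreg` and `credible_interval`, the prior precision, the noise variance `sigma2` and the synthetic data are all assumptions for illustration; a practical analysis would typically also place a prior on the noise variance.

```python
import numpy as np

def bayes_linreg(X, y, sigma2, prior_precision=1e-2):
    """Gaussian posterior over (beta_0, beta_1, beta_2) for known noise variance.

    Prior: beta ~ N(0, I / prior_precision). Returns posterior mean m and covariance S.
    """
    n = len(y)
    A = np.column_stack([np.ones(n), X])             # design matrix with intercept
    k = A.shape[1]
    S_inv = prior_precision * np.eye(k) + A.T @ A / sigma2
    S = np.linalg.inv(S_inv)                         # posterior covariance
    m = S @ A.T @ y / sigma2                         # posterior mean
    return m, S

def credible_interval(m, S, x0, sigma2, z=1.96, predictive=True):
    """Interval for the mean response, or for a new observation if predictive=True."""
    a0 = np.concatenate([[1.0], x0])
    mean = a0 @ m
    var = a0 @ S @ a0 + (sigma2 if predictive else 0.0)  # coefficient part + noise
    half = z * np.sqrt(var)                          # z = 1.96 gives a 95% interval
    return mean, mean - half, mean + half

# Synthetic stand-in data (assumed values, not from the question).
rng = np.random.default_rng(3)
n = 15
X = rng.uniform(0, 10, size=(n, 2))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(0, 0.5, size=n)

sigma2 = 0.25                                        # assumed known noise variance
m, S = bayes_linreg(X, y, sigma2)
x0 = np.array([5.0, 5.0])
mean, lo_p, hi_p = credible_interval(m, S, x0, sigma2, predictive=True)
_, lo_c, hi_c = credible_interval(m, S, x0, sigma2, predictive=False)
print(mean, (lo_c, hi_c), (lo_p, hi_p))
```

Here the term `a0 @ S @ a0` carries the coefficient uncertainty into the interval, which is the role the bootstrap distributions play in the question.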

Answered by Brian Spiering on February 27, 2021
