Mathematica: Asked on September 2, 2021
Say I have a 4-dimensional data set with two independent variables x1, x2 and two dependent variables y1, y2, i.e. each row is {x1, x2, y1, y2}:
data = RandomReal[{0, 1}, {100, 4}];
How do I feed it to LinearModelFit to fit
$$ \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} $$
?
LinearModelFit doesn't do multivariate regression as far as I'm aware. You can use my repository function BayesianLinearRegression instead. The first example in the "Scope" section shows you how. You can provide the data in the format
data[[All, {1, 2}]] -> data[[All, {3, 4}]]
or
#[[{1, 2}]] -> #[[{3, 4}]]& /@ data
For example:
fitData = ResourceFunction["BayesianLinearRegression"][
data[[All, {1, 2}]] -> data[[All, {3, 4}]],
{1, x1, x2}, (* basis functions *)
{x1, x2} (* independent variables *)
];
You can find the best-fit expression with:
Mean[fitData["Posterior", "PredictiveDistribution"]]
The output should provide you with all the details about the uncertainties of the predictions and the regression coefficients. My blog post has more background information, if you need it.
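If you want the coefficient uncertainties explicitly, the posterior association exposes them as a distribution. A minimal sketch, assuming the "RegressionCoefficientDistribution" key that the function's documentation lists:
coefDist = fitData["Posterior", "RegressionCoefficientDistribution"]; (* assumed key; see the function's documentation *)
Mean[coefDist] (* posterior mean of the regression coefficients *)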
Correct answer by Sjoerd Smit on September 2, 2021
You can build the regression yourself with NMinimize, minimizing the total of the squared residuals, i.e. the squared distances between the predicted and observed points.
First let's build some synthetic noisy data:
(* create some noisy data that follows a linear model *)
n = 1000;
datax = RandomReal[{-1, 1}, {n, 2}];
testmtx = {{3, 4}, {1/2, 1/6}};
testoffset = {3/2, 5/7};
fn[{x1_, x2_}] := testmtx.{x1, x2} + testoffset
noise = RandomVariate[NormalDistribution[0, 1/10], {n, 2}];
datay = (fn /@ datax) + noise;
(* this is the noisy 4d data *)
data = MapThread[Join, {datax, datay}];
ListPlot[{datax, datay}, PlotRange -> {{-4, 4}, {-4, 4}},
AspectRatio -> 1, PlotStyle -> PointSize[Small]]
The ideal fit is:
$$ \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} 3 & 4 \\ 1/2 & 1/6 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + \begin{pmatrix} 3/2 \\ 5/7 \end{pmatrix} $$
... but pretend we don't know that and work only with data from this point on. The ListPlot above shows the $x_1,x_2$ values (blue) against the noisy $y_1,y_2$ values (orange).
Then construct a squared-residual function and an objective, the total of the squared residuals over all data points:
matrix = {{a1, a2}, {a3, a4}};
offset = {c1, c2};
sqresidual[{x1_, x2_, y1_, y2_}, mtx_, c_] :=
SquaredEuclideanDistance[c + mtx.{x1, x2}, {y1, y2}]
objective = Total[sqresidual[#, matrix, offset] & /@ data];
... and finally use NMinimize:
NMinimize[objective, {a1, a2, a3, a4, c1, c2}]
(* result: {19.8142, {a1 -> 2.99722, a2 -> 4.00609, a3 -> 0.498218,
a4 -> 0.165467, c1 -> 1.49577, c2 -> 0.7118}} *)
The result is pretty close to ideal!
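As a quick sanity check (a sketch that reuses the symbols defined above), substitute the fitted rules back into matrix and offset and compare with the true parameters:
sol = NMinimize[objective, {a1, a2, a3, a4, c1, c2}];
(* fitted parameters, then the true values used to generate the data *)
{matrix, offset} /. Last[sol]
N[{testmtx, testoffset}]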
Answered by flinty on September 2, 2021
For simple applications of regression, there is no need to fit multiple dependent variables simultaneously: the results are the same as a regression of each dependent variable separately. There are caveats if you are doing further analysis (for instance, joint inference that uses the correlation between the errors of the two responses), but if you just want the basic regression results, you can do separate fits, as in the sketch below.
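A minimal sketch of that approach, assuming the 100-row data from the question (x1 and x2 are just formal variables):
(* fit y1 and y2 against x1, x2 in two separate calls *)
fit1 = LinearModelFit[data[[All, {1, 2, 3}]], {x1, x2}, {x1, x2}];
fit2 = LinearModelFit[data[[All, {1, 2, 4}]], {x1, x2}, {x1, x2}];
fit1["BestFitParameters"] (* {c1, a11, a12} *)
fit2["BestFitParameters"] (* {c2, a21, a22} *)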
Answered by nanoman on September 2, 2021