Mathematica: Asked on September 2, 2021
Say I have a 4-dimensional data set with two independent variables x1, x2 and two dependent variables y1, y2, i.e. each row is {x1, x2, y1, y2}:
data = RandomReal[{0, 1}, {100, 4}];
How do I feed it to LinearModelFit to fit
$$ \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} $$
?
LinearModelFit doesn't do multivariate regression as far as I'm aware. You can use my repository function BayesianLinearRegression instead. The first example in the "Scope" section shows you how. You can provide the data in the format
data[[All, {1, 2}]] -> data[[All, {3, 4}]]
or
#[[{1, 2}]] -> #[[{3, 4}]]& /@ data
For example:
fitData = ResourceFunction["BayesianLinearRegression"][
data[[All, {1, 2}]] -> data[[All, {3, 4}]],
{1, x1, x2}, (* basis functions *)
{x1, x2} (* independent variables *)
];
You can find the best-fit expression with:
Mean[fitData["Posterior", "PredictiveDistribution"]]
The output should provide you with all the details about the uncertainties of the predictions and the regression coefficients. My blog post has more background information, if you need it.
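If you want the coefficient uncertainties explicitly, the posterior association exposes them as a distribution. A minimal sketch, assuming the "RegressionCoefficientDistribution" key that the function's documentation lists:
coefDist = fitData["Posterior", "RegressionCoefficientDistribution"]; (* assumed key; see the function's documentation *)
Mean[coefDist] (* posterior mean of the regression coefficients *)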
Correct answer by Sjoerd Smit on September 2, 2021
You can build the regression yourself with NMinimize, minimizing the total of the squared residuals, i.e. the squared distances between the predicted and observed points.
First let's build some synthetic noisy data:
(* create some noisy data that follows a linear model *)
n = 1000;
datax = RandomReal[{-1, 1}, {n, 2}];
testmtx = {{3, 4}, {1/2, 1/6}};
testoffset = {3/2, 5/7};
fn[{x1_, x2_}] := testmtx.{x1, x2} + testoffset
noise = RandomVariate[NormalDistribution[0, 1/10], {n, 2}];
datay = (fn /@ datax) + noise;
(* this is the noisy 4d data *)
data = MapThread[Join, {datax, datay}];
ListPlot[{datax, datay}, PlotRange -> {{-4, 4}, {-4, 4}},
AspectRatio -> 1, PlotStyle -> PointSize[Small]]
The ideal fit is:
$$ \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} 3 & 4 \\ 1/2 & 1/6 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + \begin{pmatrix} 3/2 \\ 5/7 \end{pmatrix} $$
... but pretend we don't know that and work only with data from this point on. The ListPlot above shows the $x_1,x_2$ values (blue) against the noisy $y_1,y_2$ values (orange).
Then construct a squared-residual function and an objective, the total of the squared residuals over all data points:
matrix = {{a1, a2}, {a3, a4}};
offset = {c1, c2};
sqresidual[{x1_, x2_, y1_, y2_}, mtx_, c_] :=
SquaredEuclideanDistance[c + mtx.{x1, x2}, {y1, y2}]
objective = Total[sqresidual[#, matrix, offset] & /@ data];
... and finally use NMinimize:
NMinimize[objective, {a1, a2, a3, a4, c1, c2}]
(* result: {19.8142, {a1 -> 2.99722, a2 -> 4.00609, a3 -> 0.498218,
a4 -> 0.165467, c1 -> 1.49577, c2 -> 0.7118}} *)
The result is pretty close to ideal!
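As a quick sanity check (a sketch that reuses the symbols defined above), substitute the fitted rules back into matrix and offset and compare with the true parameters:
sol = NMinimize[objective, {a1, a2, a3, a4, c1, c2}];
(* fitted parameters, then the true values used to generate the data *)
{matrix, offset} /. Last[sol]
N[{testmtx, testoffset}]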
Answered by flinty on September 2, 2021
For simple applications of regression, there is no need to fit multiple dependent variables simultaneously: the results are the same as a regression of each dependent variable separately. There are caveats if you are doing further analysis (for instance, joint inference that uses the correlation between the errors of the two responses), but if you just want the basic regression results, you can do separate fits, as in the sketch below.
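A minimal sketch of that approach, assuming the 100-row data from the question (x1 and x2 are just formal variables):
(* fit y1 and y2 against x1, x2 in two separate calls *)
fit1 = LinearModelFit[data[[All, {1, 2, 3}]], {x1, x2}, {x1, x2}];
fit2 = LinearModelFit[data[[All, {1, 2, 4}]], {x1, x2}, {x1, x2}];
fit1["BestFitParameters"] (* {c1, a11, a12} *)
fit2["BestFitParameters"] (* {c2, a21, a22} *)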
Answered by nanoman on September 2, 2021