Data Science Asked by 0009 on February 22, 2021
I’m using different forecasting methods on a dataset to try and compare the accuracy of these methods.
For some reason, multiple linear regression (OLS) is outperforming random forests (RF), gradient boosting (GB), and AdaBoost when comparing MAE, RMSE, R², and MAPE. This is very surprising to me.
Is there any general reason that could explain this outperformance?
I know that ML methods don’t perform well on datasets with a small number of samples, but that should not be the case here.
I’m a beginner in this area, so I hope this is not a stupid question and somebody is able to help me!
Thanks!
First, it is impossible to say without further information about the nature of your data, the training conducted, etc. That being said, in general there is no guarantee that a more complex model will outperform a simpler one in time series forecasting. In fact, this was a controversy in the earlier M forecasting competitions, where simpler exponential smoothing methods outperformed the more complex ARIMA and neural network models (in recent years, though, machine learning methods clearly reign supreme). In all cases, you should compare performance against very simple benchmarks such as the naive/persistence method.
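As a minimal sketch of that last point, here is how a persistence baseline can be compared against OLS on lagged features. The synthetic AR(1) series, the library choices (NumPy, scikit-learn), and the train/test split are all illustrative assumptions, not details from the original post:

```python
# Sketch (assumptions: synthetic data, sklearn OLS on a single lag feature).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)

# Synthetic AR(1)-like series: y[t] = 0.8 * y[t-1] + noise
y = np.zeros(200)
for t in range(1, 200):
    y[t] = 0.8 * y[t - 1] + rng.normal(scale=0.5)

# One-step-ahead setup: predict y[t] from y[t-1]
X, target = y[:-1].reshape(-1, 1), y[1:]
split = 150
X_train, X_test = X[:split], X[split:]
y_train, y_test = target[:split], target[split:]

ols = LinearRegression().fit(X_train, y_train)
ols_pred = ols.predict(X_test)

# Persistence (naive) baseline: forecast = last observed value, i.e. the lag itself
naive_pred = X_test.ravel()

print("OLS MAE:  ", mean_absolute_error(y_test, ols_pred))
print("Naive MAE:", mean_absolute_error(y_test, naive_pred))
```

If a model cannot beat this baseline, its apparent accuracy numbers mean little regardless of how it ranks against other models.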
Regarding the evaluation:
Answered by Akylas Stratigakos on February 22, 2021