TransWikia.com

Random Forest overfitting with `n_estimators=1`, `max_depth=1` and `max_features=1`

Data Science Asked by frantic oreo on August 16, 2021

I am trying to stop my random forest (RF) from overfitting. I am using time series data with a 1-day lag to predict the current price, and I use this function to shift my independent features back by 1 day:

from pandas import DataFrame, concat

def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
    # Ref: https://machinelearningmastery.com/convert-time-series-supervised-learning-problem-python/
    # Builds lagged copies (t-n_in, ..., t-1) of every column.
    # n_out is accepted but not used in this version.
    df = DataFrame(data)
    col_names = df.columns  # read from df so that list input also works
    n_vars = df.shape[1]
    cols, names = list(), list()
    # input sequence (t-n, ... t-1)
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
        names += [f'{col_names[j]}_t-{i}' for j in range(n_vars)]
    agg = concat(cols, axis=1)
    agg.columns = names

    if dropnan:
        # drop rows containing NaN, including the first n_in rows created by shifting
        agg.dropna(inplace=True)
    return agg
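
To make the output of the helper above concrete, here is a small sketch on a made-up two-column frame. The column names `price` and `volume` and all values are illustrative assumptions, not the data from the question; the function body is repeated in condensed form only so that the snippet runs by itself.

```python
import pandas as pd


def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
    # Same idea as the helper above: one shifted copy of every column per lag.
    df = pd.DataFrame(data)
    cols, names = [], []
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
        names += [f"{col}_t-{i}" for col in df.columns]
    agg = pd.concat(cols, axis=1)
    agg.columns = names
    if dropnan:
        agg = agg.dropna()
    return agg


# Hypothetical toy data, for illustration only.
toy = pd.DataFrame({
    "price": [10.0, 11.0, 12.0, 13.0],
    "volume": [100.0, 110.0, 120.0, 130.0],
})

lagged = series_to_supervised(toy, n_in=1)
target = toy["price"].iloc[1:]  # drop the first row, which has no lagged predecessor

# Row t of `lagged` holds the values observed at t-1, so it lines up with target[t].
print(lagged)
print(target)
```

Each row of `lagged` contains only the previous day's values, so pairing it with `target` asks the model to predict today's price from yesterday's features.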

I then fit my RF. I have tried many different parameters, but the model still overfits.

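from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# df (the raw DataFrame) and RESPONSE_VAR (the target column name) are defined elsewhere
n_lags = 1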

lagged_X = series_to_supervised(df, n_in=n_lags, n_out=0, dropnan=True)
y = df[RESPONSE_VAR][n_lags:]

X_train, X_test, y_train, y_test = train_test_split(
    lagged_X, y, test_size=0.2, random_state=42,
    shuffle=False)

rf = RandomForestRegressor(n_estimators=1, max_depth=1, max_features=1)
rf.fit(X_train, y_train)

preds_train = rf.predict(X_train)
preds_test = rf.predict(X_test)
mae_train = mean_absolute_error(y_train, preds_train)
mae_test = mean_absolute_error(y_test, preds_test)

print(f'mae_train: {mae_train}')
print(f'mae_test: {mae_test}')

> mae_train: 683.4959502405592
> mae_test: 2491.3775235802696

My `lagged_X.shape` is (3179, 220). I would have assumed that constraining the model to only 1 tree and 1 feature would make it unable to overfit. I have inspected `rf.feature_importances_` and tried dropping columns that appeared to be too informative (?), using more estimators and fewer feature constraints, but this did not work either.
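
One way to probe this assumption is on synthetic data. The sketch below fits the same `n_estimators=1`, `max_depth=1`, `max_features=1` forest on a chronological split of a made-up upward-drifting random walk (not the prices from the question; the series, seed and variable names are illustrative assumptions). A depth-1 tree has a single split, so it can emit at most two distinct values, each an average of training targets.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic series: random walk with upward drift (an assumption for illustration).
price = 100.0 + np.cumsum(rng.normal(loc=1.0, scale=2.0, size=1000))

X = price[:-1].reshape(-1, 1)  # yesterday's price as the only feature
y = price[1:]                  # today's price as the target

# Chronological split, as in the question (no shuffling).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False)

rf = RandomForestRegressor(
    n_estimators=1, max_depth=1, max_features=1, random_state=0)
rf.fit(X_train, y_train)

preds_train = rf.predict(X_train)
preds_test = rf.predict(X_test)
mae_train = mean_absolute_error(y_train, preds_train)
mae_test = mean_absolute_error(y_test, preds_test)

# A single depth-1 tree can only output the mean target of each of its two leaves.
print("distinct train predictions:", np.unique(preds_train).size)
print("largest test prediction:", preds_test.max())
print("largest train target:", y_train.max())
print("mae_train:", mae_train)
print("mae_test:", mae_test)
```

In this toy setup the test error exceeds the training error because the test period lies at price levels the training period never reached, and a tree cannot predict beyond the range of its training targets. Whether the same effect explains the gap on the real data is something to check, for example by comparing the ranges of `y_train` and `y_test`.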

Can I get my train and test scores to converge?
