XGBoost Regressor model reproducibility

Data Science Asked by dev30 on May 22, 2021

I am running an XGBoost regressor to predict electricity consumption (load) and then classify the predicted values as peaks or not. The dataset started as hourly energy load data plus hourly weather information.
After experimenting with different feature sets (means, standard deviations, slopes, and rolling windows), I ended up with a model (saved as a pickle file) that delivers reasonable results.

Yet after some feature tuning, the error went up and prediction quality worsened. So we switched back to the previous best-performing feature set, but failed to reproduce the same model.

The old binary model, once loaded, still does a good job of predicting peak hours, yet training on exactly the same feature set does not bring a new model to the same state.

I know that the boosting algorithm is stochastic in nature, but is there a way to inspect the path the boosted trees took within a binary model, so as to hint to the model during training to repeat the choices it made before? (The random state is set, as is the NumPy random seed; hyperparameters were picked with grid search.)

XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
             colsample_bynode=1, colsample_bytree=0.7, gamma=0, gpu_id=-1,
             importance_type='gain', interaction_constraints='',
             learning_rate=0.1, max_delta_step=0, max_depth=5,
             min_child_weight=1, missing=nan, monotone_constraints='()',
             n_estimators=1000, n_jobs=8, num_parallel_tree=1, random_state=0,
             reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=0.7,
             tree_method='exact', validate_parameters=1, verbosity=None)
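
One way to look at the choices stored in the saved model is to dump its trees as text and compare them with the dump of a retrained model. The following is only a minimal sketch: it assumes both models are pickled `XGBRegressor` objects and that xgboost is installed when the pickles are loaded. The file names and the helper functions `load_tree_dump`, `first_diverging_tree`, and `report` are hypothetical, introduced here for illustration.

```python
import pickle


def load_tree_dump(pickle_path):
    """Load a pickled XGBRegressor and return one text dump per boosted tree."""
    with open(pickle_path, "rb") as fh:
        model = pickle.load(fh)  # unpickling needs xgboost to be importable
    # Booster.get_dump() returns a list of strings, one per tree,
    # describing every split (feature, threshold, and gain/cover stats).
    return model.get_booster().get_dump(with_stats=True)


def first_diverging_tree(dump_a, dump_b):
    """Return the index of the first tree that differs, or None if all match."""
    for i, (tree_a, tree_b) in enumerate(zip(dump_a, dump_b)):
        if tree_a != tree_b:
            return i
    # All shared trees match; a different tree count is still a divergence.
    if len(dump_a) != len(dump_b):
        return min(len(dump_a), len(dump_b))
    return None


def report(old_path, new_path):
    """Print where two pickled models start to differ (paths are hypothetical)."""
    old_dump = load_tree_dump(old_path)
    new_dump = load_tree_dump(new_path)
    idx = first_diverging_tree(old_dump, new_dump)
    if idx is None:
        print("Both models contain identical trees.")
    elif idx < len(old_dump) and idx < len(new_dump):
        print(f"Models diverge at tree {idx}.")
        print("Old tree:\n" + old_dump[idx])
        print("New tree:\n" + new_dump[idx])
    else:
        print(f"Trees match up to index {idx}, but the tree counts differ.")
```

Calling, for example, `report("old_model.pkl", "retrained_model.pkl")` would show the first tree at which the two models part ways. If they already differ at tree 0, the training inputs (feature values, column order, or library version) are a more likely cause than random sampling, though that is a guess to be checked rather than a conclusion.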
