Data Science Asked by kash on March 16, 2021
I am fitting a multi-class model using XGBoost. I get an accuracy of 96% on train and 95% on test, using an 80-20 train/test split.
However, when I add two new features, the accuracy drops to 92% on train and 89% on test.
Doesn't XGBoost:
I have not used cross-validation. Could it be that I am still overfitting the data?
This is the code I used:
```python
import numpy as np
import xgboost as xgb
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# 80-20 train/test split
df_new_train, df_new_test, y_train, y_test = train_test_split(df, labels2, test_size=0.2)
dtrain = xgb.DMatrix(df_new_train, label=y_train)
dtest = xgb.DMatrix(df_new_test, label=y_test)

param = {
    'max_depth': 10,
    # Note: xgb.train() expects early_stopping_rounds and the watchlist as its
    # early_stopping_rounds= and evals= arguments, not as entries in this dict.
    'early_stopping_rounds': 10,
    'eta': 0.01,
    'subsample': 0.6,
    'colsample_bytree': 0.5,
    #'alpha': 0.5,
    #'lambda': 0.5,
    'gamma': 10,
    'min_child_weight': 1,
    'watchlist': [(dtrain, 'train'), (dtest, 'valid')],
    'objective': 'multi:softprob',  # multiclass objective; predict() returns per-class probabilities
    'num_class': 4}  # the number of classes that exist in this dataset
num_round = 1500
bst = xgb.train(param, dtrain, num_round)
preds = bst.predict(dtest)
preds_train = bst.predict(dtrain)
# Pick the most probable class for each row
best_preds_train = np.argmax(preds_train, axis=1)
best_preds = np.argmax(preds, axis=1)
print(classification_report(y_test, best_preds, target_names=label_encoder.classes_))
```
Here is something you can try: remove the `early_stopping_rounds` parameter, since the value looks relatively low to me. XGBoost may have stopped training earlier than it should have because it did not see any improvement. You might also consider setting a random seed to get consistent results between attempts.
Answered by Yohanes Alfredo on March 16, 2021
In general, when you change the data being fed into a model you should also consider re-tuning the model parameters.
It could be that the addition of the two new features in your data set means that your existing model parameters (e.g. `eta`, `gamma`, etc.) are no longer optimal.
Answered by bradS on March 16, 2021