Data Science Asked by helloimgeorgia on May 20, 2021
Since gamma prevents a split unless it meets a minimum gain threshold, isn’t that the same thing as removing features that have low average gain? Both will result in splits with higher average gain.
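To make the comparison concrete, here is a minimal sketch of the two filters. It assumes the standard XGBoost split-gain formula (loss reduction from the gradient and Hessian sums, with L2 penalty lambda) and treats gamma as a threshold on each individual split. The per-split gains, the feature names, and the average-gain cutoff of 2.0 are made-up numbers for illustration only. In real training, rejecting a split also changes which splits are considered afterwards, so this only shows the filtering logic, not the resulting model.

```python
def split_gain(g_left, h_left, g_right, h_right, reg_lambda=1.0):
    """Loss reduction of a candidate split, before gamma is taken into account."""
    def score(g, h):
        return g * g / (h + reg_lambda)
    parent = score(g_left + g_right, h_left + h_right)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right) - parent)


def filter_by_gamma(split_gains, gamma):
    """Per-split filter: keep each split whose own gain exceeds gamma."""
    kept = {}
    for feature, gains in split_gains.items():
        surviving = [g for g in gains if g > gamma]
        if surviving:
            kept[feature] = surviving
    return kept


def filter_by_average_gain(split_gains, min_average):
    """Per-feature filter: drop every split of a feature whose average gain is low."""
    return {
        feature: list(gains)
        for feature, gains in split_gains.items()
        if sum(gains) / len(gains) >= min_average
    }


# Hypothetical gains of the splits made on each feature (illustrative values).
split_gains = {
    "feature_a": [5.0, 4.0, 3.0],
    "feature_b": [0.2, 0.1, 6.0, 0.1],  # low average, but one very useful split
    "feature_c": [0.5, 0.4],
}

kept_by_gamma = filter_by_gamma(split_gains, gamma=1.0)
kept_by_average = filter_by_average_gain(split_gains, min_average=2.0)

print(kept_by_gamma)
print(kept_by_average)
```

In this toy case gamma keeps the single high-gain split on `feature_b`, while the average-gain filter discards the whole feature. So the two are not equivalent: gamma acts on individual splits, whereas feature removal acts on an aggregate over all of a feature's splits.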
I currently have a dataset with 22 million samples and 200 features. My problem is overfitting: the model performs well in-sample but poorly out-of-sample. I have improved the model (i.e. reduced the overfitting, though not eliminated it) by setting gamma = 1. I am wondering whether further out-of-sample improvement would be possible by reducing the number of features.
My thinking is that removing highly correlated features (R > 0.9) will make the importance of each feature more accurate, by avoiding dilution of importance across correlated features, but will not directly improve the performance of the model. However, maybe removing low-gain features does directly improve the model, since the algorithm could "use" correlated features to overfit?
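For reference, the correlation filter described above could look like the following minimal sketch. It uses only the standard library, a greedy rule (keep a feature only if its absolute Pearson correlation with every already-kept feature is at most the threshold), and a tiny made-up dataset. The column names and values are hypothetical, and with 22 million rows a vectorised library would be far more practical than this pure-Python version.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def drop_correlated(columns, threshold=0.9):
    """Greedily keep features that are not highly correlated with any kept feature."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy data: x2 is almost a rescaled copy of x1, x3 is unrelated.
columns = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "x2": [2.1, 3.9, 6.2, 8.0, 9.9],
    "x3": [5.0, 1.0, 4.0, 2.0, 3.0],
}

selected = drop_correlated(columns, threshold=0.9)
print(selected)
```

Note that the result depends on column order: whichever member of a correlated pair comes first is the one that is kept.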