Data Science Asked on March 28, 2021
My initial thought was a neural network but I don’t see how a neural network can properly predict interaction between variables (ie. x1 * x2) since each node is just a sum of previous inputs?
Would a decision tree be better suited at capturing the interaction between variables?
My dataset is large, with 400 features and 5,000,000 instances. All data is in percentile and the label is also a percentile. The dataset is quite noisy as well, (customer data, predicting likelihood of becoming a return customer).
Probabilistic Random Forest tends to work better then other algorithms on noisy datasets. But the data you are using also plays a major role on whether a algorithm will work or not. Check this paper Probabilistic Random Forest for more details. Happy Learning
Answered by Shiv on March 28, 2021
Ensemble methods, boosting or bagging, often give predictive accuracies superior to other methods. From my personal experience, I find GBM (ie. Gradient Boosting Regressor over Decision Trees) and LightGBM(faster) often give very accurate predictions.
Check out this diagram on choosing the right estimator.
Answered by Chong Lip Phang on March 28, 2021
I would make the following models:
If something looks promising, go that direction.
Answered by jeffhale on March 28, 2021
Get help from others!
Recent Questions
Recent Answers
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP