Data Science — Asked by HannesZ on April 10, 2021
I am working on predictive ML models using very roughly 10-50 million records (currently testing with less) and around 10 explanatory variables per model.
When outlining hardware requirements for a good VM setup, it is often difficult for me to say which additional computational resource would help, and by how much. When I look at the task manager (on a Windows 64-bit machine) while training my XGBoost models, the CPU is always at 100% across all cores.
Parallelization seems to work fine, but training still takes a long time.
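One quick way to check whether core count is really the bottleneck is to time a fixed number of boosting rounds at different thread counts. This is only a minimal sketch using the native xgboost Python API; the generated dataset is a hypothetical stand-in for the real records, and the thread counts are arbitrary examples.

```python
import time
import xgboost as xgb
from sklearn.datasets import make_regression

# Toy data standing in for the real 10-50M-record table (assumption for illustration)
X, y = make_regression(n_samples=200_000, n_features=10, random_state=0)
dtrain = xgb.DMatrix(X, label=y)

# Time a fixed number of boosting rounds at different thread counts.
for n_threads in (1, 2, 4, 8):
    params = {
        "tree_method": "hist",        # histogram-based split finding
        "nthread": n_threads,         # number of CPU threads used for training
        "objective": "reg:squarederror",
    }
    start = time.perf_counter()
    xgb.train(params, dtrain, num_boost_round=50)
    print(f"{n_threads} threads: {time.perf_counter() - start:.1f}s")
```

If the runtime keeps shrinking roughly in proportion to the thread count, more cores should pay off; if it flattens out early, something else (memory bandwidth, data loading) is the limit.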
It therefore seems logical to ask primarily for more CPU (that is, many more cores). However, memory also plays a role: if I understand it correctly, the algorithm stores/compresses the data differently when less memory is available.
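The memory side of the trade-off is largely governed by how the trees are built. The sketch below shows the histogram-based method, whose memory footprint is much smaller than exact split finding; the `max_bin` value and the toy data are illustrative assumptions, not settings taken from the question.

```python
import xgboost as xgb
from sklearn.datasets import make_regression

# Toy data standing in for the real records (assumption for illustration)
X, y = make_regression(n_samples=100_000, n_features=10, random_state=0)
dtrain = xgb.DMatrix(X, label=y)

params = {
    "tree_method": "hist",   # histogram-based splits: lower memory use, usually faster
    "max_bin": 256,          # fewer bins -> smaller histograms, less memory per feature
    "nthread": -1,           # use all available cores (what shows up as 100% CPU)
    "objective": "reg:squarederror",
}

booster = xgb.train(params, dtrain, num_boost_round=100)
```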
Here is my question: should I accept the offer of a considerable amount of additional memory (which seems easier to get) and settle for only a modest improvement on the CPU side, or should I ignore that and just ask for more CPU?