Data Science Asked on July 12, 2021
I’ve learned machine learning via textbooks and examples, which don’t delve into the engineering challenges of working with “big-ish” data like Kaggle’s.
As a specific example, I’m working on the New York taxi trip challenge. It’s a regression task with ~2 million rows and 20 columns.
My 4 GB-RAM laptop can barely handle EDA with pandas and matplotlib in a Jupyter Notebook. When I try to train a random forest with 1,000 trees, the kernel hangs and eventually dies (a "kernel restarted" error in Jupyter).
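For reference, here is a sketch of the kind of memory trimming that helps in this situation (the column names used in the test data and all parameter values are my assumptions, not from the challenge itself): downcasting 64-bit numeric columns to 32-bit roughly halves a pandas DataFrame's footprint, and bounding tree count, depth, and worker count keeps scikit-learn's random forest from exhausting RAM.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

def downcast(df: pd.DataFrame) -> pd.DataFrame:
    """Shrink 64-bit numeric columns to 32-bit, roughly halving memory use."""
    for col in df.select_dtypes("float64").columns:
        df[col] = df[col].astype("float32")
    for col in df.select_dtypes("int64").columns:
        df[col] = df[col].astype("int32")
    return df

# 100 depth-limited trees need far less RAM than 1,000 unbounded ones,
# and n_jobs=2 caps how many parallel workers hold tree state at once.
rf = RandomForestRegressor(
    n_estimators=100, max_depth=12, n_jobs=2, random_state=0
)
```

Whether 100 trees and depth 12 are enough for a good score is a separate tuning question; the point is that these knobs trade accuracy for a bounded memory footprint.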
To combat this, I set up a 16 GB-RAM desktop. I then SSH in, start a headless Jupyter kernel, and connect my local notebook to it. Even so, I still max out that machine.
At this point, I’m guessing that I need to run my model training code as a script.
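A minimal version of such a script might look like the following (the CSV path and the `trip_duration` target column are my assumptions based on the taxi challenge, not confirmed details). Running it outside Jupyter avoids the notebook kernel's overhead, and launching it with `nohup` lets it survive SSH disconnects.

```python
# train.py -- run with: nohup python train.py > train.log 2>&1 &
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

def train(df: pd.DataFrame, target: str = "trip_duration") -> RandomForestRegressor:
    """Fit a modestly sized forest on the numeric columns of df."""
    y = df.pop(target)                 # remove the target from the features
    X = df.select_dtypes("number")     # keep it simple: numeric features only
    model = RandomForestRegressor(n_estimators=100, n_jobs=2, random_state=0)
    model.fit(X, y)
    return model

# usage (paths are assumptions):
#   import joblib
#   model = train(pd.read_csv("train.csv"))
#   joblib.dump(model, "rf.joblib", compress=3)  # persist for later prediction
```

Persisting the fitted model with `joblib.dump` means the expensive fit runs once, and prediction or inspection can happen later in a lightweight session.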
My current toolkit is RStudio, Jupyter Notebook, and Emacs, but I’m willing to pick up new things.
Answered by Brian Spiering on July 12, 2021