TransWikia.com

Project structure - many projects share same large dataset

Data Science Asked by jesseWUT on August 16, 2021

I have a bunch of projects for my job that are largely unrelated, except that they all use the same data, which is pretty big on disk in CSV format. I want these projects to stay separate from each other. I usually follow the Cookiecutter Data Science project structure and keep all my data in a data folder at the root of each project.

But because this dataset is big, I don’t want ten copies of it in the roots of ten projects. I also don’t want to merge the projects into one big project that shares the data, because I feel like they don’t belong together.

What’s the best way to structure multiple different projects that all share the same large dataset?

One Answer

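A database is the best option for sharing data across projects: load the CSV into it once, and every project connects to that database instead of keeping its own copy.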
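A minimal sketch of the database route, assuming SQLite (through Python's standard-library `sqlite3` module) stands in for whatever database is available. The file names, the table name, and the helpers `load_csv_into_sqlite` and `query_shared` are hypothetical. The CSV is loaded once into a single database file, and each project opens that file read-only and queries it.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def load_csv_into_sqlite(csv_path, db_path, table):
    """Load a CSV file into one SQLite table, replacing any previous copy.

    Columns are declared without types, so values are stored as the
    strings the csv module reads; cast them in SQL or after querying.
    """
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        columns = ", ".join(f'"{name}"' for name in header)
        placeholders = ", ".join("?" for _ in header)
        con = sqlite3.connect(db_path)
        try:
            con.execute(f'DROP TABLE IF EXISTS "{table}"')
            con.execute(f'CREATE TABLE "{table}" ({columns})')
            # executemany consumes the remaining rows from the reader one by one.
            con.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
            con.commit()
        finally:
            con.close()


def query_shared(db_path, sql, params=()):
    """Run a query against the shared database, opened read-only."""
    # SQLite URI filenames accept mode=ro, so a project cannot modify the shared data.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    con = sqlite3.connect(uri, uri=True)
    try:
        return con.execute(sql, params).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        csv_file = Path(tmp) / "measurements.csv"
        csv_file.write_text("site,value\na,1\nb,2\n")
        shared_db = Path(tmp) / "shared.db"
        load_csv_into_sqlite(csv_file, shared_db, "measurements")  # done once
        # Any project can run queries like this one against the same file.
        print(query_shared(shared_db, "SELECT value FROM measurements WHERE site = ?", ("b",)))
```

SQLite keeps everything in one file, which suits a single machine or a shared drive; a database server would follow the same load-once, query-from-each-project pattern when the projects run on different machines.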

Another option is version control: check the CSV into version control. That could be plain Git, a Git host such as GitHub, or a data-specific version control system.
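Data-specific version control tools, and large-file extensions to Git, generally avoid committing the big file itself: the repository holds a small pointer file that records the data's hash, and the data is stored separately. A minimal sketch of that idea, assuming a cache directory that all projects can reach; the cache layout, the JSON pointer format, and the helpers `add_to_cache` and `resolve` are hypothetical and are not the API of any particular tool.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large CSVs are not loaded into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def add_to_cache(data_path, cache_dir, pointer_path):
    """Store data_path once in cache_dir, named by its hash, and write a pointer file.

    The pointer file is small, so it is what gets committed to the project's repository.
    """
    digest = file_digest(data_path)
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached = cache_dir / digest
    if not cached.exists():  # identical content is stored only once
        shutil.copy2(data_path, cached)
    Path(pointer_path).write_text(json.dumps({"sha256": digest, "name": Path(data_path).name}))
    return digest


def resolve(pointer_path, cache_dir):
    """Return the cached file a pointer refers to, after re-checking its hash."""
    meta = json.loads(Path(pointer_path).read_text())
    cached = Path(cache_dir) / meta["sha256"]
    if file_digest(cached) != meta["sha256"]:
        raise ValueError(f"cache entry {cached} does not match its pointer")
    return cached


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        cache = root / "shared_cache"
        raw = root / "big.csv"
        raw.write_text("site,value\na,1\n")
        for project in ("project_a", "project_b"):
            (root / project).mkdir()
            add_to_cache(raw, cache, root / project / "big.csv.ptr")
        print(len(list(cache.iterdir())))  # 1: both projects share one cached copy
        print(resolve(root / "project_b" / "big.csv.ptr", cache).read_text())
```

Each project commits only its pointer file, so ten projects still mean one copy of the CSV on disk, and each pointer pins one exact version of the data.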

Answered by Brian Spiering on August 16, 2021
