Data Science Asked by jesseWUT on August 16, 2021
I have a number of projects at work that are largely unrelated, except that they all use the same dataset, which is fairly large on disk in CSV format. I want these projects to exist separately from each other. I usually follow the Cookiecutter Data Science template for project structure and keep all of a project's data in a data folder in its root.
Because this dataset is big, I don't want ten copies of it sitting in the roots of ten projects. I also don't want to merge everything into one big project that shares the data, because the projects don't belong together.
What's the best way to structure multiple projects that all share the same large dataset?
A database is the best option for sharing data across projects: load the CSV into the database once, and let each project query only the rows and columns it needs.
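As a minimal sketch of that idea with SQLite and pandas, assuming the shared CSV is imported once into a central database file; the paths, table name, and column names below are placeholders, not from the question:

```python
# Sketch: import the shared CSV into one SQLite database, then let each
# project query it instead of keeping its own copy of the CSV.
import sqlite3
import pandas as pd

DB_PATH = "/data/shared/shared_data.db"  # hypothetical central location


def build_database(csv_path: str) -> None:
    """Run once: load the large CSV into SQLite in manageable chunks."""
    with sqlite3.connect(DB_PATH) as conn:
        for chunk in pd.read_csv(csv_path, chunksize=100_000):
            chunk.to_sql("measurements", conn, if_exists="append", index=False)


def load_subset(query: str) -> pd.DataFrame:
    """Each project pulls only the subset it actually needs."""
    with sqlite3.connect(DB_PATH) as conn:
        return pd.read_sql_query(query, conn)


# Example usage from any of the ten projects:
# df = load_subset("SELECT col_a, col_b FROM measurements WHERE col_a > 0")
```

Each project then only needs the database path (or a connection string, if you use a server-based database such as PostgreSQL) rather than its own copy of the data.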
Another option is version control: check the CSV into a version control system. This could be Git, GitHub, or a data-specific version control tool (e.g., DVC or Git LFS).
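As a sketch of the data-specific route, assuming the CSV is tracked with DVC in a separate "data registry" repository (the repository URL and file path below are hypothetical), each project can read the file on demand without storing its own copy:

```python
# Sketch: read a DVC-tracked CSV from a shared data repository.
import pandas as pd
import dvc.api  # pip install dvc

# Stream the file from the data registry's configured remote storage.
with dvc.api.open(
    path="data/big_dataset.csv",                        # path inside the data repo
    repo="https://github.com/your-org/data-registry",   # hypothetical repo URL
) as f:
    df = pd.read_csv(f)
```

This keeps the large file versioned in one place while each project's Git repository stays small.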
Answered by Brian Spiering on August 16, 2021