
Handling a large dataset consisting of npy files

Asked by Amin Marshal on July 12, 2021

I have a large number of npy files (448 in total), each containing around 12k frames (150×150 RGB images), which together form the input to my neural network (X). Since it is impossible to load all of the files into a single array, and since all of the samples need to be shuffled to avoid bias, how do I create the input and feed it to the network? Someone suggested creating a dummy array of indices, shuffling it, creating chunks based on the array size and the shuffled indices, and then feeding those chunks to the network. However, I was wondering if there is a simpler method.
In short, I would like to do this step, but with a large number of large npy files:

X_train_filenames, X_val_filenames, y_train, y_val = train_test_split(...)
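
For example, with per-file labels this split could look like the sketch below (the paths and the one-label-per-file layout are only placeholders for my actual data):

from glob import glob

import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical layout: 448 .npy frame files plus one label per file in labels.npy.
X_filenames = sorted(glob("data/frames/*.npy"))
y = np.load("data/labels.npy")  # assumed to be aligned with X_filenames

X_train_filenames, X_val_filenames, y_train, y_val = train_test_split(
    X_filenames, y, test_size=0.2, random_state=42, shuffle=True
)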

Note 1: Some have suggested using TFRecords, but I could not figure out how to convert my data to that format and use it.

One Answer

All deep learning libraries have data-loading APIs that can load data lazily.

You mention TFRecords, so I assume you are using TensorFlow. You can use TensorFlow's tf.data API.
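
For instance, a minimal sketch that loads the .npy files lazily, shuffles frames within a buffer, and batches them could look like the following (the glob pattern, buffer size, and batch size are assumptions to adapt to your setup, and labels are omitted for brevity):

import numpy as np
import tensorflow as tf

# Hypothetical file list; point this at your 448 .npy files.
file_paths = sorted(tf.io.gfile.glob("data/frames/*.npy"))

def load_file(path):
    # np.load runs eagerly inside the pipeline via tf.numpy_function.
    def _load(p):
        frames = np.load(p.decode("utf-8"))          # shape (n_frames, 150, 150, 3)
        return frames.astype(np.float32) / 255.0     # scale pixel values to [0, 1]
    frames = tf.numpy_function(_load, [path], tf.float32)
    frames.set_shape([None, 150, 150, 3])
    return frames

dataset = (
    tf.data.Dataset.from_tensor_slices(file_paths)
    .shuffle(len(file_paths))                        # reshuffle the file order each epoch
    .map(load_file, num_parallel_calls=tf.data.AUTOTUNE)
    .unbatch()                                       # flatten files into individual frames
    .shuffle(10_000)                                 # shuffle frames within a sliding buffer
    .batch(32)
    .prefetch(tf.data.AUTOTUNE)
)

The shuffle is only as random as the buffer allows, so reshuffling the file order each epoch plus a reasonably large frame buffer is the usual compromise when a full in-memory shuffle is not possible. Once labels are included so the dataset yields (frame, label) pairs, it can be passed directly to model.fit.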

Answered by Brian Spiering on July 12, 2021
