What is the best way to store images in Python for machine learning?

Data Science Asked on April 20, 2021

I am currently working on a classification problem in which I have to classify whether an image contains cancerous tissue cells or not. Each image is 50x50 pixels with 3 channels for the RGB values.

So far I have a pandas dataframe that contains the target value, patient id, image id and the path to the corresponding image.

I can access a single image with

    from skimage import io
    io.imread(df['path'].iloc[i])

So it is possible for me to loop through all the images to access them. The question now is, where do I store the images so that I can apply principal component analysis on them?
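
For example, I could flatten every image into a row of 7500 pixel values and stack the rows, roughly like this:

    import numpy as np
    from skimage import io

    # Read every image, flatten it to a 7500-long row, and stack everything
    # into one big (n_images, 7500) array.
    rows = []
    for path in df['path']:
        img = io.imread(path)          # shape (50, 50, 3), dtype uint8
        rows.append(img.reshape(-1))   # flatten to 7500 values
    X = np.stack(rows)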

If I were to simply store the result in a dataframe, it would contain 7500 columns, one for each pixel value. My dataset contains 280,000 images, so the dataframe would need to be 280,000×7500. I feel that there is a better way to approach this problem.

Your input on this matter would be highly appreciated.

2 Answers

This might be a bit more complicated.

I normally reuse computer vision and deep learning software for this, even when I am not doing deep learning.

In particular I use PyTorch, for its bridge with NumPy and pandas. Here is a tutorial.

This allows me to use a GPU if I want, and to reuse a lot of code, since there are tons of code snippets out there for deep learning with images.
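
Roughly, the idea would look something like the sketch below (just a sketch, assuming your image paths are in a path column and your labels in a target column):

    import torch
    from torch.utils.data import Dataset, DataLoader
    from skimage import io

    class TissueImageDataset(Dataset):
        """Wraps the dataframe so each row yields a (flattened image, label) pair."""
        def __init__(self, df):
            self.paths = df['path'].tolist()
            self.labels = df['target'].tolist()

        def __len__(self):
            return len(self.paths)

        def __getitem__(self, idx):
            img = io.imread(self.paths[idx])              # (50, 50, 3), uint8
            img = torch.from_numpy(img).float() / 255.0   # scale to [0, 1]
            return img.reshape(-1), self.labels[idx]      # flatten to 7500 values

    # Iterate in batches instead of materializing all 280,000 images at once.
    loader = DataLoader(TissueImageDataset(df), batch_size=256, shuffle=False)
    for images, labels in loader:
        pass  # each images tensor has shape (256, 7500)

From there you can feed each batch into PCA, a model, or anything else that expects a (samples, features) array.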

Correct answer by Carlos Mougan on April 20, 2021

Yes, pandas won't work well for this. You can look at sparse data formats: https://docs.scipy.org/doc/scipy/reference/sparse.html

Or maybe check how it is done in TensorFlow.
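
For the TensorFlow route, a minimal tf.data sketch (assuming a recent TensorFlow 2.x and that df['path'] points at PNG/JPEG files) would stream the images from disk instead of holding them all in memory:

    import tensorflow as tf

    def load_image(path):
        raw = tf.io.read_file(path)
        img = tf.io.decode_image(raw, channels=3, expand_animations=False)
        img = tf.cast(img, tf.float32) / 255.0
        return tf.reshape(img, [-1])   # flatten to 7500 values

    ds = (tf.data.Dataset.from_tensor_slices(df['path'].values)
          .map(load_image, num_parallel_calls=tf.data.AUTOTUNE)
          .batch(256)
          .prefetch(tf.data.AUTOTUNE))

    for batch in ds:
        pass  # each batch has shape (256, 7500)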

Answered by Dirk Nachbar on April 20, 2021
