Stack Overflow Asked by diveshb on December 13, 2021
Currently I'm writing code where I extract the RGB values of images using OpenCV/PIL (tried both).
I then pass them through a function that calculates their mean and median, as well as the mean and median of the upper and lower parts.
At the moment my code processes about one image per second, and I need to do this for 10,000+ images that are stored in different subfolders by category.
I use NumPy functions for the mean and median.
Is there a faster way I can do this?
Edit: The images are all different sizes, with dimensions varying from, say, 1×1 to 1000×100, and are in jpg, png and bmp formats.
As for the code, I know accessing the image shouldn't take long, but the problem lies in computing the mean and median of those arrays.
I'll add a code snippet below of what it looks like (I apologize in advance if it looks bad).
I also write all of these mean and median values to an Excel sheet at the end using xlwt, which I hope shouldn't take long either.
I use os.walk to traverse the directory, after which I use
img = os.path.join(dirName, fname)
and get the values from a function defined in another file (a sketch of the full driver loop follows the snippet below):
values = rgbavg(img)
import cv2
import numpy as np

image = cv2.imread(image_path)
img = np.array(image)
# one row per channel: shape (3, height * width)
img = img.transpose(2, 0, 1).reshape(3, -1)
x, size = img.shape
avg = np.mean(img, axis=1)
for i in range(0, 3):
    upper = np.array([])
    lower = np.array([])
    # split each channel into values above/below its channel mean
    for ele in img[i]:
        if ele > avg[i]:
            upper = np.append(upper, ele)
        else:
            lower = np.append(lower, ele)
    if upper.size != 0:
        mean = np.mean(upper)
        avg = np.append(avg, mean)
    else:
        avg = np.append(avg, 0)
    if lower.size != 0:
        mean = np.mean(lower)
        avg = np.append(avg, mean)
    else:
        avg = np.append(avg, 0)
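Putting the pieces described above together, the driver loop would look roughly like this minimal sketch (the module that rgbavg is imported from and the output filename are assumptions):

import os

import xlwt
from rgbavg_module import rgbavg    # assumed: the separate file that defines rgbavg

root_dir = "images"                  # hypothetical root of the category subfolders

wb = xlwt.Workbook()
ws = wb.add_sheet("values")

row = 0
for dirName, subdirList, fileList in os.walk(root_dir):
    for fname in fileList:
        if not fname.lower().endswith((".jpg", ".png", ".bmp")):
            continue
        img = os.path.join(dirName, fname)
        values = rgbavg(img)                  # assumed to return a flat sequence of numbers
        ws.write(row, 0, img)
        for col, v in enumerate(values, start=1):
            ws.write(row, col, float(v))      # cast numpy scalars to plain float for xlwt
        row += 1

wb.save("rgb_values.xls")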
It's possible that your program is spending a lot of time just waiting on read/write operations, and a lot of time in Python loops.
(The actual answer is under "Updated" below.)
The waiting part you can mitigate by making use of multiple processes. The easiest way for you, I think, would be to use a Pool. This can also speed up your code by roughly however many cores you have available.
First you would prepare your data (gather a list of all the files/file paths; a sketch of that step follows the example below).
Then you would pass that list as an argument so the Pool creates processes and saves the results:
import multiprocessing
import time

files = ["img1.jpeg", "img2.jpeg", "img3.jpeg", "img4.jpeg"]

def process_image(path):
    # process the image and return your data
    # time.sleep is only here to show that different processes are running,
    # since the function executes faster than the pool can start processes
    time.sleep(1)
    return [[0, 0, 0], [14, 14, 14], multiprocessing.current_process().name]

if __name__ == '__main__':
    with multiprocessing.Pool(processes=4) as pool:
        results = pool.map(process_image, files)
    print(results)
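To connect this with the question's folder layout, the "prepare your data" step could gather the paths with os.walk; a minimal sketch, assuming a hypothetical root_dir:

import os

root_dir = "images"  # hypothetical root folder containing the category subfolders

files = [os.path.join(dirName, fname)
         for dirName, _, fnames in os.walk(root_dir)
         for fname in fnames
         if fname.lower().endswith((".jpg", ".png", ".bmp"))]

# 'files' can then be passed straight to pool.map(process_image, files)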
Updated:
After inspecting the original code I found that the filtering is what takes a long time (in this case, separating the values above and below the average in the array). NumPy has a much faster way to filter:
boolean_array = img_array >= value  # any comparison (<, >, <=, >=, ==) returns a boolean array
and
filtered = img_array[boolean_array]  # returns the filtered array
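As a tiny illustration of this mask-based split on a single made-up channel:

import numpy as np

channel = np.array([10, 200, 30, 180, 90], dtype=np.uint8)
avg = channel.mean()

upper = channel[channel >= avg]   # values at or above the mean
lower = channel[channel < avg]    # values below the mean
print(avg, upper.mean(), lower.mean())

Applied to the original function (with multiprocessing kept in), the updated code looks like this: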
import multiprocessing
import numpy
import cv2

def process_image(xyz):
    # a random image stands in for cv2.imread(path) in this demo
    img = numpy.random.randint(255, size=(1000, 1000, 3), dtype=numpy.uint8)
    img = img.transpose(2, 0, 1).reshape(3, -1)
    avg = numpy.mean(img, axis=1)
    x, size = img.shape
    for i in range(0, 3):
        upper = img[i][img[i] >= avg[i]]
        lower = img[i][img[i] < avg[i]]
        if upper.size != 0:
            # Why you are saving these idk but ok
            mean = numpy.mean(upper)
            avg = numpy.append(avg, mean)
        else:
            avg = numpy.append(avg, 0)
        if lower.size != 0:
            mean = numpy.mean(lower)
            avg = numpy.append(avg, mean)
        else:
            avg = numpy.append(avg, 0)
    return [avg, mean, multiprocessing.current_process().name]

if __name__ == '__main__':
    with multiprocessing.Pool(processes=4) as pool:
        results = pool.map(process_image, range(0, 12))
    print(results)
This is with multiprocessing added
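To adapt this to the actual files, the worker could read each image with cv2.imread instead of generating random data; a minimal sketch, assuming the file list gathered earlier and skipping unreadable files:

import multiprocessing

import cv2
import numpy

def process_image(path):
    image = cv2.imread(path)
    if image is None:                  # unreadable or unsupported file
        return [path, None, None]
    img = image.transpose(2, 0, 1).reshape(3, -1)
    avg = numpy.mean(img, axis=1)
    med = numpy.median(img, axis=1)    # per-channel medians, as the question also needs
    for i in range(0, 3):
        upper = img[i][img[i] >= avg[i]]
        lower = img[i][img[i] < avg[i]]
        avg = numpy.append(avg, numpy.mean(upper) if upper.size else 0)
        avg = numpy.append(avg, numpy.mean(lower) if lower.size else 0)
    return [path, avg, med]

if __name__ == '__main__':
    files = ["img1.jpeg", "img2.jpeg"]  # in practice, the list gathered with os.walk
    with multiprocessing.Pool(processes=4) as pool:
        results = pool.map(process_image, files)
    print(results)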
Answered by IamFr0ssT on December 13, 2021