TransWikia.com

Keras OOM for data validation using GPU

Data Science Asked by Yassir on October 3, 2021

I’m trying to train a deep model on the GPU, and it seems that Keras runs validation on the whole validation set in a single batch instead of in many smaller batches, which causes an out-of-memory (OOM) error:

tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[160000,64,64,1] and type double on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [Op:GatherV2]
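The tensor shape in this message already hints at the cause. A quick back-of-the-envelope check in plain Python, using only the shape from the traceback and batch_size=8 from the fit call in this question, shows that the failing allocation is the size of the entire 160,000-sample array, not of one batch. "type double" means float64 (8 bytes per value); the float32 line is only an illustrative comparison (an assumption about a possible mitigation, not something tried here).

```python
from math import prod

# Numbers taken from the traceback and the fit call in the question.
oom_shape = (160_000, 64, 64, 1)   # shape[160000,64,64,1] from the OOM message
batch_shape = (8, 64, 64, 1)       # one batch with batch_size=8

FLOAT64_BYTES = 8  # "type double" in the message means float64
FLOAT32_BYTES = 4  # for comparison only: the same data cast to float32

n_elements = prod(oom_shape)
bytes_float64 = n_elements * FLOAT64_BYTES
bytes_float32 = n_elements * FLOAT32_BYTES
batch_bytes_float64 = prod(batch_shape) * FLOAT64_BYTES

print(f"elements in failing tensor: {n_elements:,}")
print(f"failing tensor as float64:  {bytes_float64 / 1e9:.2f} GB")
print(f"same tensor as float32:     {bytes_float32 / 1e9:.2f} GB")
print(f"one batch of 8 as float64:  {batch_bytes_float64 / 1e6:.2f} MB")
```

Roughly 5.2 GB for a single tensor versus about a quarter of a megabyte per batch is consistent with the error appearing only on the GPU, where free memory is usually much smaller than host RAM.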

I did not have this problem when running on the CPU; it only happens on the GPU. My fit call looks like this:

history = model.fit(patches_imgs_train, patches_masks_train, batch_size=8, epochs=10,
                    shuffle=True, verbose=1, validation_split=0.2)

When I remove the validation_split argument from fit, the code works, but I need validation.

2 Answers

There are two possible causes of your problem:

  • The network needs more memory during validation.
  • There is another, unrelated problem.

This Keras issue discusses a very similar problem. In short, you can try:

  • Reducing the batch size.
  • If you are using TensorBoard, disabling it or setting its batch_size parameter.
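Reducing the batch size only helps for data that is actually fed to the model in batches. One hedged way to make sure the validation set reaches the GPU in small float32 batches, rather than as one large array, is to pass a Python generator as validation_data together with validation_steps. This assumes a tf.keras 2.x version whose Model.fit accepts a generator for validation_data; batch_generator and steps_for are hypothetical helper names, not Keras APIs.

```python
import math

import numpy as np


def batch_generator(x, y, batch_size=8, dtype=np.float32):
    """Yield (x_batch, y_batch) pairs forever, one slice at a time.

    Hypothetical helper. Only the current slice is cast to `dtype`, so the
    full array is never copied or converted in one piece.
    """
    n = len(x)
    while True:  # loop forever; Keras stops after `validation_steps` batches
        for start in range(0, n, batch_size):
            end = start + batch_size
            yield (x[start:end].astype(dtype, copy=False),
                   y[start:end].astype(dtype, copy=False))


def steps_for(n_samples, batch_size):
    """Number of batches needed to cover n_samples exactly once."""
    return math.ceil(n_samples / batch_size)


# Usage sketch (assumes `model`, x_train, y_train, x_v, y_v already exist):
# history = model.fit(x_train, y_train, batch_size=8, epochs=10, shuffle=True,
#                     validation_data=batch_generator(x_v, y_v, batch_size=8),
#                     validation_steps=steps_for(len(x_v), 8))
```

Because the generator never ends on its own, validation_steps is what marks the end of one validation pass; with steps_for(len(x_v), 8), each epoch should see every validation sample once, assuming Keras keeps drawing batches in order from the same generator.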

Answered by noe on October 3, 2021

I would consider what is happening a bug in the Keras implementation: it looks like Keras loads the whole dataset into memory at once to split it into training and validation sets, so the problem is not related to the batch size. After trying many workarounds, I found that the best approach was to split the data with scikit-learn’s train_test_split before calling fit, instead of using the validation_split parameter.

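from sklearn.model_selection import train_test_split

x_train, x_v, y_train, y_v = train_test_split(x, y, test_size=0.2)  # split on the host, before fit

history = model.fit(x_train, y_train,
                    batch_size=16,
                    epochs=5,
                    shuffle=True,
                    verbose=2,
                    validation_data=(x_v, y_v))  # explicit validation set instead of validation_split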
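Note that train_test_split shuffles before splitting and returns copies of the selected rows. If you would rather keep the semantics of validation_split, which according to the Keras documentation holds out the last fraction of the samples before any shuffling, plain NumPy slicing does the same split on the host without copying. A minimal sketch, assuming x and y are NumPy arrays; split_last_fraction is a hypothetical helper, not part of Keras or scikit-learn.

```python
import numpy as np


def split_last_fraction(x, y, validation_split=0.2):
    """Hold out the last `validation_split` fraction of samples, mirroring
    the documented behaviour of Keras's validation_split (no shuffling).

    Hypothetical helper. Basic slicing returns NumPy views, so the split
    itself allocates no extra host memory. Returns the same order as
    train_test_split: x_train, x_val, y_train, y_val.
    """
    if not 0.0 < validation_split < 1.0:
        raise ValueError("validation_split must be in (0, 1)")
    split_at = int(len(x) * (1.0 - validation_split))
    return x[:split_at], x[split_at:], y[:split_at], y[split_at:]


# Usage sketch (assumes `model`, x and y already exist):
# x_train, x_v, y_train, y_v = split_last_fraction(x, y, validation_split=0.2)
# history = model.fit(x_train, y_train, batch_size=16, epochs=5, shuffle=True,
#                     verbose=2, validation_data=(x_v, y_v))
```

Unlike train_test_split, this split is deterministic and order-dependent, so it is only appropriate if the samples are not sorted in a way that would make the last 20% unrepresentative.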

Answered by Yassir on October 3, 2021
