Data Science Asked by Yassir on October 3, 2021
I'm trying to run a deep model on a GPU, and it seems that Keras runs the validation on the whole validation set in a single batch instead of validating in many batches, which causes an out-of-memory error:
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[160000,64,64,1] and type double on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [Op:GatherV2]
I did not have this problem when running on the CPU; it only happens on the GPU. My fit code looks like this:
history = model.fit(patches_imgs_train, patches_masks_train, batch_size=8, epochs=10,
                    shuffle=True, verbose=1, validation_split=0.2)
When I remove the validation_split parameter from the fit call, the code works, but I need the validation.
There may be two causes of your problem. In this Keras issue you can find a discussion of a very similar problem; basically, you can try adjusting the batch_size parameter.
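A minimal sketch of that suggestion, reusing the arrays and fit call from the question (the smaller batch size is just an illustrative value, not one from the original answer):

# Sketch: lower batch_size until the allocation fits in GPU memory.
history = model.fit(
    patches_imgs_train, patches_masks_train,
    batch_size=4,          # illustrative value; tune it for your GPU
    epochs=10,
    shuffle=True,
    verbose=1,
    validation_split=0.2,
)

If you are on a recent TensorFlow release, model.fit also accepts a validation_batch_size argument that controls the batch size used for the validation pass.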
Answered by noe on October 3, 2021
So what is happening could be considered a bug in the Keras implementation: it looks like Keras tries to load the whole data set into memory in order to split it into training and validation sets, and the problem is not related to the batch size. After trying many workarounds, I found that the best approach is to split the data with sklearn's train_test_split instead of splitting it inside the fit method with the validation_split parameter.
from sklearn.model_selection import train_test_split

# Split the data up front instead of letting Keras do it with validation_split
x_train, x_v, y_train, y_v = train_test_split(x, y, test_size=0.2, train_size=0.8)

history = model.fit(x_train, y_train,
                    batch_size=16,
                    epochs=5,
                    shuffle=True,
                    verbose=2,
                    validation_data=(x_v, y_v))
Answered by Yassir on October 3, 2021