TensorFlow Time Series Tutorial Enhancement Gone Wrong

Asked on Data Science by raeldor on April 11, 2021

I’ve been following this time series tutorial for TensorFlow…

https://www.tensorflow.org/tutorials/structured_data/time_series

And it was going well and seemed to work OK. I substituted in my own dataset (about 1.1m records with 4 features) and it also seemed to work well, but memory was getting REALLY tight, so I thought I would try to implement the improvement mentioned at the bottom, which says…

In addition, you may also write a generator to yield data (instead of
the uni/multivariate_data function), which would be more memory
efficient. You may also check out this time series windowing guide and
use it in this tutorial.

This kind of made sense, as the uni/multivariate_data functions produce a sliding time window shifted one step at a time, so the materialized data becomes HUGE (a rough estimate of the scale is sketched below the link). After reading through this…

https://www.tensorflow.org/guide/data#time_series_windowing
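For a sense of that scale, here is my rough back-of-envelope estimate. The record and feature counts come from my dataset; the 720-step history window is an assumption borrowed from the tutorial, so substitute your own past_history:

n_records = 1_100_000   # roughly my dataset size
n_features = 4
past_history = 720      # assumed; the tutorial's value
bytes_per_value = 4     # float32

raw_bytes = n_records * n_features * bytes_per_value
# every starting position materializes its own copy of a full window
windowed_bytes = (n_records - past_history) * past_history * n_features * bytes_per_value

print(f"raw data:    {raw_bytes / 1e9:.3f} GB")      # ~0.018 GB
print(f"all windows: {windowed_bytes / 1e9:.1f} GB") # ~12.7 GB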

It looked like I could use TensorFlow itself to do the windowing, which I assumed would be executed lazily, so not much memory would be used. After reading, I came up with the following code to try to replace the multivariate_data function (the functions are borrowed from the windowing guide, though I changed dense_1_step to get a single set of label features back instead of a window)…

def make_window_dataset(ds, window_size=5, shift=1, stride=1):
  windows = ds.window(window_size, shift=shift, stride=stride)

  def sub_to_batch(sub):
    return sub.batch(window_size, drop_remainder=True)

  windows = windows.flat_map(sub_to_batch)
  return windows

def dense_1_step(batch):
  # Shift features and labels one step relative to each other.
  return batch[:-1], batch[-1:]

# get training samples (features and labels)
train_ds = make_window_dataset(
    tf.data.Dataset.from_tensor_slices(dataset[:TRAIN_SPLIT]),
    window_size=past_history+1, shift=1, stride=1)
dense_labels_train_ds = train_ds.map(dense_1_step)

# get validation samples (features and labels)
val_ds = make_window_dataset(
    tf.data.Dataset.from_tensor_slices(dataset[TRAIN_SPLIT:]),
    window_size=past_history+1, shift=1, stride=1)
dense_labels_val_ds = val_ds.map(dense_1_step)

# batch and shuffle training data
train_data_single = dense_labels_train_ds.cache().shuffle(BUFFER_SIZE).batch(BATCH_SIZE).repeat()
val_data_single = dense_labels_val_ds.batch(BATCH_SIZE).repeat()
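As a sanity check before fitting (this is just how I have been poking at the pipeline, not something from the tutorial), pulling one element out of the mapped dataset shows what dense_1_step actually produces:

# peek at one (features, labels) pair before batching
for features, labels in dense_labels_train_ds.take(1):
  print("features:", features.shape)  # (past_history, n_features)
  print("labels:  ", labels.shape)    # (1, n_features) - a whole feature row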

But for some reason it’s all gone pear-shaped, for the following reasons…

  1. The training loss and validation loss are now NOTHING alike, and in fact I don’t think the net is even learning properly anymore.
  2. The memory usage seems much better UNTIL you execute fit(), and then it just blows up to about the same level anyway.
  3. I get the warning ‘Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.’

I’m sure I screwed something up here, but to me it looks like it’s doing what it should be doing. I’m still on a steep learning curve with this, but if anyone with more experience could point me in the right direction I would be very grateful.

Thanks

Ray

One Answer

It seems I was returning multiple features as labels. I had to modify the dense_1_step function to return a single feature...

def dense_1_step(batch):
  # Shift features and labels one step relative to each other.
  return batch[:-1], batch[-1:,1][0] # take second feature only

This makes it match the output of the multivariate_data function.
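A quick check (again just my own poking around, not part of the tutorial) confirms that each label is now a single scalar, so after batching the targets have shape (batch_size,), the same as multivariate_data produced:

# with the fixed dense_1_step, each label is a single value (feature index 1)
for features, label in train_ds.map(dense_1_step).take(1):
  print("features:", features.shape)  # (past_history, n_features)
  print("label:   ", label.shape)     # () - a single scalar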

Answered by raeldor on April 11, 2021
