
Training a model that has both 2D and 1D features using a CNN

Data Science Asked by MouseAndKeyboard on November 1, 2020

I’m looking to pre-train a model for an RL agent but I’m having some trouble figuring some stuff out.

Dataset: Minerl MineRLNavigateDense-v0

The observation space includes:

2D screen input of shape (64, 64) plus 3 color channels
1D (scalar) compass angle
1D (scalar) number of dirt blocks

All of these are observed over time.

I am also given the reward based on the action the human took.

When training a model using a CNN for time-series classification, my understanding is that each feature (out of k total features) at a given point in time is represented as a scalar value (visualized as each box in each row), and the time component is captured by each row being a new timestep:

[Image: visual of what I'm trying to explain]
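The layout described above can be sketched in a few lines of numpy: here the two scalar observations are stacked so each row is one timestep, giving a (timesteps, k) array (T and the value ranges are assumed for illustration).

```python
import numpy as np

T, k = 100, 2  # assumed number of timesteps and features for illustration
compass = np.random.uniform(-180.0, 180.0, size=T)       # compass angle per timestep
dirt = np.random.randint(0, 64, size=T).astype(float)    # dirt count per timestep

# Each row is one timestep, each column one feature -> shape (T, k)
series = np.stack([compass, dirt], axis=1)
print(series.shape)  # (100, 2)
```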

My question is:
How do you set up the training data so that this sort of CNN can be trained on it, given that the datapoints have different dimensions? Do we just flatten the screen input into 64*64*3 = 12288 new features? That feels wrong.

UPDATE:
My first problem, combining the 2D image and the scalar datapoints, was answered in a Discord group I am in. In Keras, at least, there are special layer types called "merge layers" (https://keras.io/layers/merge/). A CNN can be applied to the image first and then merged with the scalar data; the result can then be passed into a 1D CNN to add the temporal component. However, I haven't actually done any of this yet. 🙂
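A minimal sketch of that merge-layer idea in tf.keras, assuming a fixed window of T timesteps; layer sizes, T, and the reward head are illustrative choices, not from the original post:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

T = 8  # assumed number of timesteps per training window

frames = layers.Input(shape=(T, 64, 64, 3))  # image sequence
scalars = layers.Input(shape=(T, 2))         # compass angle + dirt count per timestep

# Apply the same small CNN to every frame via TimeDistributed.
cnn = tf.keras.Sequential([
    layers.Conv2D(16, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.GlobalAveragePooling2D(),
])
frame_feats = layers.TimeDistributed(cnn)(frames)  # (batch, T, 32)

# Merge layer: concatenate per-frame CNN features with the scalars.
merged = layers.Concatenate(axis=-1)([frame_feats, scalars])  # (batch, T, 34)

# 1D CNN over the merged feature sequence adds the temporal component.
x = layers.Conv1D(64, 3, activation="relu")(merged)
x = layers.GlobalAveragePooling1D()(x)
out = layers.Dense(1)(x)  # e.g. a predicted reward for the window

model = Model([frames, scalars], out)
```

The key point is that the 2D convolution runs per frame before any merging, so the image is never flattened into 12288 raw features.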

One Answer

I do not see how each datapoint would have a different dimension (you simply have two extra scalars, always). Could you clarify that?

Two solutions come to mind.

The first would be to incorporate the scalars into the original picture, i.e. augment the picture with them. You could add, for example, an extra row for each scalar and end up with a picture of size 66*64. How to do this reasonably is another question; I have no experience with a problem exactly like this, but using just a single pixel per value seems too marginal. Maybe make the whole row of pixels the same, representing the value, or add another channel and fill every point in that channel with the representation of the scalar.
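The whole-row variant can be sketched in numpy as follows; the normalization ranges here are assumptions you would need to match to your own data:

```python
import numpy as np

def augment_frame(frame, compass_angle, dirt_count):
    """Append one constant row per scalar to a (64, 64, 3) frame -> (66, 64, 3).

    Assumed normalizations: compass angle in [-180, 180], dirt count capped at 64.
    """
    compass_row = np.full((1, 64, 3), compass_angle / 180.0)
    dirt_row = np.full((1, 64, 3), min(dirt_count / 64.0, 1.0))
    return np.concatenate([frame, compass_row, dirt_row], axis=0)

augmented = augment_frame(np.zeros((64, 64, 3)), 90.0, 10)
print(augmented.shape)  # (66, 64, 3)
```

The downstream CNN then sees a single 66x64x3 input, at the cost of mixing spatially meaningless rows into the image.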

Another solution would be to create an extra branch of the model for the scalar features (with its own sub-network) and then merge the branches before a final classifier. I have never done it myself, but this article, https://www.pyimagesearch.com/2019/02/04/keras-multiple-inputs-and-mixed-data/, seems to explain it quite well. At https://stackoverflow.com/questions/47818968/adding-an-additional-value-to-a-convolutional-neural-network-input/47819063 you can also find another brief code example (and the answerer there basically proposes both of the approaches I mentioned above :) )

I hope this gives you at least an idea. You will need to play with it a bit to choose the best alternative.

Answered by Jakub on November 1, 2020
