Asked by wheresmycookie on December 18, 2020
I am trying to build a plain RNN using TensorFlow for letter-level language detection. I realize that my approach isn't the ideal one, but I'm trying to learn by doing the more basic version first.
I am training the RNN on a sequence of letters, and for now I am using one-hot encodings, with each vector corresponding to a letter.
Here are the various placeholders and variables defined in the graph. I'm putting all of them here for reference (note that `truncated_backprop_length` is just the length of the letter sequence that I train with):
```python
batchX_placeholder = tf.placeholder(tf.float32,
                                    [batch_size, truncated_backprop_length, encoding_size])
batchY_placeholder = tf.placeholder(tf.int32,
                                    [batch_size, num_classes])

init_state = tf.placeholder(tf.float32, [batch_size * encoding_size, state_size])

W = tf.Variable(np.random.rand(state_size + 1, state_size), dtype=tf.float32)
b = tf.Variable(np.zeros((1, state_size)), dtype=tf.float32)

W2 = tf.Variable(np.random.rand(batch_size, encoding_size * batch_size), dtype=tf.float32)
b2 = tf.Variable(np.zeros((1, state_size)), dtype=tf.float32)

W3 = tf.Variable(np.random.rand(state_size, num_classes), dtype=tf.float32)
b3 = tf.Variable(np.zeros((1, num_classes)), dtype=tf.float32)
```
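To make the input encoding concrete, here is a rough sketch of how a one-hot batch for `batchX_placeholder` can be built; the sizes, alphabet size, and helper function below are illustrative stand-ins, not my actual values:

```python
import numpy as np

# Illustrative sizes only -- stand-ins, not my real hyperparameters.
batch_size = 4
truncated_backprop_length = 10
encoding_size = 27  # e.g. 26 letters plus one padding/unknown symbol

def one_hot_batch(letter_indices):
    """letter_indices: int array of shape [batch_size, truncated_backprop_length]."""
    batch = np.zeros((batch_size, truncated_backprop_length, encoding_size),
                     dtype=np.float32)
    for i in range(batch_size):
        for t in range(truncated_backprop_length):
            batch[i, t, letter_indices[i, t]] = 1.0  # a single 1 per letter position
    return batch

# Example: random letter indices with the same shape the placeholder expects.
example_indices = np.random.randint(0, encoding_size,
                                    size=(batch_size, truncated_backprop_length))
example_x = one_hot_batch(example_indices)  # shape (4, 10, 27)
```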
I unstack the training samples in `batchX_placeholder` so I can iterate through them. The loop that iterates through each training example of shape `[batch_size, encoding_size]` looks like this:
```python
for current_input in inputs_series:
    current_input = tf.reshape(current_input, [batch_size * encoding_size, 1])
    input_and_state_concatenated = tf.concat([current_input, current_state], 1)
    next_state = tf.tanh(tf.matmul(input_and_state_concatenated, W) + b)
    current_state = next_state
```
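For reference, this loop relies on `inputs_series` and an initial `current_state` being defined beforehand. Reconstructing that setup from the unstacking I described and the placeholder shapes above, it looks roughly like this (a sketch of the shapes involved, not verbatim code):

```python
import tensorflow as tf

# Reconstructed setup the loop depends on (shapes inferred from the
# placeholders defined earlier; not copied verbatim from my code).
inputs_series = tf.unstack(batchX_placeholder, axis=1)
# -> a list of truncated_backprop_length tensors, each [batch_size, encoding_size]

current_state = init_state
# -> [batch_size * encoding_size, state_size], matching the reshape inside the loop
```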
I take the final state from this loop and do the following:
```python
logits = tf.matmul(tf.matmul(W2, final_state) + b2, W3) + b3
loss = tf.nn.softmax_cross_entropy_with_logits(logits=logits,
                                               labels=batchY_placeholder)
```
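For context, `tf.nn.softmax_cross_entropy_with_logits` returns one loss value per batch element, so a training step built on it typically looks something like the sketch below; the optimizer and learning rate are illustrative assumptions, not details from my actual loop:

```python
# Typical reduction of the per-example losses to a scalar objective.
# (Illustrative only -- my actual training loop is omitted from the question.)
total_loss = tf.reduce_mean(loss)
train_step = tf.train.AdamOptimizer(learning_rate=1e-3).minimize(total_loss)
```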
I’m assuming that I’m running my training loop correctly and will leave that out of the question (for now) for simplicity. With the above setup, the loss does not converge during training.
I have reshaped `current_input` so it can be stacked against the state to get `input_and_state_concatenated`. So the i-th group of elements (there are `encoding_size` of them) in `current_input` now corresponds to the i-th batch element.

I was not sure how to get the final state, of shape `[encoding_size * batch_size, state_size]`, down to `[batch_size, num_classes]`, so I kind of made something up; see the line starting with `logits = ...`.
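To spell out why that made-up line at least produces an output of the right shape (this is only a dimension check, not a claim that the construction is sensible):

```python
# Dimension check for the logits line, using the shapes declared above:
#   W2                          [batch_size, encoding_size * batch_size]
#   final_state                 [encoding_size * batch_size, state_size]
#   tf.matmul(W2, final_state)  -> [batch_size, state_size]
#   + b2 ([1, state_size])      -> [batch_size, state_size]   (broadcast over rows)
#   tf.matmul(..., W3)          -> [batch_size, num_classes]  (W3 is [state_size, num_classes])
#   + b3 ([1, num_classes])     -> [batch_size, num_classes]
print(logits.get_shape())  # expected: (batch_size, num_classes)
```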
Now that I think about it, these two pieces are inverses of one another: the reshape on the way in seemed weird, and it forced an equally weird step to undo it on the way out.
Where does my approach to passing a sequence of one-hot encoded vectors diverge from what is typically done?