Asked by wheresmycookie on December 18, 2020
I am trying to build a plain RNN using TensorFlow for letter-level language detection. I realize that my approach isn't the ideal one, but I'm trying to learn by doing the more basic version first.
I am training the RNN on a sequence of letters, and for now I am using one-hot encodings, with each vector corresponding to a letter.
Here are the various placeholders and variables defined in the graph. I'm putting all of them here for reference (note that `truncated_backprop_length` is just the length of the letter sequence that I train with):
```python
batchX_placeholder = tf.placeholder(tf.float32,
                                    [batch_size, truncated_backprop_length, encoding_size])
batchY_placeholder = tf.placeholder(tf.int32,
                                    [batch_size, num_classes])

init_state = tf.placeholder(tf.float32, [batch_size * encoding_size, state_size])

W = tf.Variable(np.random.rand(state_size + 1, state_size), dtype=tf.float32)
b = tf.Variable(np.zeros((1, state_size)), dtype=tf.float32)

W2 = tf.Variable(np.random.rand(batch_size, encoding_size * batch_size), dtype=tf.float32)
b2 = tf.Variable(np.zeros((1, state_size)), dtype=tf.float32)

W3 = tf.Variable(np.random.rand(state_size, num_classes), dtype=tf.float32)
b3 = tf.Variable(np.zeros((1, num_classes)), dtype=tf.float32)
```
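To make the input encoding concrete, here is a rough sketch of how a one-hot batch for `batchX_placeholder` can be built; the sizes, alphabet size, and helper function below are illustrative stand-ins, not my actual values:

```python
import numpy as np

# Illustrative sizes only -- stand-ins, not my real hyperparameters.
batch_size = 4
truncated_backprop_length = 10
encoding_size = 27  # e.g. 26 letters plus one padding/unknown symbol

def one_hot_batch(letter_indices):
    """letter_indices: int array of shape [batch_size, truncated_backprop_length]."""
    batch = np.zeros((batch_size, truncated_backprop_length, encoding_size),
                     dtype=np.float32)
    for i in range(batch_size):
        for t in range(truncated_backprop_length):
            batch[i, t, letter_indices[i, t]] = 1.0  # a single 1 per letter position
    return batch

# Example: random letter indices with the same shape the placeholder expects.
example_indices = np.random.randint(0, encoding_size,
                                    size=(batch_size, truncated_backprop_length))
example_x = one_hot_batch(example_indices)  # shape (4, 10, 27)
```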
I unstack the training samples in `batchX_placeholder` so I can iterate through them. The loop that iterates through each training example of shape `[batch_size, encoding_size]` looks like this:
```python
for current_input in inputs_series:
    current_input = tf.reshape(current_input, [batch_size * encoding_size, 1])
    input_and_state_concatenated = tf.concat([current_input, current_state], 1)
    next_state = tf.tanh(tf.matmul(input_and_state_concatenated, W) + b)
    current_state = next_state
```
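For reference, this loop relies on `inputs_series` and an initial `current_state` being defined beforehand. Reconstructing that setup from the unstacking I described and the placeholder shapes above, it looks roughly like this (a sketch of the shapes involved, not verbatim code):

```python
import tensorflow as tf

# Reconstructed setup the loop depends on (shapes inferred from the
# placeholders defined earlier; not copied verbatim from my code).
inputs_series = tf.unstack(batchX_placeholder, axis=1)
# -> a list of truncated_backprop_length tensors, each [batch_size, encoding_size]

current_state = init_state
# -> [batch_size * encoding_size, state_size], matching the reshape inside the loop
```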
I take the final state from this loop and do the following:
```python
logits = tf.matmul(tf.matmul(W2, final_state) + b2, W3) + b3
loss = tf.nn.softmax_cross_entropy_with_logits(logits=logits,
                                               labels=batchY_placeholder)
```
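For context, `tf.nn.softmax_cross_entropy_with_logits` returns one loss value per batch element, so a training step built on it typically looks something like the sketch below; the optimizer and learning rate are illustrative assumptions, not details from my actual loop:

```python
# Typical reduction of the per-example losses to a scalar objective.
# (Illustrative only -- my actual training loop is omitted from the question.)
total_loss = tf.reduce_mean(loss)
train_step = tf.train.AdamOptimizer(learning_rate=1e-3).minimize(total_loss)
```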
I’m assuming that I’m running my training loop correctly and will leave that out of the question (for now) for simplicity. With the above setup, the loss does not converge during training.
I have reshaped `current_input` so it can be stacked against the state to get `input_and_state_concatenated`. So the i-th group of elements (there are `encoding_size` of them) in `current_input` now corresponds to the i-th batch element.

I was not sure how to get the final state, of shape `[encoding_size * batch_size, state_size]`, down to `[batch_size, num_classes]`, so I kind of made something up; see the line starting with `logits = ...`.
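To spell out why that made-up line at least produces an output of the right shape (this is only a dimension check, not a claim that the construction is sensible):

```python
# Dimension check for the logits line, using the shapes declared above:
#   W2                          [batch_size, encoding_size * batch_size]
#   final_state                 [encoding_size * batch_size, state_size]
#   tf.matmul(W2, final_state)  -> [batch_size, state_size]
#   + b2 ([1, state_size])      -> [batch_size, state_size]   (broadcast over rows)
#   tf.matmul(..., W3)          -> [batch_size, num_classes]  (W3 is [state_size, num_classes])
#   + b3 ([1, num_classes])     -> [batch_size, num_classes]
print(logits.get_shape())  # expected: (batch_size, num_classes)
```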
Now that I think about it, these two pieces are inverses of one another: the reshape on the way in seemed weird, and it forced an equally weird step to undo it on the way out.
Where does my approach to passing a sequence of one-hot encoded vectors diverge from what is typically done?