Data Science Asked by Viktor1903 on December 4, 2020
I am trying to build a word-embedding Keras model: it takes a text that has been converted to its corresponding input ids and masks (the same inputs an ALBERT model expects) and returns a 768-dimensional vector as output. My plan is to use an ALBERT layer followed by an LSTM layer and a dense layer to produce that vector. The target variables are 768-dimensional vectors. I want to use something like a cosine proximity loss so the model learns its weights. My general architecture is below. However, after training the model for some epochs, I get almost the same output vector for every test input. Is there something I need to change to make the model work?
```python
from tensorflow.keras.layers import Input, LSTM, Dense, Flatten, RepeatVector
from tensorflow.keras.models import Model
# AlbertLayer is my custom Keras layer wrapping ALBERT.

max_seq_length = 400
in_id = Input(shape=(max_seq_length,), name="input_ids")
in_mask = Input(shape=(max_seq_length,), name="input_masks")
in_segment = Input(shape=(max_seq_length,), name="segment_ids")
albert_inputs = [in_id, in_mask, in_segment]

albert_output = AlbertLayer(n_fine_tune_layers=3, pooling="first")(albert_inputs)
x = RepeatVector(1)(albert_output)           # (batch, 1, 768)
x = LSTM(units=512, return_sequences=False,
         recurrent_dropout=0.3, dropout=0.3)(x)
x = Flatten()(x)                             # LSTM output is already 2-D here
embedding_output = Dense(768)(x)

model = Model(inputs=albert_inputs, outputs=embedding_output)
# newer tf.keras versions name this loss 'cosine_similarity'
model.compile(loss='cosine_proximity', optimizer='adam')
```
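For reference, the old Keras `cosine_proximity` loss is just the negative cosine similarity between the prediction and the target. A minimal NumPy sketch of what it computes (the function name here mirrors the Keras alias; it is not the Keras implementation itself):

```python
import numpy as np

def cosine_proximity(y_true, y_pred):
    # Negative cosine similarity: -1 when the vectors point the same way,
    # 0 when they are orthogonal. Both inputs are L2-normalized first,
    # so only the direction of the 768-d vectors matters, not their scale.
    y_true = y_true / np.linalg.norm(y_true, axis=-1, keepdims=True)
    y_pred = y_pred / np.linalg.norm(y_pred, axis=-1, keepdims=True)
    return -np.sum(y_true * y_pred, axis=-1)

a = np.array([[1.0, 0.0]])
print(cosine_proximity(a, a))        # [-1.]
print(cosine_proximity(a, a * 3.0))  # [-1.]  (scale-invariant)
```

Because the loss ignores vector magnitude, the `Dense(768)` output is effectively compared to the targets by direction only.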
For the target variables, I have a corresponding vector for each training instance. A single input may have multiple target vectors; in that case I create a separate training instance for each input–target pair.
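For concreteness, here is a toy sketch of that duplication (hypothetical NumPy arrays, not the real data): one input paired with two targets becomes two rows with an identical input.

```python
import numpy as np

# Hypothetical example: one input's features paired with two different
# target vectors, expanded into two separate (input, target) instances.
input_vec = np.arange(4.0)                     # stand-in for one input
targets = np.stack([np.ones(4), np.zeros(4)])  # two targets for that input

X = np.repeat(input_vec[None, :], len(targets), axis=0)
y = targets
print(X.shape, y.shape)  # (2, 4) (2, 4) -- same input repeated per target
```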