Data Science Asked by ChaoS Adm on November 11, 2020
I have a dataset with a message (string) and an associated mood. I am trying to use an ANN to predict one of the 6 moods using the encoded inputs.
This is what my X_train looks like:
array([list([1, 60, 2]),
list([1, 6278, 14, 9137, 334, 9137, 8549, 1380, 7]),
list([5, 107, 1, 2, 156]), ..., list([1, 2, 220, 41]),
list([1, 2, 79, 137, 422, 877, 5, 230, 621, 18]),
list([1, 11, 66, 1, 2, 9137, 175, 1, 6278, 5624, 1520])],
dtype=object)
Since every list has a different length, the model does not accept it as input. What can I do about it?
PS: The encoded values were generated using keras.preprocessing.text.Tokenizer()
One way is to encode each input as a fixed-size vector. For example, you can pad the sequences, feed them through an RNN such as an LSTM, and use the RNN's output as the input to the ANN.
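To illustrate how a recurrent encoder turns sequences of any length into vectors of one fixed size, here is a minimal pure-Python sketch. It is not the answerer's code: it uses a plain tanh recurrent cell standing in for an LSTM, with random untrained weights, and the names make_rnn and encode, the vocabulary size of 10000, and the layer sizes are all assumptions chosen for illustration. In Keras the same role would typically be played by an Embedding layer followed by an LSTM layer, with a Dense softmax layer on top for the 6 moods.

```python
import math
import random


def make_rnn(vocab_size, embed_dim, hidden_dim, seed=0):
    """Create random (untrained) weights for a tiny tanh RNN encoder."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "embed": mat(vocab_size, embed_dim),   # one embedding row per token id
        "w_in": mat(hidden_dim, embed_dim),    # input-to-hidden weights
        "w_rec": mat(hidden_dim, hidden_dim),  # hidden-to-hidden weights
    }


def encode(seq, params):
    """Run the RNN over one token sequence and return the final hidden state."""
    hidden_dim = len(params["w_rec"])
    h = [0.0] * hidden_dim
    for token in seq:
        x = params["embed"][token]
        h = [
            math.tanh(
                sum(w * xi for w, xi in zip(params["w_in"][j], x))
                + sum(w * hi for w, hi in zip(params["w_rec"][j], h))
            )
            for j in range(hidden_dim)
        ]
    return h


# Assumed sizes: token ids in the question go up to 9137, so 10000 is enough.
params = make_rnn(vocab_size=10000, embed_dim=8, hidden_dim=4)

messages = [[1, 60, 2], [5, 107, 1, 2, 156], [1, 2, 220, 41]]
encoded = [encode(m, params) for m in messages]

# Every message, whatever its length, is now a vector of hidden_dim floats.
print([len(v) for v in encoded])
```

Each message ends up as a vector of the same length (hidden_dim), which is what a dense classifier needs. In a real model the weights would be trained jointly with the classifier rather than left random.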
Answered by ddaedalus on November 11, 2020
I am not sure how well it will perform, but what about padding the shorter messages with a special value (say, zero) so that they are all as long as the longest message?
However, if you have enough data, some kind of embedding would definitely be better, including for the target task of predicting the mood.
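The padding idea above can be sketched in a few lines of plain Python. The helper name pad_to_longest is hypothetical, and the sample data is a subset of the X_train shown in the question. Zero is a reasonable padding value here because the Keras Tokenizer starts its word indices at 1. Keras also ships a pad_sequences utility for this job; note that it pads at the front by default, whereas this sketch pads at the end.

```python
def pad_to_longest(sequences, pad_value=0):
    """Right-pad every sequence with pad_value up to the longest length."""
    max_len = max(len(seq) for seq in sequences)
    return [list(seq) + [pad_value] * (max_len - len(seq)) for seq in sequences]


X_train = [[1, 60, 2], [5, 107, 1, 2, 156], [1, 2, 220, 41]]
X_padded = pad_to_longest(X_train)

for row in X_padded:
    print(row)
# [1, 60, 2, 0, 0]
# [5, 107, 1, 2, 156]
# [1, 2, 220, 41, 0]
```

The result is rectangular, so it can be converted to a regular 2-D array and passed to a model. If a few messages are much longer than the rest, truncating to a chosen maximum length may be preferable to padding everything to the longest one.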
Answered by Jakub on November 11, 2020