Data Science Asked by Sociopath on March 19, 2021
I am using Keras (TensorFlow) to convert text into encodings with tensorflow.keras.preprocessing.text.one_hot.
I used it on the training dataset as below:
from tensorflow.keras.preprocessing.text import one_hot
corpus = ['nice app']
onehot_repr = [one_hot(words, 10000) for words in corpus]
print(onehot_repr)
# [[5779, 2969]]
It works fine up to this point.
But when I use one_hot for my testing set, it generates a different encoding.
I have created a Flask API to test it. How can I use the same encoding for both the train and test sets?
The result from the API is:
[[5129, 4965]]
for the same text ['nice app'].
Keras' one_hot function has many limitations. The biggest issue is that the function does not actually do one-hot encoding; it applies the hashing trick.
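A likely reason for the mismatch between the training script and the API: by default, one_hot hashes each word with Python's built-in hash, and Python salts string hashes with a random value per interpreter process unless PYTHONHASHSEED is set. The sketch below illustrates that behaviour without TensorFlow. The helper name builtin_hash_in_new_process is hypothetical, and the two seeds simply stand in for two separate runs.

```python
import os
import subprocess
import sys


def builtin_hash_in_new_process(word, seed):
    """Return hash(word) as computed by a fresh Python process.

    Hypothetical helper: `seed` is passed as PYTHONHASHSEED to mimic
    the salt that a separate run (e.g. the Flask API) would get.
    """
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", word],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


# Two different seeds stand in for the training run and the API run.
training_run = builtin_hash_in_new_process("nice", seed=1)
api_run = builtin_hash_in_new_process("nice", seed=2)
print(training_run == api_run)  # False: same word, different hash value
```

Pinning PYTHONHASHSEED to the same value in both environments would also make the built-in hash repeatable, but that is easy to forget when deploying, so a hash that does not depend on process state is the more robust choice.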
One possible fix is to use Keras' hashing_trick function, which allows the hashing function to be specified. If you pick a stable hashing function like md5, the values will be consistent across runs.
Here is an example:
from tensorflow.keras.preprocessing.text import hashing_trick
corpus = ['nice app']
text_hashed = [hashing_trick(text=words, n=10_000, hash_function='md5') for words in corpus]
assert text_hashed == [[9146, 6067]]
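To see why md5 gives stable values, here is a rough standard-library sketch of the same idea. It is not Keras' implementation: the names md5_word_index and encode_text are hypothetical, the tokenizer is a plain lowercase-and-split with no punctuation filtering, and the modulo mapping is an assumption about how index 0 is kept free, so the indices are not guaranteed to match the ones above.

```python
import hashlib


def md5_word_index(word, n):
    """Map a word to a stable integer in the range [1, n - 1] using md5."""
    digest = hashlib.md5(word.encode("utf-8")).hexdigest()
    # Assumed mapping: reduce the digest modulo (n - 1), then shift by 1
    # so that index 0 is never produced.
    return int(digest, 16) % (n - 1) + 1


def encode_text(text, n=10_000):
    """Lowercase the text, split on whitespace, and hash each word.

    Simplification: punctuation is not stripped.
    """
    return [md5_word_index(word, n) for word in text.lower().split()]


# The same call in the training script and in the API yields the same list,
# because md5 depends only on the bytes of the word.
train_encoding = encode_text("nice app")
api_encoding = encode_text("nice app")
print(train_encoding == api_encoding)  # True
```

Whichever function is used, importing it from one shared module in both the training code and the Flask app keeps n and the hash function identical. Note that the hashing trick can map different words to the same index, so n should be comfortably larger than the vocabulary.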
Correct answer by Brian Spiering on March 19, 2021