
How does Google's Universal Sentence Encoder deal with out-of-vocabulary terms?

Data Science · Asked by Tirtha on June 30, 2021

It seems to output embeddings even for random gibberish, and the similarity is surprisingly high for this particular pair of gibberish strings:

import numpy as np
import tensorflow_hub as hub
embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")  # module version assumed
np.inner(embed(['sdasdasSda']), embed(['sadasvdsaf']))
# array([[0.70911765]], dtype=float32)
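
A quick check on that score: an inner product equals cosine similarity only for unit-length vectors, and the TF Hub module's outputs are, per its documentation, only approximately normalized. The small helper below (my own illustration, not part of the USE API) computes the exact cosine similarity:

def cosine_sim(a, b):
    # Flatten the (1, 512) embedding arrays and normalize before the
    # dot product, so the score is a true cosine in [-1, 1].
    a, b = np.asarray(a).ravel(), np.asarray(b).ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

cosine_sim(embed(['sdasdasSda']), embed(['sadasvdsaf']))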

I'm wondering how sentences are tokenized and what preprocessing is done internally. Also, how is the embedding model trained? As I understand it, they use a Deep Averaging Network (DAN), i.e., a feed-forward network applied to the average of the individual word embeddings. Is that correct?
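
For intuition, a Deep Averaging Network (Iyyer et al., 2015) really is that simple: embed tokens, average the embeddings, and pass the average through feed-forward layers. The sketch below is a hypothetical illustration, not USE's actual implementation; in particular, the hash-bucket handling of unseen tokens is an assumption, but it would explain why gibberish still gets an embedding (every token, seen or not, maps to some row of the embedding table):

import zlib
import numpy as np

rng = np.random.default_rng(0)
VOCAB = {"the": 0, "cat": 1, "sat": 2}     # toy vocabulary
NUM_OOV_BUCKETS = 100                      # unseen tokens hash into these
EMB_DIM, OUT_DIM = 8, 4                    # toy sizes, not USE's 512

# One table covers in-vocabulary words plus the OOV hash buckets.
emb_table = rng.normal(size=(len(VOCAB) + NUM_OOV_BUCKETS, EMB_DIM))
W = rng.normal(size=(EMB_DIM, OUT_DIM))    # one feed-forward layer

def token_id(tok):
    # OOV tokens get a stable hash-bucket id instead of an error,
    # so even 'sdasdasSda' maps to a real embedding row.
    if tok in VOCAB:
        return VOCAB[tok]
    return len(VOCAB) + zlib.crc32(tok.encode()) % NUM_OOV_BUCKETS

def dan_embed(sentence):
    ids = [token_id(t) for t in sentence.lower().split()]
    avg = emb_table[ids].mean(axis=0)      # average the token embeddings
    return np.tanh(avg @ W)                # feed-forward layer(s) on top

dan_embed("the cat sat")                   # in-vocabulary sentence
dan_embed("sdasdasSda sadasvdsaf")         # gibberish still embeds

Under this kind of scheme, two gibberish strings still land somewhere in the embedding space rather than raising an error, which would be consistent with the behavior in the question.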
