
How does Google's Universal Sentence Encoder deal with out-of-vocabulary terms?

Data Science · Asked by Tirtha on June 30, 2021

It seems to output embeddings even for random gibberish, and the similarity is surprisingly high for this particular pair of gibberish strings:

import numpy as np
import tensorflow_hub as hub
embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")  # module version assumed
np.inner(embed(['sdasdasSda']), embed(['sadasvdsaf']))
# array([[0.70911765]], dtype=float32)
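
A quick check on that score: an inner product equals cosine similarity only for unit-length vectors, and the TF Hub module's outputs are, per its documentation, only approximately normalized. The small helper below (my own illustration, not part of the USE API) computes the exact cosine similarity:

def cosine_sim(a, b):
    # Flatten the (1, 512) embedding arrays and normalize before the
    # dot product, so the score is a true cosine in [-1, 1].
    a, b = np.asarray(a).ravel(), np.asarray(b).ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

cosine_sim(embed(['sdasdasSda']), embed(['sadasvdsaf']))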

I'm wondering how sentences are tokenized and what preprocessing is done internally. Also, how is the embedding model trained? As I understand it, they use a Deep Averaging Network (DAN), i.e., a feed-forward network applied to the average of the individual word embeddings. Is that correct?
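
For intuition, a Deep Averaging Network (Iyyer et al., 2015) really is that simple: embed tokens, average the embeddings, and pass the average through feed-forward layers. The sketch below is a hypothetical illustration, not USE's actual implementation; in particular, the hash-bucket handling of unseen tokens is an assumption, but it would explain why gibberish still gets an embedding (every token, seen or not, maps to some row of the embedding table):

import zlib
import numpy as np

rng = np.random.default_rng(0)
VOCAB = {"the": 0, "cat": 1, "sat": 2}     # toy vocabulary
NUM_OOV_BUCKETS = 100                      # unseen tokens hash into these
EMB_DIM, OUT_DIM = 8, 4                    # toy sizes, not USE's 512

# One table covers in-vocabulary words plus the OOV hash buckets.
emb_table = rng.normal(size=(len(VOCAB) + NUM_OOV_BUCKETS, EMB_DIM))
W = rng.normal(size=(EMB_DIM, OUT_DIM))    # one feed-forward layer

def token_id(tok):
    # OOV tokens get a stable hash-bucket id instead of an error,
    # so even 'sdasdasSda' maps to a real embedding row.
    if tok in VOCAB:
        return VOCAB[tok]
    return len(VOCAB) + zlib.crc32(tok.encode()) % NUM_OOV_BUCKETS

def dan_embed(sentence):
    ids = [token_id(t) for t in sentence.lower().split()]
    avg = emb_table[ids].mean(axis=0)      # average the token embeddings
    return np.tanh(avg @ W)                # feed-forward layer(s) on top

dan_embed("the cat sat")                   # in-vocabulary sentence
dan_embed("sdasdasSda sadasvdsaf")         # gibberish still embeds

Under this kind of scheme, two gibberish strings still land somewhere in the embedding space rather than raising an error, which would be consistent with the behavior in the question.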
