Data Science Asked by Leevo on January 23, 2021
I am working on a character-based language generator, loosely based on this tutorial on the TensorFlow 2.0 website. Following the example, I am using an Embedding()
layer to generate character embeddings, since I want my model to generate text character by character.
My vocabulary contains 86 unique characters. What embedding size should I choose?
Should I always choose an embedding size that is smaller than the vocabulary size? In the tutorial above, the embedding size is much larger than the vocabulary size, and I can’t understand how this can produce an effective model (apparently it does, since it’s an official tutorial; if anyone can explain why, it would be much appreciated).
EDIT:
Another thing I find puzzling: we generate word embeddings because we want a dense representation of a word's meaning. Does it make sense to make the embedding larger than the one-hot encoded vectors we started with?
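For concreteness, here is a minimal sketch of my setup (assuming TensorFlow 2.x / Keras; the 256-dimensional embedding and GRU size follow the tutorial's values, and the embedding size is exactly the choice I am asking about):

```python
import tensorflow as tf

vocab_size = 86      # unique characters in my corpus
embedding_dim = 256  # tutorial-style value; is this a sensible size?
rnn_units = 1024

model = tf.keras.Sequential([
    # Maps each character id (0..vocab_size-1) to a dense vector of length embedding_dim
    tf.keras.layers.Embedding(vocab_size, embedding_dim),
    tf.keras.layers.GRU(rnn_units, return_sequences=True),
    # One logit per character in the vocabulary, at every timestep
    tf.keras.layers.Dense(vocab_size),
])
model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
```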
There is a theoretical lower bound for the embedding dimension.
I would urge you to read this paper, but the gist of it is that the dimension can be chosen based on corpus statistics.
The GloVe paper also discusses embedding dimension; check page 7 for the graphs. What I want to say with this reference is that you can treat the embedding size as a hyperparameter and search for your optimal value, as sketched below.
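For example, a simple grid search over candidate embedding sizes could look like this (my own sketch; build_model() and the training/validation arrays are placeholders for your own model-building code and data):

```python
# Hypothetical sketch: treat embedding_dim as a hyperparameter and keep the value
# with the lowest validation loss. build_model(), x_train, y_train, x_val, y_val
# are placeholders for your own pipeline.
best_dim, best_val_loss = None, float("inf")
for dim in [16, 32, 64, 128, 256]:
    model = build_model(vocab_size=86, embedding_dim=dim)
    history = model.fit(x_train, y_train,
                        validation_data=(x_val, y_val),
                        epochs=5, verbose=0)
    val_loss = min(history.history["val_loss"])
    if val_loss < best_val_loss:
        best_dim, best_val_loss = dim, val_loss

print(f"Best embedding_dim: {best_dim} (val_loss: {best_val_loss:.4f})")
```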
EDIT: Here is my personal rule of thumb, borrowed from Google: the embedding vector dimension should be roughly the 4th root of the number of categories. I start with that and then tune around it (see the quick sketch below). Read this toward the end, where they explain their embedding choices. Why it COULD make sense (it does not have to) to go larger: what is bag-of-words other than a one-hot encoding of your n-grams?
Does it make sense to make it larger? It depends. On one hand, you are right: if we make it too big, we lose the distributed-representation property of the word embedding matrix. On the other hand, it works in practice.
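As a quick illustration of the 4th-root rule of thumb applied to the asker's 86 characters (my own arithmetic, just a starting point to tune from):

```python
# Rule of thumb: embedding_dim ~ number_of_categories ** 0.25
num_categories = 86
embedding_dim = round(num_categories ** 0.25)
print(embedding_dim)  # 3 -- a small starting point; tune upward from here
```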
Answered by Noah Weber on January 23, 2021