Data Science — asked by Qwertiops on September 29, 2020
I’m trying to use the Continuous Bag Of Words method for word embedding on a corpus of 7503 tweets.
In particular, I’m trying to use CBOW on this Kaggle competition, which involves classifying tweets depending on whether they refer to disasters.
I followed the instructions in this article, and then trained a linear classifier on the average of the word vectors in each tweet. The classifier performs very poorly, even on the training set (it classifies fewer than 60% of tweets correctly).
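For concreteness, here is a simplified sketch of the kind of pipeline I mean (not my exact code): CBOW embeddings trained with gensim's Word2Vec (sg=0), each tweet represented by the average of its word vectors, then a scikit-learn logistic regression. The column names (text, target) follow the Kaggle training file, and the hyperparameters are placeholder values rather than the ones I actually used.

```python
import numpy as np
import pandas as pd
from gensim.models import Word2Vec
from gensim.utils import simple_preprocess
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

df = pd.read_csv("train.csv")                        # Kaggle disaster-tweets training file
tokens = [simple_preprocess(t) for t in df["text"]]  # basic lowercasing / tokenisation

# sg=0 selects CBOW; vector_size, window, min_count and epochs are the
# hyperparameters I am unsure about, especially with only ~7.5k tweets.
w2v = Word2Vec(sentences=tokens, vector_size=100, window=5,
               min_count=2, sg=0, epochs=30, seed=42)

def tweet_vector(words, model, dim=100):
    """Average the embeddings of in-vocabulary words; zero vector if none."""
    vecs = [model.wv[w] for w in words if w in model.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

X = np.vstack([tweet_vector(t, w2v) for t in tokens])
y = df["target"].values

X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=42)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("train accuracy:", clf.score(X_tr, y_tr))
print("val accuracy:  ", clf.score(X_val, y_val))
```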
There are two reasons I think this might be happening:

- the corpus (7503 tweets) may simply be too small to learn useful embeddings, or
- the embedding model may not have been trained for long enough.
I can’t find any accessible literature online about best practices for either of these. In particular, I have no idea how much data is usually needed for training (since the neural network is so shallow, I suspect that the usual rules don’t apply), or how much training is necessary.
Any help would be very much appreciated!