Is an LSTM or a pretrained BertForMaskedLM usable for predicting the changed word in a sentence with a small dataset (2000 samples)?

Data Science: Asked by a_linguist on May 1, 2021

I have a small dataset (2000 samples) of newspaper headlines and their humorous counterparts, where only one word is changed to sound silly, for example:

Original headline: Police <officer> arrested for abuse of authority

Humorous headline: Police <dog> arrested for abuse of authority

I want to train a model to predict the changed sentence from the original. I am planning to implement two models for this task: one for binary tagging of the input sequence (whether each word in the sentence needs to be changed) and one for predicting the sentence with the changed word.

Example of Model 1 input: Police officer arrested for abuse of authority

Example of Model 1 output: <no-change> <change> <no-change> <no-change> <no-change> <no-change>

Example of Model 2 input: Police <…> arrested for abuse of authority

Example of Model 2 output: Police dog arrested for abuse of authority
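To make the data preparation concrete, here is a rough sketch (assuming whitespace tokenization and exactly one changed word per pair) of how I plan to derive the targets for both models from each original/humorous pair; I use BERT's [MASK] token as the placeholder for the masked position:

    # Sketch: derive Model 1 tags and the Model 2 masked input/target from a pair.
    # Assumes the two headlines align word by word and differ in exactly one word.
    def make_examples(original: str, humorous: str):
        orig_tokens = original.split()
        humor_tokens = humorous.split()
        assert len(orig_tokens) == len(humor_tokens)

        # Model 1 target: one tag per token of the original headline.
        tags = ["<change>" if o != h else "<no-change>"
                for o, h in zip(orig_tokens, humor_tokens)]

        # Model 2 input/target: mask the changed position, predict the new word.
        changed_idx = tags.index("<change>")
        masked = orig_tokens.copy()
        masked[changed_idx] = "[MASK]"
        return tags, " ".join(masked), humor_tokens[changed_idx]

    tags, masked_input, target = make_examples(
        "Police officer arrested for abuse of authority",
        "Police dog arrested for abuse of authority",
    )
    # tags         -> ['<no-change>', '<change>', '<no-change>', ...]
    # masked_input -> 'Police [MASK] arrested for abuse of authority'
    # target       -> 'dog'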

I am going to use an RNN/LSTM model for the sequence tagging. As for the changed-word prediction task, I am thinking of either using an LSTM (a concatenation of two parallel LSTM layers – one running forwards over the left context of the word and the other running backwards over the right context) or fine-tuning BertForMaskedLM from huggingface/transformers.
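For the BERT option, roughly what I have in mind is feeding the masked headline to a pretrained BertForMaskedLM and reading off the highest-scoring words for the masked position (a minimal sketch before any fine-tuning; exact API details may vary across transformers versions):

    import torch
    from transformers import BertTokenizer, BertForMaskedLM

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForMaskedLM.from_pretrained("bert-base-uncased")
    model.eval()

    text = "Police [MASK] arrested for abuse of authority"
    inputs = tokenizer(text, return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits

    # Find the masked position and take the top-5 vocabulary predictions for it.
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
    top_ids = logits[0, mask_pos].topk(5).indices[0]
    print(tokenizer.convert_ids_to_tokens(top_ids.tolist()))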

The question is whether this would be appropriate given the small amount of data, or should I switch to some other model?

One Answer

Both bidirectional LSTMs (as in ELMo) and BERT seem appropriate for this kind of task. Whether one or the other performs better can only be known by testing.

If you use BERT, be sure to apply the typical measures to avoid overfitting. If you use LSTMs, you will probably need regularization measures as well.
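For instance, with the huggingface Trainer, the usual anti-overfitting measures on a dataset this small (few epochs, a small learning rate, weight decay, keeping the best checkpoint on a validation split) could look roughly like the sketch below; the dataset objects are placeholders and exact argument names vary across transformers versions:

    from transformers import TrainingArguments, Trainer

    # train_ds / val_ds are assumed to be already-tokenized datasets with the
    # masked headline as input and the replacement word as the label.
    training_args = TrainingArguments(
        output_dir="bert-headline-swap",
        num_train_epochs=3,                # few epochs: 2000 samples overfit quickly
        per_device_train_batch_size=16,
        learning_rate=2e-5,                # small learning rate typical for fine-tuning
        weight_decay=0.01,                 # L2-style regularization
        evaluation_strategy="steps",
        eval_steps=50,
        save_strategy="steps",
        save_steps=50,
        load_best_model_at_end=True,       # keep the best validation checkpoint
    )

    trainer = Trainer(
        model=model,                       # the BertForMaskedLM instance
        args=training_args,
        train_dataset=train_ds,
        eval_dataset=val_ds,
    )
    trainer.train()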

Answered by noe on May 1, 2021
