Is an LSTM or a pretrained BertForMaskedLM usable for predicting the changed word in a sentence with a small dataset (2000 samples)?

Data Science: Asked by a_linguist on May 1, 2021

I have a small dataset (2000 samples) of newspaper headlines and their humorous counterparts, where only one word is changed to sound silly, for example:

Original headline: Police <officer> arrested for abuse of authority

Humorous headline: Police <dog> arrested for abuse of authority

I want to train a model to predict the changed sentence from the original. I am planning to implement two models for this task: one for binary tagging of the input sequence (whether each word in the sentence needs to be changed) and one for predicting the sentence with the changed word.

Example of Model 1 input: Police officer arrested for abuse of authority

Example of Model 1 output: <no-change> <change> <no-change> <no-change> <no-change> <no-change>

Example of Model 2 input: Police <…> arrested for abuse of authority

Example of Model 2 output: Police dog arrested for abuse of authority
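To make the data preparation concrete, here is a rough sketch (assuming whitespace tokenization and exactly one changed word per pair) of how I plan to derive the targets for both models from each original/humorous pair; I use BERT's [MASK] token as the placeholder for the masked position:

    # Sketch: derive Model 1 tags and the Model 2 masked input/target from a pair.
    # Assumes the two headlines align word by word and differ in exactly one word.
    def make_examples(original: str, humorous: str):
        orig_tokens = original.split()
        humor_tokens = humorous.split()
        assert len(orig_tokens) == len(humor_tokens)

        # Model 1 target: one tag per token of the original headline.
        tags = ["<change>" if o != h else "<no-change>"
                for o, h in zip(orig_tokens, humor_tokens)]

        # Model 2 input/target: mask the changed position, predict the new word.
        changed_idx = tags.index("<change>")
        masked = orig_tokens.copy()
        masked[changed_idx] = "[MASK]"
        return tags, " ".join(masked), humor_tokens[changed_idx]

    tags, masked_input, target = make_examples(
        "Police officer arrested for abuse of authority",
        "Police dog arrested for abuse of authority",
    )
    # tags         -> ['<no-change>', '<change>', '<no-change>', ...]
    # masked_input -> 'Police [MASK] arrested for abuse of authority'
    # target       -> 'dog'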

I am going to use an RNN/LSTM model for the sequence tagging. As for the changed-word prediction task, I am thinking of either using an LSTM (a concatenation of two parallel LSTM layers – one running forwards over the left context of the word and the other running backwards over the right context) or fine-tuning BertForMaskedLM from huggingface/transformers.
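For the BERT option, roughly what I have in mind is feeding the masked headline to a pretrained BertForMaskedLM and reading off the highest-scoring words for the masked position (a minimal sketch before any fine-tuning; exact API details may vary across transformers versions):

    import torch
    from transformers import BertTokenizer, BertForMaskedLM

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForMaskedLM.from_pretrained("bert-base-uncased")
    model.eval()

    text = "Police [MASK] arrested for abuse of authority"
    inputs = tokenizer(text, return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits

    # Find the masked position and take the top-5 vocabulary predictions for it.
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
    top_ids = logits[0, mask_pos].topk(5).indices[0]
    print(tokenizer.convert_ids_to_tokens(top_ids.tolist()))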

The question is whether this would be appropriate given the small amount of data, or should I switch to some other model?

One Answer

Both bidirectional LSTMs (as in ELMo) and BERT seem appropriate for this kind of task. Whether one or the other performs better can only be known by testing.

If you use BERT, be sure to apply the typical measures to avoid overfitting. If you use LSTMs, you will probably need regularization measures as well.
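For instance, with the huggingface Trainer, the usual anti-overfitting measures on a dataset this small (few epochs, a small learning rate, weight decay, keeping the best checkpoint on a validation split) could look roughly like the sketch below; the dataset objects are placeholders and exact argument names vary across transformers versions:

    from transformers import TrainingArguments, Trainer

    # train_ds / val_ds are assumed to be already-tokenized datasets with the
    # masked headline as input and the replacement word as the label.
    training_args = TrainingArguments(
        output_dir="bert-headline-swap",
        num_train_epochs=3,                # few epochs: 2000 samples overfit quickly
        per_device_train_batch_size=16,
        learning_rate=2e-5,                # small learning rate typical for fine-tuning
        weight_decay=0.01,                 # L2-style regularization
        evaluation_strategy="steps",
        eval_steps=50,
        save_strategy="steps",
        save_steps=50,
        load_best_model_at_end=True,       # keep the best validation checkpoint
    )

    trainer = Trainer(
        model=model,                       # the BertForMaskedLM instance
        args=training_args,
        train_dataset=train_ds,
        eval_dataset=val_ds,
    )
    trainer.train()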

Answered by noe on May 1, 2021
