
What techniques are there to train custom sentence classification models with reasonable memory footprint?

Data Science: asked by Namey on May 18, 2021

We are currently working on tasks that involve user-submitted data (e.g., question answering, short-answer grading), within a framework that will allow the models to be improved through active learning. However, we are running into an issue determining best practices for training these custom models.

In most frameworks, the common suggestion is:

  1. Download an enormous pretrained model (BERT, etc.).
  2. Fine-tune the model on your task data for a few epochs (see the sketch after this list).
  3. Use your new, specialized, and enormous model.
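
For concreteness, here is a minimal sketch of that standard recipe, assuming the Hugging Face transformers and datasets libraries; the CSV path, the "text"/"label" column names, and the two-label setup are placeholders for your own task data:

    # Sketch of the standard full fine-tuning recipe (placeholder data paths).
    from datasets import load_dataset
    from transformers import (AutoModelForSequenceClassification,
                              AutoTokenizer, Trainer, TrainingArguments)

    model_name = "bert-base-uncased"
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    # Step 1: download the enormous pretrained model.
    model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

    # Step 2: fine-tune every weight on your task data for a few epochs.
    dataset = load_dataset("csv", data_files={"train": "train.csv"})  # placeholder path

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True, padding="max_length")

    dataset = dataset.map(tokenize, batched=True)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="out", num_train_epochs=3),
        train_dataset=dataset["train"],
    )
    trainer.train()

    # Step 3: store your new, specialized, and enormous model;
    # the entire set of weights is saved per task.
    model.save_pretrained("my_task_model")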

However, this quickly breaks down when you have many such models: a fine-tuned bert-base checkpoint is roughly 400 MB, so you can't let every single user have their own copy of BERT.

As an alternative, you can train additional layers on top of the pretrained model. So the ideal process is instead:

  1. Download your enormous pretrained model.
  2. Train a task-specific layer (or layers) on top of the frozen pretrained model.
  3. Store only that small top-layer model for use on that task (see the sketch after this list).
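
Here is a minimal sketch of that ideal process, assuming PyTorch and transformers; the toy training pairs, the [CLS] pooling, and the two-label linear head are illustrative assumptions rather than a prescribed design:

    # Sketch: frozen shared backbone, small per-task head (toy data).
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

    # Step 1: download the enormous pretrained model once; it stays
    # frozen and is shared across every per-user task.
    backbone = AutoModel.from_pretrained("bert-base-uncased")
    backbone.eval()
    for p in backbone.parameters():
        p.requires_grad = False

    def embed(texts):
        # [CLS] embeddings from the frozen backbone; no gradients needed.
        enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
        with torch.no_grad():
            return backbone(**enc).last_hidden_state[:, 0]

    # Step 2: train only a small task-specific head on those embeddings.
    head = torch.nn.Linear(backbone.config.hidden_size, 2)
    optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
    loss_fn = torch.nn.CrossEntropyLoss()

    train_data = [("fully correct answer", 1), ("completely off topic", 0)]  # toy
    texts = [t for t, _ in train_data]
    labels = torch.tensor([y for _, y in train_data])
    for epoch in range(3):
        loss = loss_fn(head(embed(texts)), labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Step 3: store only the tiny head (kilobytes, not hundreds of MB).
    torch.save(head.state_dict(), "task_head.pt")

At inference time, the shared backbone is loaded once and each per-task head is restored from its small state-dict file.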

Are there any recipes or actively maintained projects that represent a good approach to this? We have done some basic ad-hoc layer approaches to training classifiers for this purpose (e.g., putting a logistic model or another neural-network layer on top of embeddings, as sketched below), but my hope is that there are more systematic approaches for training NLP classifiers that add lightweight layers to large-corpus pretrained models.
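
To make that concrete, here is a sketch of such an ad-hoc approach, assuming scikit-learn and mean-pooled BERT embeddings; the texts and labels are toy placeholders:

    # Sketch: scikit-learn logistic model over frozen BERT embeddings.
    import joblib
    import numpy as np
    import torch
    from sklearn.linear_model import LogisticRegression
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    backbone = AutoModel.from_pretrained("bert-base-uncased")
    backbone.eval()

    def embed(texts):
        # Mean-pooled token embeddings from the frozen backbone.
        enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
        with torch.no_grad():
            hidden = backbone(**enc).last_hidden_state
        mask = enc["attention_mask"].unsqueeze(-1)
        return ((hidden * mask).sum(1) / mask.sum(1)).numpy()

    texts = ["correct and complete", "misses the key point"]  # toy placeholders
    labels = np.array([1, 0])

    clf = LogisticRegression().fit(embed(texts), labels)
    joblib.dump(clf, "task_clf.joblib")  # kilobytes per task, one shared BERT

One convenient property of this setup is that the embeddings can be precomputed once and reused while experimenting with many different heads.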
