
Why not use linear regression to fine-tune the last layer of a neural network?

Data Science Asked on May 2, 2021

In transfer learning, often only the last layer of the network is retrained using gradient descent.
However, the last layer of a common neural network performs only a linear transformation, so why do we use gradient descent rather than linear (or logistic) regression to fine-tune the last layer?

One Answer

The common approach to fine-tuning an existing pre-trained neural network is the following:

  1. Given an existing pre-trained neural network model (e.g. one pre-trained on ImageNet), remove the last layer (which performs classification for the pre-training task) and freeze all weights in the remaining layers of the model (usually by setting the trainable parameter to false).
  2. Add a new final dense layer that is to be trained on the new task.
  3. Train the model on the new task's dataset until convergence.
  4. [optional] After fine-tuning has converged, unfreeze all the layers and train further with a lower learning rate until convergence (see the sketch after this list).

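A minimal sketch of these steps in Keras, assuming an ImageNet-pretrained ResNet50 as the base model and a 10-class target task; the dataset objects (train_ds, val_ds) are placeholders:

    import tensorflow as tf
    from tensorflow import keras
    from tensorflow.keras import layers

    # 1. Load a pre-trained model without its classification head and freeze it.
    base = keras.applications.ResNet50(weights="imagenet",
                                       include_top=False, pooling="avg")
    base.trainable = False

    # 2. Add a new dense layer for the new task (assumed here: 10 classes).
    inputs = keras.Input(shape=(224, 224, 3))
    x = base(inputs, training=False)  # keep batch-norm layers in inference mode
    outputs = layers.Dense(10, activation="softmax")(x)
    model = keras.Model(inputs, outputs)

    # 3. Train only the new head until convergence.
    model.compile(optimizer=keras.optimizers.Adam(1e-3),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(train_ds, validation_data=val_ds, epochs=10)

    # 4. (Optional) Unfreeze the base and continue with a much lower learning rate.
    base.trainable = True
    model.compile(optimizer=keras.optimizers.Adam(1e-5),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(train_ds, validation_data=val_ds, epochs=5)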
A reason to use gradient descent rather than a different ML algorithm, as you suggest, is that it enables further training after the initial fine-tuning (step #4 above). However, this is not required. The approach you suggest (using the output of the pre-trained model as input to another ML model) may provide satisfactory performance and be more computationally efficient.
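A sketch of that alternative, assuming the same ResNet50 feature extractor and scikit-learn's LogisticRegression; the arrays (x_train, y_train, x_test, y_test) are placeholders:

    from tensorflow import keras
    from sklearn.linear_model import LogisticRegression

    # Use the frozen pre-trained network purely as a feature extractor.
    base = keras.applications.ResNet50(weights="imagenet",
                                       include_top=False, pooling="avg")
    train_features = base.predict(x_train)
    test_features = base.predict(x_test)

    # Fit an ordinary (convex) logistic regression on the extracted features,
    # with no gradient descent through the base network.
    clf = LogisticRegression(max_iter=1000)
    clf.fit(train_features, y_train)
    print(clf.score(test_features, y_test))

Because the features are computed once and the classifier is small, this can be much cheaper than backpropagating through the whole network, at the cost of giving up the option to later fine-tune the base model end to end.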

Tradeoffs between these approaches are also discussed in the Keras Transfer Learning guide, in the section on the "Typical Transfer Learning Workflow".

Answered by grov on May 2, 2021
