Data Science Asked by asnart on September 3, 2021
L2 regularization shrinks the values in the parameter vector toward zero.
L1 regularization sets some coefficients in the parameter vector exactly to zero.
More generally, I've seen that regularization functions that are non-differentiable at zero tend to set coefficients in the parameter vector exactly to zero. Why is that the case?
Look at the penalty terms in linear Ridge and Lasso regression:
Ridge (L2):

$$\sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}\Big)^2 + \lambda\sum_{j=1}^{p}\beta_j^2$$

Lasso (L1):

$$\sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}\Big)^2 + \lambda\sum_{j=1}^{p}|\beta_j|$$
Note the absolute value (L1 norm) in the Lasso penalty compared to the squared value (L2 norm) in the Ridge penalty.
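To see why the kink at zero matters: the absolute value is not differentiable at zero, so its subgradient keeps magnitude $\lambda$ however small the coefficient gets and can pull the optimum exactly to zero, whereas the squared penalty's gradient $2\lambda\beta_j$ vanishes as $\beta_j \to 0$. This is easiest to see in the special case of an orthonormal design matrix, a standard textbook simplification (e.g. ESL, Table 3.4): if $\hat\beta_j$ is the ordinary least-squares estimate, both penalized problems have closed-form solutions:

$$\hat\beta_j^{\text{ridge}} = \frac{\hat\beta_j}{1+\lambda}, \qquad \hat\beta_j^{\text{lasso}} = \operatorname{sign}(\hat\beta_j)\,\big(|\hat\beta_j| - \lambda\big)_+$$

Ridge rescales every coefficient by the same factor and never reaches zero, while the lasso's soft-thresholding sets any coefficient with $|\hat\beta_j| \le \lambda$ exactly to zero.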
In Introduction to Statistical Learning (Ch. 6.2.2) it reads: "As with ridge regression, the lasso shrinks the coefficient estimates towards zero. However, in the case of the lasso, the L1 penalty has the effect of forcing some of the coefficient estimates to be exactly equal to zero when the tuning parameter λ is sufficiently large. Hence, much like best subset selection, the lasso performs variable selection."
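As a quick illustration, here is a minimal sketch on synthetic data (the feature count, noise level, and alpha=0.5 are arbitrary choices for this example, not values from the question): fitting scikit-learn's Lasso and Ridge to the same data shows the lasso zeroing out uninformative coefficients while ridge only shrinks them.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.normal(size=(n, p))

# Only the first 3 of 10 features are informative.
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 1.5]
y = X @ beta_true + rng.normal(scale=0.5, size=n)

# Same data, same penalty strength; alpha plays the role of lambda above.
lasso = Lasso(alpha=0.5).fit(X, y)
ridge = Ridge(alpha=0.5).fit(X, y)

print("Lasso coefficients:", np.round(lasso.coef_, 3))
print("Ridge coefficients:", np.round(ridge.coef_, 3))
print("Exact zeros (lasso):", int(np.sum(lasso.coef_ == 0)))
print("Exact zeros (ridge):", int(np.sum(ridge.coef_ == 0)))
```

Raising alpha pushes more lasso coefficients to exactly zero (variable selection), while the ridge coefficients only get smaller in magnitude.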
Answered by Peter on September 3, 2021