Lasso

Main references:

Statistical learning with sparsity - by Hastie, Tibshirani and Wainwright

Problem statement

The objective of Lasso is:

\[\begin{align} \min_{\beta \in \mathbb{R}^p} \|y - X\beta\|_2^2 \\ \text{subject to} \quad \|\beta\|_1 \leq t \nonumber \end{align}\]

The objective function $||y - X\beta||_2^2$ is convex, as the Hessian is $2X^TX$, which is positive semidefinite. Alternatively, it can be seen that $||y - X\beta||_2^2$ is convex, by using the fact that the composition of convex functions is convex. And that Norms are convex functions, $f(x) = x^2$ is convex, and linear functions are convex. The same argument can be used to show that the constraint is convex, and hence the Lasso problem is a convex optimisation problem.

Lagrangian form

As described in the duality post, we can reformulate a constrained optimisation problem into an unconstrained Lagrange form. And for a convex problem, we have sufficient and necessary KKT conditions for optimality.

The “Lagrangian form” for the Lasso problem is:

\[\begin{align} \min_{\beta \in \mathbb{R}^p} \|y - X\beta\|_2^2 + \lambda \|\beta\|_1 \\ \end{align}\]

There is a one-to-one correspondence between this formulation with the penalty and the constrained formulation.

The problem in this form can be motivated as a convex relaxation of the $\ell_0$-penalised problem. Which is an NP-hard problem.

Proof of equivalence

The statement we want to prove is that for either of the formulations $(1)$ or $(2)$, $\exists \lambda \geq 0$ and $t \geq 0$ respectively, such that the optimal solutions are the same.

Direction $(2) \implies (1)$:

Let $\beta^*$ be the optimal solution to $(2)$.

\[\begin{align*} &\|y - X\beta^*\|_2^2 + \lambda \|\beta^*\|_1 \leq \|y - X\beta\|_2^2 + \lambda \|\beta\|_1 \quad \forall \beta \in \mathbb{R}^p \\ &\text{If } \|\beta^*\|_1 \geq \| \beta \|_1 \\ &\implies \|y - X\beta^*\|_2^2 \leq \|y - X\beta\|_2^2 \\ \end{align*}\]

Therefore if $t = ||\beta^* ||_1$, then $\beta^* $ is also the optimal solution to $(1)$.

Direction $(1) \implies (2)$:

Let $\beta^* $ be the optimal solution to $(1)$.

Lasso

Lasso

Problem statement

Lagrangian form

Further Reading

Useful custom R functions

Linux Binaries

Matrix Norms