Lasso
Main references:
- Statistical learning with sparsity - by Hastie, Tibshirani and Wainwright
Problem statement
The objective of Lasso is:
\[\begin{align} \min_{\beta \in \mathbb{R}^p} \|y - X\beta\|_2^2 \\ \text{subject to} \quad \|\beta\|_1 \leq t \nonumber \end{align}\]The objective function $||y - X\beta||_2^2$ is convex, as the Hessian is $2X^TX$, which is positive semidefinite. Alternatively, it can be seen that $||y - X\beta||_2^2$ is convex, by using the fact that the composition of convex functions is convex. And that Norms are convex functions, $f(x) = x^2$ is convex, and linear functions are convex. The same argument can be used to show that the constraint is convex, and hence the Lasso problem is a convex optimisation problem.
Lagrangian form
As described in the duality post, we can reformulate a constrained optimisation problem into an unconstrained Lagrange form. And for a convex problem, we have sufficient and necessary KKT conditions for optimality.
The “Lagrangian form” for the Lasso problem is:
\[\begin{align} \min_{\beta \in \mathbb{R}^p} \|y - X\beta\|_2^2 + \lambda \|\beta\|_1 \\ \end{align}\]There is a one-to-one correspondence between this formulation with the penalty and the constrained formulation.
The problem in this form can be motivated as a convex relaxation of the $\ell_0$-penalised problem. Which is an NP-hard problem.
Proof of equivalence
The statement we want to prove is that for either of the formulations $(1)$ or $(2)$, $\exists \lambda \geq 0$ and $t \geq 0$ respectively, such that the optimal solutions are the same.
Direction $(2) \implies (1)$:
Let $\beta^*$ be the optimal solution to $(2)$.
\[\begin{align*} &\|y - X\beta^*\|_2^2 + \lambda \|\beta^*\|_1 \leq \|y - X\beta\|_2^2 + \lambda \|\beta\|_1 \quad \forall \beta \in \mathbb{R}^p \\ &\text{If } \|\beta^*\|_1 \geq \| \beta \|_1 \\ &\implies \|y - X\beta^*\|_2^2 \leq \|y - X\beta\|_2^2 \\ \end{align*}\]Therefore if $t = ||\beta^* ||_1$, then $\beta^* $ is also the optimal solution to $(1)$.
Direction $(1) \implies (2)$:
Let $\beta^* $ be the optimal solution to $(1)$.