Duality

In this post I give the main results for duality in standard optimisation problems.

Standard form

\[\begin{align*} \text{minimize} \quad & f_0(x) \\ \text{subject to} \quad & f_i(x) \leq 0, \quad i = 1, 2, \ldots, m \\ & h_j(x) = 0, \quad j = 1, 2, \ldots, p \\ \end{align*}\]

An optimisation problem is convex if $f_0, f_1, \ldots, f_m$ are convex, and $h_1, h_2, \ldots, h_p$ are affine functions. I will not assume the problem is convex, unless stated otherwise.

For the rest of this post I will consider only inequality constraints, that is, $p = 0$.

Let $p^*$ denote the optimal value of the problem.

Lagrangian

The idea of the Lagrangian is to incorporate the constraints into the objective function. For a standard form optimisation problem, we have the following definitions:

Lagrangian:

\[\mathcal{L}(x, \lambda) = f_0(x) + \sum_{i=1}^m \lambda_i f_i(x)\]

Lagrangian dual function:

\[g(\lambda) = \inf_x \mathcal{L}(x, \lambda)\]

Since the dual function is the pointwise infimum of a family of affine functions of $\lambda$, it is concave, even if the original problem is not convex.
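
As a small worked example (my own, for illustration, not from the original post): minimise $x^2$ subject to $x \geq 1$, i.e. $f_0(x) = x^2$ and $f_1(x) = 1 - x \leq 0$, with optimal value $p^* = 1$ at $x = 1$. Then

\[\mathcal{L}(x, \lambda) = x^2 + \lambda (1 - x), \qquad g(\lambda) = \inf_x \left( x^2 + \lambda (1 - x) \right) = \lambda - \frac{\lambda^2}{4},\]

where the infimum is attained at $x = \lambda/2$; note that $g$ is indeed concave.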

The Lagrangian dual function provides a lower bound on the optimal value: \(\forall \lambda \geq 0 \quad g(\lambda) \leq p^*.\) This holds because, for any primal feasible $x$ and any $\lambda \geq 0$, each term $\lambda_i f_i(x) \leq 0$, so $g(\lambda) \leq \mathcal{L}(x, \lambda) \leq f_0(x)$; taking the infimum over feasible $x$ gives the bound.

If the Lagrangian is unbounded below in $x$, then $g(\lambda) = -\infty$. Values of $\lambda$ for which $\lambda \geq 0$ and $g(\lambda) \in \mathbb{R}$ are called dual feasible.
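
As a quick numerical sanity check of this lower bound on the toy example above, here is a short Python sketch (assuming numpy is available; the infimum is only approximated on a finite grid, so values are accurate to grid resolution):

```python
import numpy as np

# Toy problem from above: minimise f0(x) = x^2 subject to f1(x) = 1 - x <= 0.
# The primal optimal value is p* = 1, attained at x = 1.
xs = np.linspace(-10.0, 10.0, 200_001)

def g(lam):
    """Approximate dual function: infimum of the Lagrangian over a grid of x."""
    return np.min(xs**2 + lam * (1.0 - xs))

p_star = 1.0
for lam in [0.0, 0.5, 1.0, 2.0, 4.0]:
    val = g(lam)
    print(f"lambda = {lam:3.1f}, g(lambda) = {val:7.4f} <= p* = {p_star}")
    assert val <= p_star + 1e-6  # weak duality: g(lambda) <= p*
```

At $\lambda = 2$ the bound is tight ($g(2) = 1 = p^*$), which anticipates the discussion of strong duality below.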

Lagrange dual problem

Since the Lagrangian dual function provides a lower bound on the optimal value whenever $\lambda \geq 0$, we can look for the tightest such bound by maximising the dual function. This gives the Lagrange dual problem:

\[\begin{align*} \text{maximize} \quad & g(\lambda) \\ \text{subject to} \quad & \lambda \geq 0 \end{align*}\]

Let $d^*$ denote the optimal value of the dual problem. An immediate inequality by construction is the weak duality property:

\[d^* \leq p^*\]

It now makes sense why points where $\lambda \geq 0$ and $g(\lambda) > -\infty$ are called dual feasible: if the dual problem had no feasible points, then $d^* = -\infty$, using the standard convention that the supremum over the empty set is $-\infty$.

Strong duality holds if $d^* = p^*$.
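
For the toy example above, the tightest bound is attained: maximising $g(\lambda) = \lambda - \lambda^2/4$ over $\lambda \geq 0$ gives $\lambda^* = 2$ and

\[d^* = g(2) = 1 = p^*,\]

so strong duality holds there. This is consistent with the next section: the problem is convex, and (for instance) $x = 2$ is strictly feasible.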

Slater’s condition

In general, strong duality does not hold; however, if the problem is convex and Slater’s condition holds, then strong duality does hold.

Slater’s condition requires the existence of a strictly feasible point: some $x$ in the relative interior of the problem’s domain with $f_i(x) < 0$ for $i = 1, 2, \ldots, m$.

KKT conditions

Assume strong duality holds and that we have primal and dual optimal points $x^* $ and $\lambda^* $ respectively (no convexity assumption is needed here). Then we can write the following:

\[\begin{align} f_0(x^*) &= g(\lambda^*) \nonumber\\ &= \inf_x \mathcal{L}(x, \lambda^*) \nonumber\\ &= \inf_x \left( f_0(x) + \sum_{i=1}^m \lambda_i^* f_i(x) \right) \\ &\leq f_0(x^*) + \sum_{i=1}^m \lambda_i^* f_i(x^*) \nonumber\\ &\leq f_0(x^*) \nonumber \end{align}\]

The first equality uses the strong duality assumption. The final inequality holds because $\lambda_i^* \geq 0$ and $f_i(x^*) \leq 0$ for each $i$.

Since the chain starts and ends with $f_0(x^*)$, the inequalities are sandwiched and must all hold with equality. This implies:

\[\begin{align} f_0(x^*) = g(\lambda^*) &= \mathcal{L}(x^*, \lambda^*) \\ \sum_{i=1}^m \lambda_i^* f_i(x^*) &= 0 \implies \lambda_i^* f_i(x^*) = 0, \quad i = 1, 2, \ldots, m \end{align}\]

Equation $(2)$ means that $x^* $ minimises $\mathcal{L}(x, \lambda^*)$ over $x$.
Equation $(3)$ is referred to as complementary slackness; the implication holds because each term $\lambda_i^* f_i(x^*)$ is non-positive, so a zero sum forces every term to be zero.
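
Both facts can be checked on the toy example, where $x^* = 1$ and $\lambda^* = 2$:

\[\mathcal{L}(x, \lambda^*) = x^2 + 2(1 - x) = (x - 1)^2 + 1,\]

which is minimised over $x$ at $x^* = 1$ with value $f_0(x^*) = 1 = g(\lambda^*)$, and $\lambda^* f_1(x^*) = 2 (1 - 1) = 0$, so complementary slackness holds.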

KKT necessary conditions

If we additionally assume that all of the functions in the problem are differentiable, we have the following KKT conditions:

\[\begin{align*} \nabla_x \mathcal{L}(x^*, \lambda^*) &= 0 \\ f_i(x^*) &\leq 0, \quad i = 1, 2, \ldots, m \\ \lambda_i^* f_i(x^*) &= 0, \quad i = 1, 2, \ldots, m \\ \lambda_i^* &\geq 0, \quad i = 1, 2, \ldots, m \end{align*}\]

These statements follow from the inequality-sandwiching argument in $(1)$ above, together with primal and dual feasibility of $x^* $ and $\lambda^* $. For example, since $x^* $ minimises $\mathcal{L}(x, \lambda^* )$ over $x$, the first-order necessary condition $\nabla_x \mathcal{L}(x^*, \lambda^*) = 0$ holds.

For any optimisation problem with differentiable functions for which strong duality holds, any pair of primal and dual optimal points $x^* $ and $\lambda^* $ satisfies the KKT conditions above.
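
Convex solvers typically return the dual optimal multipliers alongside the primal solution, so the KKT conditions can be checked numerically. Below is a minimal sketch of the toy example using cvxpy, assuming it is installed (values are only accurate to solver tolerance):

```python
import cvxpy as cp

# Toy problem: minimise x^2 subject to f1(x) = 1 - x <= 0.
x = cp.Variable()
constraints = [1 - x <= 0]
problem = cp.Problem(cp.Minimize(cp.square(x)), constraints)
problem.solve()

lam = constraints[0].dual_value  # dual optimal multiplier lambda*
print(f"x*               ~ {x.value:.4f}")  # expect ~ 1.0
print(f"lambda*          ~ {lam:.4f}")      # expect ~ 2.0
print(f"lambda* * f1(x*) ~ {lam * (1 - x.value):.2e}")  # complementary slackness, ~ 0
```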

KKT sufficient conditions

If the primal problem is assumed to be convex, then the KKT conditions are also sufficient for optimality. That is, if the conditions are satisfied by some $x^* $ and $\lambda^* $, then $x^* $ is primal optimal and $\lambda^* $ is dual optimal, with zero duality gap.

This is because, when the primal problem is convex and $\lambda^* \geq 0$, the Lagrangian $\mathcal{L}(x, \lambda^*)$ is a convex function of $x$. Therefore, if the stationarity condition is satisfied at $x^* $, the Lagrangian is minimised at $x^* $, and we can write $(4)$ below. Then $(5)$ follows by complementary slackness.

\[\begin{align} g(\lambda^*) &= \mathcal{L}(x^*, \lambda^*) \\ &= f_0(x^*) \end{align}\]

Hence we have zero duality gap, and therefore $x^* $ and $\lambda^* $ are primal and dual optimal: recalling that the dual function provides a lower bound, $g(\lambda^*) \leq d^* \leq p^* \leq f_0(x^*)$, and the two ends are equal.

Summary

In summary, if the primal problem is convex with differentiable functions and Slater’s condition holds (this implies strong duality; other constraint qualifications also imply strong duality), then the KKT conditions are necessary and sufficient for primal and dual points to be optimal.


Non-differentiable problems

Now suppose that at least one of the functions in the problem is non-differentiable at some point in its domain, and assume that the problem is convex.

An example would be a problem with the constraint $|x| \leq t$; in standard form this is

\[\begin{align*} f_i(x) = |x| - t \leq 0 \end{align*}\]

Then $f_i(x)$ is non-differentiable at $x = 0$.

The stationarity condition

\[\begin{align*} \nabla_x \mathcal{L}(x^*, \lambda^*) &= 0 \end{align*}\]

now no longer makes sense, as the gradient of the Lagrangian may not exist.

Instead of gradients, we can use subgradients. In particular, the stationarity condition becomes:

\[\begin{align*} 0 \in \partial_x \mathcal{L}(x^*, \lambda^*) \end{align*}\]

And since the subdifferential of a sum of convex functions is the sum of their subdifferentials (under mild conditions, e.g. when the relative interiors of the domains intersect), this can be written as:

\[\begin{align*} 0 \in \partial f_0(x^*) + \sum_{i=1}^m \lambda_i^* \partial f_i(x^*) \end{align*}\]
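
As a small worked example (again my own, for illustration): recall that $\partial |x| = \{\operatorname{sign}(x)\}$ for $x \neq 0$ and $\partial |x| = [-1, 1]$ at $x = 0$. Consider minimising $f_0(x) = x$ subject to $f_1(x) = |x| - 1 \leq 0$. The optimum is $x^* = -1$, and the subgradient KKT conditions hold with $\lambda^* = 1$:

\[\begin{align*} 0 &\in \partial f_0(x^*) + \lambda^* \partial f_1(x^*) = \{1\} + 1 \cdot \{-1\} = \{0\} \\ \lambda^* f_1(x^*) &= 1 \cdot (|-1| - 1) = 0 \end{align*}\]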

Necessity and sufficiency of these modified KKT conditions follow from the same string of inequalities as before, together with the statement that:

A point $x^* $ is a minimizer of a function $f$ (not necessarily convex) if and only if $f$ is subdifferentiable at $x^* $ and $0 \in \partial f(x^* )$. (Section 2.3 of Notes on subgradients)
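
The worked example above can also be checked numerically, again as a hedged cvxpy sketch (cp.abs is cvxpy’s non-differentiable absolute value atom):

```python
import cvxpy as cp

# Non-differentiable toy problem: minimise x subject to |x| - 1 <= 0.
x = cp.Variable()
constraints = [cp.abs(x) - 1 <= 0]
problem = cp.Problem(cp.Minimize(x), constraints)
problem.solve()

print(f"x*      ~ {x.value:.4f}")                    # expect ~ -1.0
print(f"lambda* ~ {constraints[0].dual_value:.4f}")  # expect ~  1.0
```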

This post is licensed under CC BY 4.0 by the author.