
Monday, September 28, 2015

Weak and strong alternatives

Weak alternatives
  • Two systems of inequalities are called weak alternatives if at most one of the two is feasible.
  • This holds whether or not system 1 (the original system below) is convex (i.e. \(f_i\) convex, \(h_i\) affine)
  • System 2 (the dual system below) is always convex (\(g\) is concave and the constraint \(\lambda\succeq 0\) is convex)
Lagrange duality theory can be applied to the problem of determining feasibility of a system of inequalities and equalities
\begin{equation}f_i(x) \leq 0, \; i=1,\cdots,m,\quad h_i(x)=0,\; i=1,\cdots,p
\end{equation}
The above can be posed as the standard problem with objective \(f_0=0\):
\begin{align*}
\text{minimize} \quad &0\\
\text{subject to}\quad &f_i(x) \leq 0, \quad i=1,\cdots,m \\
&h_i(x) = 0, \quad i=1,\cdots,p
\end{align*}
This problem has optimal value
\[ p^* = \begin{cases}
                  0\quad &\text{if system is feasible}\\
                 \infty\quad &\text{if system is infeasible}
              \end{cases}
\]
We associate with the inequality system the dual function
\[ g(\lambda,\nu)= \inf_{x\in\mathcal{D}} \left(\sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \nu_i h_i(x) \right) \]
Note that the dual function is positive homogeneous, i.e. for \(\alpha > 0\), \(g(\alpha\lambda,\alpha\nu) = \alpha g(\lambda,\nu)\).   The dual problem of maximizing \(g(\lambda,\nu)\) subject to \(\lambda \succeq 0\) has the optimal value
\[
d^* = \begin{cases}
           \infty \quad &\lambda \succeq 0, \; g(\lambda,\nu) \gt 0 \text{ is feasible}\\
            0       \quad &\lambda \succeq 0, \; g(\lambda,\nu) \gt 0 \text{ is infeasible}
          \end{cases}
\]
From weak duality \(d^*\leq p^*\) and the above facts, we can conclude: if the inequality system
\begin{equation} \lambda\succeq 0, \quad g(\lambda,\nu) \gt 0\end{equation} is feasible (so that \(d^*=\infty\)), then the original inequality system is infeasible (since \(p^* \ge d^* = \infty\))

Any \((\lambda,\nu)\) with \(\lambda\succeq 0\) and \(g(\lambda,\nu) \gt 0\) is a certificate of infeasibility of the system (see the worked example after the list).  To summarize:
  • feasibility of system 2 implies infeasibility of system 1
  • feasibility of system 1 implies infeasibility of system 2
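As a small illustration (my own toy example): the scalar system \(1-x \leq 0,\; 1+x \leq 0\) is clearly infeasible.  Its dual function is
\[ g(\lambda) = \inf_x \left( \lambda_1(1-x) + \lambda_2(1+x) \right) = \begin{cases} \lambda_1 + \lambda_2 \quad & \lambda_1 = \lambda_2 \\ -\infty & \text{otherwise} \end{cases} \]
so \(\lambda = (1,1)\) gives \(g(\lambda) = 2 \gt 0\): a certificate of infeasibility (system 2 is feasible, so system 1 is not).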

Strong alternatives

When the original inequality system is convex and some type of constraint qualification holds, the pair of weak alternatives becomes a pair of strong alternatives, which means exactly one of the two alternatives holds.

If system 1 is convex, it can be expressed as
\[f_i(x) \leq 0, \; i=1,\cdots,m,\quad Ax=b, \; A\in \mathbf{R}^{p\times n}\]
This system and its alternative
\[ \lambda\succeq 0,\quad g(\lambda,\nu) \gt 0\]
are strong alternatives provided there exists an \(x\in \text{relint } \mathcal{D}\) with \(Ax=b\) and the optimal value \(p^*\) is attained.  [More on that later...]

See Farkas' lemma
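For reference, Farkas' lemma is the classic strong alternative for linear systems (standard statement, not derived here): exactly one of
\[ Ax \preceq 0, \quad c^Tx \lt 0 \qquad\text{and}\qquad A^T\lambda + c = 0, \quad \lambda \succeq 0 \]
is feasible.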

Friday, September 25, 2015

Karush-Kuhn-Tucker (KKT) conditions

If strong duality holds (zero duality gap), \(x\) and \((\lambda,\nu)\) are primal and dual optimal (there may be more than one such pair; see previous post), and the \(f_i, h_i\) are differentiable, then they must satisfy the KKT conditions [a necessary condition; a worked example follows the list]:
  1. primal constraints: \(f_i(x)\leq 0, i=1,\cdots,m, \; h_i(x) = 0, i=1,\cdots,p\)
  2. dual constraints: \(\lambda\succeq 0\)
  3. complementary slackness: \(\lambda_i f_i(x) = 0, i=1,\cdots,m \)
  4. gradient of Lagrangian with respect to \(x\) vanishes:
    \[\nabla f_0(x) + \sum_{i=1}^m \lambda_i \nabla f_i(x) + \sum_{i=1}^p \nu_i\nabla h_i(x) = 0\]
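A minimal worked example (my own): minimize \(x^2\) subject to \(1-x\leq 0\).  The KKT conditions read
\[ 1-x \leq 0, \quad \lambda \geq 0, \quad \lambda(1-x) = 0, \quad 2x - \lambda = 0. \]
If \(\lambda = 0\), then \(x=0\), which is infeasible; so \(1-x=0\), giving \(x=1\), \(\lambda=2\).  Since the problem is convex, this point is optimal (see below).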
For convex problems

Key: When the primal problem is convex, the KKT conditions are also sufficient for points to be primal and dual optimal.

If \(\tilde{x},\tilde{\lambda},\tilde{\nu}\) satisfy the KKT conditions for a convex problem (i.e. \(f_i\) convex, \(h_i\) affine), then they are optimal, with zero duality gap:
  • from complementary slackness: \(f_0(\tilde{x})=L(\tilde{x},\tilde{\lambda},\tilde{\nu})\)
  • from the 4th condition (and convexity), \(\tilde{x}\) minimizes \(L(x,\tilde{\lambda},\tilde{\nu})\) over \(x\), so \(g(\tilde{\lambda},\tilde{\nu}) = L(\tilde{x},\tilde{\lambda},\tilde{\nu})\)
hence, \(f_0(\tilde{x}) = g(\tilde{\lambda},\tilde{\nu})\)


If Slater's condition is satisfied:

\(x\) is optimal iff there exist \((\lambda,\nu)\) that satisfy the KKT conditions
  • Slater implies strong duality, therefore dual optimum is attained
  • generalizes optimality condition \(\nabla f_0(x) = 0\) for unconstrained problem
Key: Many algorithms for convex optimization are conceived as methods for solving the KKT conditions.
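For example, for an equality constrained quadratic program the KKT conditions form a linear system that can be solved directly.  A minimal numpy sketch (my own; kkt_solve is just an illustrative name):

    import numpy as np

    def kkt_solve(P, q, A, b):
        # Solve: minimize (1/2) x^T P x + q^T x  subject to  Ax = b
        # via the KKT system (gradient of Lagrangian vanishes + primal feasibility):
        #   [ P  A^T ] [ x  ]   [ -q ]
        #   [ A   0  ] [ nu ] = [  b ]
        n, p = P.shape[0], A.shape[0]
        K = np.block([[P, A.T], [A, np.zeros((p, p))]])
        z = np.linalg.solve(K, np.concatenate([-q, b]))
        return z[:n], z[n:]  # primal x, dual nu

    # tiny example: minimize ||x||^2 subject to x1 + x2 = 1
    P, q = 2 * np.eye(2), np.zeros(2)
    A, b = np.array([[1.0, 1.0]]), np.array([1.0])
    x, nu = kkt_solve(P, q, A, b)
    assert np.allclose(P @ x + q + A.T @ nu, 0)  # 4th KKT condition holds
    print(x, nu)  # x = [0.5 0.5], nu = [-1.]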

Slater's condition

If the primal problem is convex i.e. of the form
\begin{align*}
\text{minimize}\quad &f_0(x) \\
    \text{subject to}\quad &f_i(x) \leq 0, \quad i = 1,\cdots,m, \\
                                         &Ax=b
\end{align*} with \(f_0,\cdots,f_m\) convex, and there exists a strictly feasible point, i.e. an \(x\in \text{relint }\mathcal{D}\) such that
\[ f_i(x) \lt 0, \quad i=1,\cdots,m, \quad Ax=b,\]
then strong duality holds.
Moreover, the dual optimum is attained whenever \(d^*\gt -\infty\).

Thursday, September 24, 2015

Notes on duality

Given a standard form problem (not necessarily convex)
\begin{align*}
\text{minimize} \quad  &f_0(x) \\
\text{subject to} \quad &f_i(x)\leq 0, &i=1,\cdots,m \\
                            &h_i(x) = 0,    &i=1,\cdots,p
\end{align*}
variable \(x\in \mathbb{R}^n\), domain \(\mathcal{D}\), optimal value \(p^*\)
with Lagrangian:
\[ L(x,\lambda,\nu) = f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \nu_i h_i(x) \]
and the (Lagrange) dual function:
\[ g(\lambda,\nu) = \inf_{x\in \mathcal{D}} L(x,\lambda,\nu)\]
Properties:

  • \(g\) is concave in \((\lambda, \nu)\) since it is the pointwise infimum of a family of affine functions of \((\lambda,\nu)\).
  • Lower bound property: if \(\lambda \succeq 0\) then \(g(\lambda,\nu) \leq p^*\)

since, if \(\tilde{x}\) is feasible and \(\lambda \succeq 0\), then \(\lambda_i f_i(\tilde{x}) \leq 0\) and \(h_i(\tilde{x}) = 0\), so
\[ g(\lambda,\nu) = \inf_{x\in \mathcal{D}} L(x,\lambda,\nu) \leq L(\tilde{x}, \lambda, \nu) \leq f_0(\tilde{x}).\] Minimizing over all feasible \(\tilde{x}\) gives \( g(\lambda,\nu) \leq p^*\).
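A quick worked example (mine, for concreteness): minimize \(x^Tx\) subject to \(a^Tx=1\) (with \(a\neq 0\)).  Minimizing the Lagrangian over \(x\) gives \(x=-\nu a/2\), so
\[ g(\nu) = \inf_x \left( x^Tx + \nu(a^Tx - 1) \right) = -\frac{\nu^2}{4}\|a\|_2^2 - \nu, \]
which is maximized at \(\nu^* = -2/\|a\|_2^2\), giving \(d^* = 1/\|a\|_2^2\).  This equals \(p^*\), attained at \(x^* = a/\|a\|_2^2\), so the lower bound is tight here.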

Lagrange dual and conjugate function

Given an optimization problem with linear inequality and equality constraints
\begin{align*}
     \text{minimize}\quad &f_0(x) \\
     \text{subject to}\quad &Ax\preceq b \\
                                 &Cx=d.
\end{align*}
Using the definition of the conjugate, \(f^*(y) = \sup_{x\in \text{dom }f} (y^T x - f(x))\), we can write the dual function as:
\begin{align*}
    g(\lambda,\nu)\quad &= \quad \underset{x}{\inf} \left(f_0(x) + \lambda^T (Ax-b) + \nu^T(Cx-d)\right) \\
                             &= \quad -b^T \lambda - d^T \nu + \underset{x}{\inf} \left(f_0(x) + (A^T \lambda + C^T \nu)^T x\right)\\
                             &= \quad -b^T \lambda - d^T \nu - f_0^* (-A^T \lambda - C^T \nu)
\end{align*} with domain
\[ \text{dom }g = \{ (\lambda,\nu) \;|\; -A^T \lambda - C^T \nu \in \text{dom } f_0^* \} \]

  • simplifies derivation of dual if conjugate of \(f_0\) is known
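For example (using the conjugate of a norm, derived in the dual-norm notes below): for equality constrained norm minimization, minimize \(\|x\|\) subject to \(Cx=d\), the conjugate of \(f_0=\|\cdot\|\) is the indicator of the dual norm unit ball, so
\[ g(\nu) = -d^T\nu - f_0^*(-C^T\nu) = \begin{cases} -d^T\nu \quad & \|C^T\nu\|_* \leq 1 \\ -\infty & \text{otherwise} \end{cases} \]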

The dual problem (finding the greatest lower bound for the primal problem)
\begin{align*}
    \text{maximize}\quad &g(\lambda,\nu) \\
    \text{subject to}\quad  &\lambda \succeq 0
\end{align*}
  • a convex optimization problem; optimal value denoted \(d^*\)
  • \(\lambda,\nu\) are dual feasible if \(\lambda \succeq 0, (\lambda,\nu)\in \text{dom }g\)

Weak duality: \(d^* \leq p^*\)

  • always holds (for convex and nonconvex problems)
  • can be used to find lower bounds for difficult problems

Strong duality: \(d^* = p^*\) (zero duality gap)

  • does not hold in general
  • (usually) holds for convex problems
  • constraint qualifications assert conditions that guarantee strong duality in convex problems (e.g. Slater's constraint qualification)

Notes on dual norm

Dual norm is defined as
\[ \|z\|_* = \sup \{z^Tx \;|\; \|x\| \leq 1\}\] The dual norm can be interpreted as the operator norm of \(z^T\), viewed as a \(1\times n\) matrix, with the norm \(\|\cdot\|\) on \(\mathbf{R}^n\).

From the definition of dual norm we obtain the inequality
\[ z^Tx \leq \|x\| \|z\|_*\]
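Standard examples: \(\|\cdot\|_2\) is self-dual, and \(\|\cdot\|_1\) and \(\|\cdot\|_\infty\) are duals of each other, so the inequality above specializes to
\[ z^Tx \leq \|x\|_2\|z\|_2, \qquad z^Tx \leq \|x\|_1\|z\|_\infty \]
(Cauchy–Schwarz and a Hölder-type bound).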
The conjugate of any norm \(f(x)=\|x\|\) is the indicator function of the dual norm unit ball
\[f^*(y) = \begin{cases}
                             0      & \|y\|_* \leq 1 \\
                             \infty     & \text{otherwise}
                         \end{cases}
 \]
Proof. If \(\|y\|_* \gt 1\), then by definition of the dual norm, there is a \(z\in \mathbf{R}^n\) with \(\|z\| \leq 1\) and \(y^T z \gt 1\).  Taking \(x=tz\) and letting \(t \rightarrow \infty\), we have
\[ y^T x - \|x\| = t(y^Tz - \|z\|) \rightarrow \infty\] which shows that \(f^*(y) = \infty\).  Conversely, if \(\|y\|_*\leq 1\), then we have \(y^Tx \leq \|x\| \|y\|_* \leq \|x\|\) for all \(x\), which implies \(y^Tx - \|x\| \leq 0\) for all \(x\).  Therefore \(x=0\) maximizes \(y^Tx - \|x\|\), with maximum value 0.

Wednesday, September 23, 2015

Schur complement: characterizations of positive definiteness for block matrices

Let a block matrix \(X\in \mathbf{S}^n\) be partitioned as

\[ X = \begin{bmatrix}
                 A      &B \\
                 B^T  &C
           \end{bmatrix} \] where \(A\in \mathbf{S}^k\).  If \(\det A \neq 0\), the Schur complement of \(A\) in \(X\) is
\[ S =C - B^TA^{-1}B \] The following characterizations of positive definiteness can be derived
  • \(X\succ 0\) iff \(A\succ 0\) and \(S\succ 0\)
  • If \(A\succ 0\), then \(X\succeq 0\) iff \(S\succeq 0\)
Example:
\begin{align*}
&A^TA \preceq t^2I, \quad &t\gt 0 \\
\Longleftrightarrow \quad &tI-A^T (tI)^{-1} A \succeq 0, \quad &t\gt 0 \\
\Longleftrightarrow \quad &
  \begin{bmatrix}
     tI & A \\
     A^T & tI
  \end{bmatrix} \succeq 0
\end{align*}
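A quick numerical sanity check of the first characterization (my own sketch, assuming only numpy):

    import numpy as np

    rng = np.random.default_rng(0)

    def is_pd(M):
        # positive definiteness via the smallest eigenvalue (M symmetric)
        return np.linalg.eigvalsh(M).min() > 0

    # random symmetric block matrix X = [[A, B], [B^T, C]] with A > 0
    A = rng.standard_normal((3, 3)); A = A @ A.T + np.eye(3)
    B = rng.standard_normal((3, 2))
    C = rng.standard_normal((2, 2)); C = C @ C.T
    X = np.block([[A, B], [B.T, C]])

    S = C - B.T @ np.linalg.solve(A, B)  # Schur complement of A in X
    assert is_pd(X) == (is_pd(A) and is_pd(S))  # X > 0 iff A > 0 and S > 0
    print(is_pd(X), is_pd(A), is_pd(S))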

Tuesday, September 22, 2015

Notes on optimization problem

Optimality criterion for differentiable \(f_0\)
Given a convex optimization problem with objective \(f_0\), a point \(x\) is optimal iff it is feasible and
\[ \nabla f_0(x)^T (y-x) \ge 0 \quad \text{for all feasible } y \]

If \(\nabla f_0(x) \neq 0 \), then \( -\nabla f_0(x) \) defines a supporting hyperplane to the feasible set at \(x\).

Unconstrained problem: \(x\) is optimal iff
\[ x\in \text{dom }f_0, \quad \nabla f_0(x) = 0 \]
Equality constrained problem:
\[ \text{minimize } f_0(x) \quad \text{ subject to } Ax=b  \]
\(x\) is optimal iff there exists a \(\nu\) such that
\[ x\in \text{dom }f_0, \quad Ax=b, \quad \nabla f_0(x)+A^T\nu = 0\]
Note: Lagrange multiplier optimality condition.

Minimization over nonnegative orthant
\[ \text{minimize } f_0(x) \quad \text{ subject to } x\succeq 0  \]
\(x\) is optimal iff
\[ x\in\text{dom }f_0, \quad x\succeq 0, \quad  \left\{ \begin{array}{lr}
                                                                                                \nabla f_0(x)_i \ge 0, \quad &x_i = 0  \\
                                                                                                \nabla f_0(x)_i = 0, \quad &x_i \gt 0
                                                                                      \end{array} \right.   \]
Note: The last condition is a complementarity condition.
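A one-dimensional sanity check (my own example): minimize \((x-a)^2\) over \(x\geq 0\).  If \(a \gt 0\), the optimum is \(x=a\) with \(\nabla f_0(x) = 0\); if \(a \leq 0\), the optimum is \(x=0\) with \(\nabla f_0(0) = -2a \geq 0\), exactly as the complementarity condition requires.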


Monday, September 21, 2015

Notes on conjugate function (Fenchel conjugate)

The conjugate function (Fenchel conjugate)
  • Counterpart to Fourier transform in Convex Analysis
The conjugate of \(f\) (\(f\) not necessarily convex):
\[ f^*(y) = \underset{x\in \text{dom }f}{\sup} (y^Tx - f(x)) \]

The domain of \(f^*\) consists of \(y\in \mathbb{R}^n\) for which the supremum is finite, i.e. for which the difference \(y^T x - f(x)\) is bounded above on \(\text{dom }f\).

\(f^*\) is convex, since it is the pointwise supremum of a family of affine functions of \(y\).

Examples:
  • Affine function. \(f(x)=ax+b\).  Then \(yx - f(x) = (y-a)x - b\), which is bounded above in \(x\) only when \(y=a\):

\[ f^*(y)  =  \begin{cases}
                                     -b    &y=a\\
                                   \infty &\text{otherwise}
                              \end{cases} \] with \(\text{dom }f^* = \{a\}\).

  • Negative logarithm. \(f(x) = -\log x\), with \(\text{dom }f=\mathbb{R}_{++}\).  

The function \(xy + \log x\) is unbounded above if \(y\ge 0\) and attains its maximum at \(x=-1/y\) otherwise, so
\[ f^*(y) =  \begin{cases}
                                   -1-\log (-y)    &y\lt 0\\
                                   \infty           &\text{otherwise}
                              \end{cases} \] with \(\text{dom }f^* = \{y\; |\; y\lt 0 \}\)
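A quick numerical check of the negative-logarithm example (a sketch assuming scipy; the supremum is approximated over a bounded interval):

    import numpy as np
    from scipy.optimize import minimize_scalar

    def conjugate(y, f, lo=1e-9, hi=1e6):
        # approximate f*(y) = sup_x (y*x - f(x)) over dom f = (0, inf)
        res = minimize_scalar(lambda x: -(y * x - f(x)), bounds=(lo, hi), method="bounded")
        return -res.fun

    f = lambda x: -np.log(x)  # f(x) = -log x, dom f = R_++
    for y in [-0.5, -1.0, -2.0]:
        print(y, conjugate(y, f), -1 - np.log(-y))  # numeric vs closed form agree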

Fenchel's inequality

Fenchel's inequality can be obtained from the definition of conjugate function
\[  f(x) + f^*(y) \ge x^T y , \quad \forall x,y\]
since, by definition of the supremum, for all \(x\) and \(y\),
\[  f^*(y) \ge x^Ty - f(x) \Leftrightarrow  f(x) + f^*(y) \ge x^Ty \]

Notes on convex functions

Basic Properties

[TODO]

Examples
  • Affine functions are both convex and concave
  • All norms are convex (e.g. Frobenius norm, Spectral norm)
  • Max function \(f(x) = \max_i x_i\) is convex on \(\mathbb{R}^n\)
  • Quadratic-over-linear function \(f(x,y)=x^2/y\) is convex on \(\text{dom }f=\mathbb{R}\times\mathbb{R}_{++}\)
  • Log-sum-exp \(f(x)=\log \sum^n_{i=1}e^{x_i}\) is convex on \(\mathbb{R}^n\)
  • Geometric mean \(f(x)=(\prod^n_{i=1}x_i)^{1/n}\) is concave on \(\text{dom }f=\mathbb{R}^n_{++}\)
  • Log-det is concave on \(\text{dom }f=\mathbf{S}^n_{++}\)
Restriction of a convex function to a line

\(f:\mathbb{R}^n \rightarrow \mathbb{R}\) is convex iff the function \(g:\mathbb{R} \rightarrow \mathbb{R}\), 
\[ g(t) = f(x+tv), \quad \text{dom } g = \{ t\; |\; x + tv \in \text{dom } f \} \]
is convex in \(t\) for any \(x \in  \text{dom } f\), \(v \in \mathbb{R}^n\)

Upshot: can check convexity of \(f\) by checking convexity of functions of one variable.
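The classic application (standard argument, sketched): concavity of \(f(X) = \log\det X\) on \(\mathbf{S}^n_{++}\) follows by restricting to a line \(X+tV\):
\[ g(t) = \log\det(X+tV) = \log\det X + \sum_{i=1}^n \log(1+t\lambda_i), \]
where the \(\lambda_i\) are the eigenvalues of \(X^{-1/2}VX^{-1/2}\); each term is concave in \(t\), so \(g\) is concave for every choice of \(X\succ 0\) and \(V\), hence \(f\) is concave.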

Epigraph and sublevel set 

\(\alpha\)-sublevel set of \(f:\mathbb{R}^n \rightarrow \mathbb{R}\) is defined as the set of all \(x\) for which \(f(x)\) is at most \(\alpha\)
\[ C_\alpha = \{ x \in \text{dom } f \; | \; f(x) \leq \alpha \} \]

Property: sublevel sets of convex functions are convex (converse is false)

epigraph of \(f:\mathbb{R}^n \rightarrow \mathbb{R}\):
\[ \text{epi } f = \{ (x,t) \in \mathbb{R}^{n+1} \; | \; x \in \text{dom } f, f(x) \leq t \} \]

Property: \(f\) is convex iff  \(\text{epi }f\) is a convex set

Operations that preserve convexity

methods for establishing convexity of a function
  1. verify definition (often simplified by restricting to a line)
  2. for twice differentiable functions, show \(\nabla^2 f(x) \succeq 0\)
  3. show that \(f\) is obtained from simple convex functions by operations that preserve convexity
    • nonnegative weighted sum
    • composition with affine function
    • pointwise maximum and supremum
    • composition
    • minimization
    • perspective
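These composition rules are exactly what disciplined convex programming (DCP) tools automate.  A small sketch (assuming cvxpy is installed) queries the curvature that the rules assign to an expression:

    import cvxpy as cp

    x = cp.Variable(3)
    # nonnegative weighted sum + composition with an affine function
    f1 = 2 * cp.sum_squares(x) + cp.norm(3 * x + 1, 2)
    # pointwise maximum of convex atoms
    f2 = cp.maximum(cp.log_sum_exp(x), cp.max(x))
    print(f1.curvature, f2.curvature)  # CONVEX CONVEX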

Notes on convex sets

Cones

Definition of a cone is quite general:

A set \(C\) is a cone if for every \(x\in C\) and \(\theta\ge 0\), we have \(\theta x \in C\).  (Note that a subspace is a cone, since any nonnegative scaling of a point in the subspace stays in the subspace.)

A set \(C\) is a convex cone if it is convex and a cone.

A proper cone, on the other hand, is more restrictive.

A cone \(K \subseteq \mathbb{R}^n\) is a proper cone if it satisfies the following

  • \(K\) is convex
  • \(K\) is closed
  • \(K\) is solid, which means it has nonempty interior.
  • \(K\) is pointed, which means that it contains no line.

Dual cones and generalized inequalities

The dual cone of a cone \(K\):
\[ K^* = \{y\; |\; y^T x\ge 0 \text{ for all } x \in K \} \]
Dual cones of proper cones are proper, so a dual cone defines generalized inequalities in the dual domain
\[ y \succeq_{K^*} 0 \quad \Longleftrightarrow \quad y^T x\ge 0 \text{ for all } x\succeq_K 0  \]
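Standard examples (stated without proof): \(\mathbb{R}^n_+\) and \(\mathbf{S}^n_+\) are self-dual, and the dual of the norm cone \(K = \{(x,t) \;|\; \|x\| \leq t\}\) is the dual-norm cone \(K^* = \{(u,v) \;|\; \|u\|_* \leq v\}\).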

Characterization of minimum and minimal elements via dual inequalities

minimum element w.r.t. \( \preceq_K\)

Definition: \(x\) is the minimum element of \(S\) iff for all \(\lambda \succ_{K^*} 0\), \(x\) is the unique minimizer of \(\lambda^T z\) over \(S\)

minimal element w.r.t. \( \preceq_K \)

Definitions:

  • if \(x\) minimizes \(\lambda^T z\) over \(S\) for some \(\lambda \succ_{K^*} 0\), then \(x\) is minimal
  • if \(x\) is a minimal element of a convex set \(S\), then there exists a nonzero \(\lambda \succeq_{K^*} 0\) such that \(x\) minimizes \(\lambda^T z\) over \(S\)