Fundamentals of Dual Methods

This part of the notes discusses dual methods, namely dual (sub)gradient ascent, the augmented Lagrangian (AL) method, and the alternating direction method of multipliers (ADMM). This note first presents some fundamental properties of dual methods. It then shows how dual (sub)gradient ascent works, along with its distributed version, dual decomposition. AL and ADMM are discussed in later notes.

Motivation

As the name suggests, dual methods solve the primal problem by solving its dual. Take, for example, a convex optimization problem with equality constraints:

\[\begin{aligned} \min_x\ &\ f(x)\\ \mathrm{subject\ to}\ &\ Ax = b. \end{aligned} \tag{1}\]

Its Lagrangian is

\[ L(x, u) = f(x) + u^T(Ax - b) \]

and its Lagrangian dual function is

\[ g(u) = \min_x\Big(f(x) - (-u^TA)x - u^Tb\Big). \tag{2} \]

To express (2) more compactly, we use the conjugate function of \(f\):

Conjugate Function

Given \(f:\mathbb{R}^n \rightarrow \mathbb{R}\), the conjugate function of \(f\) is defined as

\[ f^*(y) = \max_{x}\Big(y^Tx - f(x)\Big) \tag{3} \]

Equivalently, negating both sides poses it as a minimization problem:

\[ -f^*(y) = \min_{x}\Big(f(x) - y^Tx\Big). \]
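To make the definition concrete, here is a small Python sketch that approximates \(f^*(y)\) for a scalar \(f\) by brute-force maximization over a grid. The helper `conjugate_on_grid`, its grid bounds, and the test functions are illustrative assumptions, not part of these notes. The grid search is only trustworthy when the maximizer of \(yx - f(x)\) lies inside the grid.

```python
import math

def conjugate_on_grid(f, y, lo=-10.0, hi=10.0, num=20001):
    """Approximate f*(y) = max_x (y*x - f(x)) for a scalar function f.

    Hypothetical helper: brute-force maximization over an evenly spaced grid
    on [lo, hi]. Only meaningful if the true maximizer lies inside [lo, hi].
    """
    step = (hi - lo) / (num - 1)
    best = -math.inf
    for i in range(num):
        x = lo + i * step
        best = max(best, y * x - f(x))
    return best

# f(x) = x^2 / 2 is its own conjugate: f*(y) = y^2 / 2
print(conjugate_on_grid(lambda x: 0.5 * x * x, 1.5))

# f(x) = exp(x) has conjugate f*(y) = y*log(y) - y for y > 0
print(conjugate_on_grid(math.exp, 2.0), 2.0 * math.log(2.0) - 2.0)
```

For \(f(x) = x^2/2\) the sketch reproduces \(f^*(y) = y^2/2\). For \(f(x) = e^x\) it matches the known conjugate \(y\log y - y\) up to the grid resolution.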

We can then write (2) in terms of the conjugate function:

\[\begin{aligned} g(u) &= \min_x\Big(f(x) - (-u^TA)x\Big) - u^Tb\\ &= -f^*(-A^Tu) - u^Tb. \end{aligned} \tag{4}\]

This gives a succinct description of the Lagrangian dual function. Dual methods are motivated by situations where the conjugate function cannot be obtained in closed form. Even then, we can solve the dual problem with (sub)gradient methods, because evaluating a (sub)gradient of \(g\) only requires minimizing the Lagrangian over \(x\).
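As a sanity check of (4), the following sketch evaluates \(g(u)\) in two ways. It assumes NumPy and the illustrative choice \(f(x) = \frac{1}{2}\|x\|_2^2\), whose conjugate is \(f^*(y) = \frac{1}{2}\|y\|_2^2\). The first way plugs the Lagrangian minimizer \(x = -A^Tu\) into \(L(x, u)\); the second goes through the conjugate formula. The function names and random data are hypothetical.

```python
import numpy as np

def dual_via_lagrangian(A, b, u):
    """g(u) = min_x f(x) + u^T(Ax - b) for f(x) = 0.5*||x||^2.

    The Lagrangian is minimized where x + A^T u = 0, i.e. x = -A^T u.
    """
    x = -A.T @ u
    return 0.5 * x @ x + u @ (A @ x - b)

def dual_via_conjugate(A, b, u):
    """g(u) = -f*(-A^T u) - u^T b, using f*(y) = 0.5*||y||^2 for this f."""
    y = -A.T @ u
    return -0.5 * y @ y - u @ b

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))
b = rng.standard_normal(3)
u = rng.standard_normal(3)
# Both evaluations of g(u) should agree up to rounding
print(dual_via_lagrangian(A, b, u), dual_via_conjugate(A, b, u))
```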

Properties of Conjugate Functions

Four properties of conjugate functions are useful for analyzing dual methods:

Property 1

If \(f\) is closed and convex, then \(f^{**} = f\).

Property 2

If \(f\) is closed and convex, the following statements are equivalent:

\(x \in \partial f^*(y)\);
\(y \in \partial f(x)\);
\(x \in \underset{z}{\mathrm{argmin}}(f(z) - y^Tz)\).

Property 3

Assume that \(f\) is closed and convex. Then the following two statements are equivalent:

\(f\) is strongly convex with parameter \(d\);
\(\nabla f^*\) is Lipschitz with parameter \(1/d\).

Property 4

If \(f\) is strictly convex, then \(\nabla f^*(y) = \underset{z}{\mathrm{argmin}}(f(z) - y^Tz)\).
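Properties 2 to 4 can be checked numerically on a strongly convex quadratic \(f(x) = \frac{1}{2}x^TQx + c^Tx\) with \(Q\) symmetric positive definite. For this \(f\), \(f^*(y) = \frac{1}{2}(y - c)^TQ^{-1}(y - c)\) and the strong convexity parameter is \(d = \lambda_{\min}(Q)\). This is a sketch under those assumptions (NumPy, random illustrative data, hypothetical helper names). It tests the properties at random points rather than proving them.

```python
import numpy as np

# Illustrative strongly convex quadratic f(x) = 0.5 x^T Q x + c^T x
rng = np.random.default_rng(1)
n = 4
M = rng.standard_normal((n, n))
Q = M @ M.T + np.eye(n)             # symmetric positive definite
c = rng.standard_normal(n)
d = np.linalg.eigvalsh(Q).min()     # strong convexity parameter of f

def grad_f(x):
    return Q @ x + c

def f_conj(y):
    # Closed form for this f: f*(y) = 0.5 (y - c)^T Q^{-1} (y - c)
    r = y - c
    return 0.5 * r @ np.linalg.solve(Q, r)

def argmin_shifted(y):
    # Minimizer of f(z) - y^T z: solves Q z + c - y = 0
    return np.linalg.solve(Q, y - c)

def numerical_grad(h, y, eps=1e-6):
    # Central finite differences, one coordinate at a time
    E = np.eye(len(y))
    return np.array([(h(y + eps * E[i]) - h(y - eps * E[i])) / (2 * eps)
                     for i in range(len(y))])

y1, y2 = rng.standard_normal(n), rng.standard_normal(n)
x1 = argmin_shifted(y1)

# Property 2: x1 minimizes f(z) - y1^T z, so y1 is the gradient of f at x1
print(np.allclose(grad_f(x1), y1))
# Property 4: the gradient of f* at y1 equals that minimizer
print(np.allclose(numerical_grad(f_conj, y1), x1, atol=1e-5))
# Property 3: grad f* = argmin_shifted is Lipschitz with parameter 1/d
print(np.linalg.norm(argmin_shifted(y1) - argmin_shifted(y2))
      <= np.linalg.norm(y1 - y2) / d + 1e-12)
```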

Dual (Sub)Gradient Methods

From (4), the Lagrangian dual problem is

\[ \max_{u}\ g(u) = \max_{u}\Big[-f^*(-A^Tu) - u^Tb\Big]. \]

Applying the chain rule, the subdifferential of \(g\) is

\[ \partial g(u) = A\partial f^*(-A^Tu) - b. \]

By Property 2 with \(y = -A^Tu\), we have

\[ x\in\partial f^*(-A^Tu) \iff x\in\underset{z}{\mathrm{argmin}}\Big[f(z) + u^TAz\Big]. \]

Therefore, for any Lagrangian minimizer \(x\), the residual \(Ax - b\) is a subgradient of \(g\) at \(u\):

\[ Ax - b \in \partial g(u),\quad x\in\underset{z}{\mathrm{argmin}}\Big[f(z) + u^TAz\Big]. \]
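As a worked example, take \(f(x) = \frac{1}{2}\|x - a\|_2^2\) for a fixed vector \(a\). This is an illustrative choice, not one made in the text above. The Lagrangian minimizer then has a closed form, which makes the dual gradient explicit:

\[\begin{aligned} x(u) &= \underset{z}{\mathrm{argmin}}\Big[\tfrac{1}{2}\|z - a\|_2^2 + u^TAz\Big] = a - A^Tu,\\ \nabla g(u) &= Ax(u) - b = Aa - b - AA^Tu. \end{aligned}\]

Because this \(f\) is strictly convex, the minimizer is unique and \(g\) is differentiable (Property 4). If \(AA^T\) is invertible, setting \(\nabla g(u) = 0\) gives \(u^\star = (AA^T)^{-1}(Aa - b)\) and \(x^\star = a - A^Tu^\star\).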

The dual subgradient method maximizes the dual objective as follows.

Dual Subgradient Method

Start from an initial dual guess \(u^{(0)}\) and repeat for \(k = 1, 2, 3, \cdots\)

\(x^{(k)} \in \underset{x}{\mathrm{argmin}}\Big[f(x) + (u^{(k-1)})^TAx\Big]\)
\(u^{(k)} = u^{(k-1)} + t_k(Ax^{(k)} - b)\)

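where the step sizes \(t_k\) are chosen in the usual ways. For example, use diminishing step sizes for a general subgradient method, or backtracking line search when \(g\) is differentiable.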
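Here is a minimal sketch of the method. It assumes NumPy and the quadratic \(f(x) = \frac{1}{2}\|x - a\|_2^2\), for which the \(x\)-update has the closed form \(x = a - A^Tu\). For simplicity it uses a constant step size \(t = 1/\|A\|_2^2\) rather than a line search; this step is appropriate here because \(f\) is strongly convex with \(d = 1\). The function name and data are illustrative.

```python
import numpy as np

def dual_ascent(A, b, a, t, iters=200):
    """Dual gradient ascent for min 0.5*||x - a||^2 subject to Ax = b.

    Hypothetical helper; uses a constant step size t instead of a line search.
    """
    u = np.zeros(A.shape[0])            # initial dual guess u^(0)
    for _ in range(iters):
        x = a - A.T @ u                 # x-update: minimizes 0.5*||x - a||^2 + u^T A x
        u = u + t * (A @ x - b)         # dual step along the residual A x - b
    return x, u

A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
b = np.array([1.0, 1.0])
a = np.array([1.0, 2.0, 3.0])
t = 1.0 / np.linalg.norm(A, 2) ** 2     # 1 / (largest singular value of A)^2
x, u = dual_ascent(A, b, a, t)
print(x, u)
```

For this data \(AA^T\) is invertible. The iterates approach \(u^\star = (2/3, 5/3)\) and \(x^\star = (1/3, 1/3, 2/3)\), which satisfies \(Ax^\star = b\).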

Dual Decomposition

Dual decomposition considers problems whose objective is a sum of \(B\) terms, each depending on its own block of variables:

\[\begin{split} \begin{align} \min_x\ &\ \sum_{i=1}^{B}{f_i(x_i)}\\ \mathrm{subject\ to}\ &\ Ax = b \end{align} \end{split}\]

which can also be written as

\[\begin{split} \begin{align} \min_x\ &\ \sum_{i=1}^{B}{f_i(x_i)}\\ \mathrm{subject\ to}\ &\ \sum_{i=1}^{B}{A_ix_i}= b \end{align} \end{split}\]

where \(x = [x_1, \cdots, x_B] \in \mathbb{R}^n\) with \(x_i \in \mathbb{R}^{n_i}\) and \(A = [A_1, \cdots, A_B]\) with \(A_i \in \mathbb{R}^{m \times n_i}\). The first step of the dual subgradient method becomes

\[ x^{(k)} \in \underset{x}{\mathrm{argmin}}\Big[\sum_{i=1}^{B}{f_i(x_i)} + (u^{(k-1)})^T\sum_{i=1}^{B}{A_ix_i}\Big] \]

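Because both the objective and the constraint term are separable across blocks, this minimization decouples into \(B\) independent subproblems: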

\[ x_i^{(k)} \in \underset{x_i}{\mathrm{argmin}}\Big[f_i(x_i) + (u^{(k-1)})^TA_ix_i\Big],\ i = 1, \cdots, B. \]

This gives the dual decomposition algorithm:

Dual Decomposition Method with Equality Constraints

Start from an initial dual guess \(u^{(0)}\) and repeat for \(k = 1, 2, 3, \cdots\)

\(x_i^{(k)} \in \underset{x_i}{\mathrm{argmin}}\Big[f_i(x_i) + (u^{(k-1)})^TA_ix_i\Big],\ i = 1, \cdots, B\)
\(\displaystyle u^{(k)} = u^{(k-1)} + t_k(\sum_{i=1}^{B}{A_ix_i^{(k)}} - b)\)

where the step sizes \(t_k\) are chosen as in the dual subgradient method.
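The decomposition is easiest to see on a toy resource-allocation problem, which is an illustrative assumption rather than part of the notes. It has scalar blocks \(x_i\) with \(f_i(x_i) = \frac{w_i}{2}(x_i - a_i)^2\) and weights \(w_i > 0\), and a single coupling constraint \(\sum_i x_i = b\), i.e. \(A_i = 1\). Each block update is then \(x_i = a_i - u/w_i\). The pure-Python sketch below uses a constant step size and a plain loop over blocks. In a distributed setting, each block update would run on its own worker, and only \(\sum_i A_ix_i\) would be gathered for the dual step.

```python
def dual_decomposition(a, w, b, t, iters=100):
    """Dual decomposition for min sum_i 0.5*w_i*(x_i - a_i)^2  s.t.  sum_i x_i = b.

    Hypothetical helper: every block is a scalar and A_i = 1, so each
    x_i-update has a closed form. Uses a constant step size t.
    """
    u = 0.0                                        # initial dual guess u^(0)
    for _ in range(iters):
        # Block updates: each minimizes 0.5*w_i*(x_i - a_i)^2 + u*x_i on its own,
        # so they could run in parallel (a plain loop here).
        x = [a_i - u / w_i for a_i, w_i in zip(a, w)]
        # Dual update: gather the block contributions and step along the residual.
        u = u + t * (sum(x) - b)
    return x, u

x, u = dual_decomposition(a=[1.0, 2.0, 3.0], w=[1.0, 2.0, 4.0], b=3.0, t=0.5)
print(x, u)
```

With \(a = (1, 2, 3)\), \(w = (1, 2, 4)\), and \(b = 3\), the multiplier converges to \(u^\star = 12/7\). The allocation converges to \(x^\star = (-5/7, 8/7, 18/7)\), which sums to \(3\).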

For the problem with inequality constraints

\[\begin{split} \begin{align} \min_x\ &\ \sum_{i=1}^{B}{f_i(x_i)}\\ \mathrm{subject\ to}\ &\ \sum_{i=1}^{B}{A_ix_i} \leq b \end{align} \end{split}\]

the dual variable must satisfy \(u \geq 0\), so the dual update includes a projection onto the nonnegative orthant:

Dual Decomposition Method with Inequality Constraints

Start from an initial dual guess \(u^{(0)}\) and repeat for \(k = 1, 2, 3, \cdots\)

\(x_i^{(k)} \in \underset{x_i}{\mathrm{argmin}}\Big[f_i(x_i) + (u^{(k-1)})^TA_ix_i\Big],\ i = 1, \cdots, B\)
\(\displaystyle u^{(k)} = \Big[u^{(k-1)} + t_k(\sum_{i=1}^{B}{A_ix_i^{(k)}} - b)\Big]_+\)

where the step sizes \(t_k\) are chosen as in the dual subgradient method, and \([u]_+\) denotes the projection onto the nonnegative orthant, \(([u]_+)_i = \max\{0, u_i\}\).
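The projected version can be sketched on a similar toy problem, again an illustrative assumption: \(f_i(x_i) = \frac{1}{2}(x_i - a_i)^2\) with the coupling constraint \(\sum_i x_i \leq b\). Each block update is \(x_i = a_i - u\). The only change from the equality-constrained loop is the projection \(\max\{0, \cdot\}\) in the dual step.

```python
def dual_decomposition_ineq(a, b, t, iters=100):
    """Projected dual decomposition for min sum_i 0.5*(x_i - a_i)^2  s.t.  sum_i x_i <= b.

    Hypothetical helper with scalar blocks, A_i = 1, and a constant step size t.
    """
    u = 0.0                                       # initial dual guess u^(0) >= 0
    for _ in range(iters):
        x = [a_i - u for a_i in a]                # independent block updates
        u = max(0.0, u + t * (sum(x) - b))        # projected dual step keeps u >= 0
    return x, u

print(dual_decomposition_ineq([1.0, 2.0, 3.0], b=3.0, t=0.25))    # constraint active
print(dual_decomposition_ineq([1.0, 2.0, 3.0], b=10.0, t=0.25))   # constraint inactive
```

With \(a = (1, 2, 3)\) and \(b = 3\), the constraint is active and the iterates approach \(u^\star = 1\) and \(x^\star = (0, 1, 2)\). With \(b = 10\), the unconstrained minimizer \(x = a\) is already feasible, so the projection keeps \(u = 0\) throughout.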

Convergence Guarantees

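When \(f\) is strongly convex with parameter \(d\), Property 3 implies that \(g\) is differentiable with a Lipschitz gradient. Dual gradient ascent with constant step sizes \(t_k = d\) then converges at a sublinear rate: \(\mathcal{O}(1/\epsilon)\) iterations suffice to reach an \(\epsilon\)-suboptimal dual objective. If, in addition, \(\nabla f\) is Lipschitz with parameter \(L\), then step sizes \(t_k = 2 / (1/d + 1/L)\) give a linear rate, needing \(\mathcal{O}(\log(1/\epsilon))\) iterations. These step sizes take \(A\) to have orthonormal rows. For a general \(A\), the constants, and hence the step sizes, also depend on the singular values of \(A\); for example, \(t_k = d/\|A\|_2^2\) in the first case.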
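The linear rate can be observed on a small example, sketched below under stated assumptions (NumPy, hypothetical data). The example uses \(f(x) = \frac{1}{2}x^TQx - a^Tx\) with \(Q = \mathrm{diag}(1, 4, 2)\), so \(d = 1\) and \(L = 4\). The matrix \(A\) has orthonormal rows, so the step size \(2/(1/d + 1/L)\) applies as stated. The script records the distance from each dual iterate to the optimal multiplier.

```python
import numpy as np

# Hypothetical example: f(x) = 0.5 x^T Q x - a^T x, strongly convex with d = 1
# and with grad f Lipschitz with L = 4
Q = np.diag([1.0, 4.0, 2.0])
a = np.array([1.0, 2.0, 3.0])
A = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])     # orthonormal rows
b = np.zeros(2)
d, L = 1.0, 4.0
t = 2.0 / (1.0 / d + 1.0 / L)

u_star = np.array([1.0, 2.0])       # solves A Q^{-1}(a - A^T u) = b for this data
u = np.zeros(2)
errors = []
for _ in range(60):
    x = np.linalg.solve(Q, a - A.T @ u)   # x-update: minimizes f(x) + u^T A x
    u = u + t * (A @ x - b)               # dual gradient step
    errors.append(np.linalg.norm(u - u_star))

print(errors[1] / errors[0])   # per-step contraction factor
print(errors[-1])              # distance to u_star after 60 steps
```

In this example each step shrinks the error by a factor of \((L/d - 1)/(L/d + 1) = 0.6\), up to rounding. That is the familiar contraction factor of gradient methods with step size \(2/(\mu + L)\) on quadratics. Here the dual is strongly concave with parameter \(1/L\) and has a \(1/d\)-Lipschitz gradient.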
