Fundametals of Dual Method#
This part of the notes will discuss dual methods, e.g. dual (sub)gradient ascent, augmented Lagrangian (AL), and alternating direction method of multipliers (ADMM). This note will first present some fundametal properties of dual methods, and then show how dual (sub)gradient ascent works along with its distributed version. For AL and ADMM, we will discuss about them in later notes.
Motivation#
As the name suggests, dual methods attempt to solve the primal problem by solving the dual problem. If we take a convex optimization problem with equality constraints as example
We can have its Lagrangian as
and the Lagrangian dual function as
Define the conjugate function of \(f(x)\) as
Conjugate Function
Given \(f:\mathbb{R}^n \rightarrow \mathbb{R}\), the conjugate function of \(f\) is define as
also it can be defined as posing it as a minimization problem
Then we can we write (2) using conjugate functions:
This gives us a succinct way of describing the Lagrangian dual function. The dual methods are motivated by situations where the conjugate function cannot be obtained in close form. In those cases, we can still solve the optimization problem using dual-based subgradient or gradient methods.
Properties of Conjugate Functions#
There are four main properties of conjugate functions that are useful to our analysis of dual methods:
Property 1
If \(f\) is closed and convex then \(f^{**} = f\).
Property 2
The following statements are equivalent
\(x \in \partial f^*(y)\);
\(y \in \partial f(x)\);
\(x \in \underset{z}{\mathrm{argmin}}(f(z) - y^Tz)\).
Proof of Property 2
Part 1.1: We show that \(y \in \partial f(x) \Rightarrow x \in \partial f^*(y)\)
From the definition of conjugate functions we know
Thus, using the subgradient optimality condition, for any \(z\) that is optimal it must satisfy
Also for any \(z\) that satisfies \(y\in\partial f(z)\) it must be optimal. Since \(y \in \partial f(x)\), then \(z = x\) must minimize \(f(z) - z^Ty\) over \(z\). Define
Now we need to acknowledging one of the properties of general pointwise maximums: if \(f(x) = \max_{s\in\mathcal{S}}f_s(x)\), then
Note that \(\partial h_z(y) = z\). Then, by the definition of general pointwise maximums, we have
Hence, we have \(x \in \partial f^*(y)\).
Part 1.2 We show that \(y \in \partial f(x) \Leftarrow x \in \partial f^*(y)\)
From the above derivation we know that \(y \in \partial f(x) \Rightarrow x \in \partial f^*(y)\). Then, we can conclude that \(x \in \partial f^*(y) \Rightarrow y \in \partial f^{**}(x)\), given that \(f = f^{**}\), we have \(x \in \partial f^*(y) \Rightarrow y \in \partial f(x)\).
Part 2.1 We show that \(\displaystyle y \in \partial f(x) \Rightarrow x \in \underset{z}{\mathrm{argmin}}(f(z) - y^Tz)\)
The subgradient optimality condition for \(f(z) - y^Tz\) is \(0 \in \partial_z(f(z) - y^Tz)\), which can be written as \(y \in \partial_z f(z)\). Since \(y \in \partial f(x)\), then \(x\) must minimize \(f(z) - y^Tz\).
Part 2.2 We show that \(\displaystyle y \in \partial f(x) \Leftarrow x \in \underset{z}{\mathrm{argmin}}(f(z) - y^Tz)\)
If \(x\) minimizes \(f(z) - y^Tz\), then from the subgradient optimality condition we have \(0 \in \partial f(x) - y\), i.e. \(y \in \partial f(x)\).
Thus, we can see all three statements are equivalent.
Property 3
Assuming that \(f\) is closed and convex. Then, the following two statements are equivalent
\(f\) is strongly convex with parameter \(d\);
\(\nabla f^*\) is Lipschitz with parameter \(1/d\).
Proof of Property 3
If we have a strongly convex function \(g\) with parameter \(d\), then it has the property
if \(x\) is a minimizer, i.e. \(\nabla g(x) = 0\), then we have
If we define two functions
then we have (from property 4)
This leads to
if we add them up we have \(g_1(x_v) + g_2(x_u) \geq g_1(x_u) + g_2(x_v) + d\|x_v - x_u\|_2^2\), which is the same as
which can be cleaned up and get
From Cauchy-Schwartz, we then can get
Property 4
If \(f\) is strictly convex, then \(\nabla f^*(y) = \underset{z}{\mathrm{argmin}}(f(z) - y^Tz)\).
Proof of Property 4
Since \(f\) is strictly convex, then \(f(z) - y^Tz\) would also be strictly convex over \(z\). Then, for the strictly convex function \(f(z) - y^Tz\) we would have a unique minimizer. From property 2 we know that this unique minimizer belongs to the set of \(\partial f^*(y)\). And given the fact that for strictly convex functions its conjugate function is differentiable (see here) we know that this unique minimizer would be \(\nabla f^*(y)\).
Dual (Sub)Gradient Methods#
From (4), we know that we can write the Lagrangian dual function using conjugate functions. Then, we have the Lagrangian dual problem as
We can have the subgradient of \(g\) as
From property 2, we have the relationship
Then, we can write the subgradient of \(g\) as
The steps for the dual subgradient method for maximizing the dual objective is as follows
Dual Subgradient Method
Start for an initial dual guess \(u^{(0)}\), and repeats for \(k = 1, 2, 3, \cdots\)
\(x^{(k)} \in \underset{x}{\mathrm{argmin}}\Big[f(x) + (u^{(k-1)})^TAx\Big]\)
\(u^{(k)} = u^{(k-1)} + t_k(Ax^{(k)} - b)\)
where the step sizes \(t_k\) are chosen using backtracing line search.
Dual Decomposition#
The problem in consideration for dual decomposition is
which can also be written as
where \(x = [x_1, \cdots, x_B] \in \mathbb{R}^n\) with \(x_i \in \mathbb{R}^{n_i}\) and \(A = [A_1, \cdots, A_B]\) with \(A_i \in \mathbb{R}^{m \times n_i}\). The first step of the dual subgradient method becomes
we can also write is as
Then, we have the dual decomposition algorithm as
Dual Decomposition Method with Equality Constraints
Start for an initial dual guess \(u^{(0)}\), and repeats for \(k = 1, 2, 3, \cdots\)
\(x_i^{(k)} \in \underset{x_i}{\mathrm{argmin}}\Big[f_i(x_i) + (u^{(k-1)})^TA_ix_i\Big],\ i = 1, \cdots, B\)
\(\displaystyle u^{(k)} = u^{(k-1)} + t_k(\sum_{i=1}^{B}{A_ix_i^{(k)}} - b)\)
where the step sizes \(t_k\) are chosen using backtracing line search.
For the problem with inequality constraints
we have the dual decomposition algorithm as
Dual Decomposition Method with Inequality Constraints
Start for an initial dual guess \(u^{(0)}\), and repeats for \(k = 1, 2, 3, \cdots\)
\(x_i^{(k)} \in \underset{x_i}{\mathrm{argmin}}\Big[f_i(x_i) + (u^{(k-1)})^TA_ix_i\Big],\ i = 1, \cdots, B\)
\(\displaystyle u^{(k)} = \Big[u^{(k-1)} + t_k(\sum_{i=1}^{B}{A_ix_i^{(k)}} - b)\Big]_+\)
where the step sizes \(t_k\) are chosen using backtracing line search and \((u_+)_i = \max\{0, u_i\}.\)
Convergence Guarantees#
When the objective function \(f\) is strongly convex with parameter \(d\), the dual gradient ascent with \(t_k = d\) converges at the rate of \(\mathcal{O}(1/\epsilon)\). Additionally, if \(\nabla f\) is also Lipschitz with parameter \(L\), then \(t_k = 2 / (1/d + 1/L)\) converges at the rate of \(\mathcal{O}(\log(1/\epsilon))\).