ICCV 2011: Generalized Roof Duality

Fredrik Kahl and I have a paper accepted to ICCV titled Generalized Roof Duality for Pseudo-Boolean Optimization.

Just as in the previous post, the problem of interest is to minimize degree-{m} polynomials {f: \mathbf{B}^n \rightarrow \mathbf{R}} of the form:

\displaystyle  	f(\boldsymbol{x}) = \sum_i a_ix_i + \sum_{i<j} a_{ij}x_ix_j + \sum_{i<j<k} a_{ijk}x_ix_jx_k + \ldots.
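Concretely, such a polynomial can be stored as a map from index sets to coefficients: since {f} is multilinear, a monomial contributes its coefficient exactly when all of its variables equal one. A minimal Python sketch (purely illustrative; not the representation used in the paper):

    # A pseudo-Boolean polynomial as a map from index tuples to coefficients.
    # f is multilinear, so a monomial contributes iff all its variables are 1.
    def evaluate(terms, x):
        """terms: {(i, j, ...): coefficient}, x: 0/1 tuple, 0-based indices."""
        return sum(c for idx, c in terms.items() if all(x[i] for i in idx))

    # f(x) = 2*x1 - 3*x1*x2 + x1*x2*x3, written with 0-based variable indices:
    terms = {(0,): 2, (0, 1): -3, (0, 1, 2): 1}
    print(evaluate(terms, (1, 1, 0)))   # 2 - 3 = -1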

The previous post showed that many different reductions to the quadratic case are possible. Our paper describes a generalization of roof duality which does not rely on reductions and is in a suitable sense optimal.

1. Submodular Relaxations

Minimizing {f} is intractable in general. By enlarging the domain, we relax it to the following tractable problem:

\displaystyle   \min_{(\boldsymbol{x},\boldsymbol{y}) \in \mathbf{B}^{2n}} g(\boldsymbol{x},\boldsymbol{y}),

where {g: \mathbf{B}^{2n} \rightarrow \mathbf{R}} is a function that satisfies

\displaystyle   g(\boldsymbol{x},\boldsymbol{\hat{x}})=f(\boldsymbol{x}),\quad \forall \boldsymbol{x} \in \mathbf{B}^n, \ \ \ \ \ (1)

\displaystyle   g \text{ is submodular}, \ \ \ \ \ (2)

\displaystyle   g(\boldsymbol{x},\boldsymbol{y}) = g(\boldsymbol{\hat{y}},\boldsymbol{\hat{x}}). \ \ \ \ \ (3)

The reason for requirement (1) is that we want the minimum of {g} to be a lower bound on the minimum of {f}: since {g(\boldsymbol{x},\boldsymbol{\hat{x}})=f(\boldsymbol{x})} for all {\boldsymbol{x}}, minimizing {g} over the larger domain can only give a value less than or equal to {\min f}. By minimizing {g} instead of {f}, we therefore still obtain some information about the minimum of {f}.

Requirement (2) is also fairly natural: we must be able to minimize {g} efficiently, and submodularity guarantees this.

The last requirement, (3), is not obvious at all. It is included for two reasons:

  • One can prove that restricting ourselves to this class of symmetric functions does not affect the obtained lower bound, i.e. the optimal relaxation is symmetric.
  • For a symmetric {g} we can prove persistency, an extremely useful property.
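For small {n}, all three conditions can be checked by brute force directly from the definitions. The sketch below (illustrative only, and exponential in {n}) treats {f} and {g} as plain callables on 0/1 tuples and tests submodularity via the lattice inequality {g(\boldsymbol{u} \vee \boldsymbol{v}) + g(\boldsymbol{u} \wedge \boldsymbol{v}) \le g(\boldsymbol{u}) + g(\boldsymbol{v})}:

    import itertools

    def is_submodular(g, nvars):
        """Check g(u | v) + g(u & v) <= g(u) + g(v) for all 0/1 points u, v."""
        pts = list(itertools.product([0, 1], repeat=nvars))
        return all(
            g(tuple(max(a, b) for a, b in zip(u, v)))
            + g(tuple(min(a, b) for a, b in zip(u, v)))
            <= g(u) + g(v) + 1e-9
            for u in pts for v in pts)

    def is_valid_relaxation(f, g, n):
        """Brute-force test of conditions (1)-(3); f on B^n, g on B^{2n}."""
        pts = list(itertools.product([0, 1], repeat=2 * n))
        ok1 = all(abs(g(x + tuple(1 - t for t in x)) - f(x)) < 1e-9
                  for x in itertools.product([0, 1], repeat=n))
        ok2 = is_submodular(g, 2 * n)
        ok3 = all(abs(g(p) - g(tuple(1 - t for t in p[n:])
                              + tuple(1 - t for t in p[:n]))) < 1e-9
                  for p in pts)
        return ok1 and ok2 and ok3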

Let {f_\text{min}} denote the unknown minimum value of {f}, that is, {f_\text{min}=\min f(\boldsymbol{x})}. Ideally, we would like {g(\boldsymbol{x},\boldsymbol{y})\ge f_\text{min}} for all points {(\boldsymbol{x},\boldsymbol{y})\in\mathbf{B}^{2n}}. This is evidently not possible in general. However, one can make the lower bound provided by {g} as large as possible, i.e. maximize {\min_{\boldsymbol{x},\boldsymbol{y}} g(\boldsymbol{x},\boldsymbol{y})} over all admissible relaxations:

\displaystyle   \max_g \quad \ell \ \ \ \ \ (4)

\displaystyle  \text{such that }\quad g(\boldsymbol{x},\boldsymbol{y}) \ge \ell,\quad \forall(\boldsymbol{x},\boldsymbol{y})\in\mathbf{B}^{2n}

\displaystyle  \, g \text{ satisfies (1)--(3)}.

A relaxation {g} that provides the maximum lower bound will be called optimal. We prove in the paper that for {m=2} this bound coincides with the roof duality bound; the maximum lower bound will therefore be referred to as generalized roof duality.
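For very small instances one can even write down (4) verbatim and hand it to an LP solver. The sketch below is a toy illustration, not the method of the paper: it enumerates all of {\mathbf{B}^{2n}} (so it is exponential in {n}), restricts {g} to be quadratic, and uses the fact that a quadratic multilinear polynomial is submodular exactly when its pairwise coefficients are nonpositive. It assumes NumPy and SciPy; the names grd_bound, monomials, and row are made up for this example:

    import itertools
    import numpy as np
    from scipy.optimize import linprog

    def monomials(nvars):
        """Index tuples of all multilinear monomials of degree <= 2."""
        return ([()] + [(i,) for i in range(nvars)]
                + list(itertools.combinations(range(nvars), 2)))

    def row(pt, mons):
        """One LP row: each monomial evaluated at the 0/1 point pt."""
        return np.array([1.0 if all(pt[i] for i in m) else 0.0 for m in mons])

    def grd_bound(f, n):
        """Bound (4) for a quadratic f given as {index tuple: coefficient}."""
        mons = monomials(2 * n)        # g is quadratic on B^{2n}
        k = len(mons)                  # LP variables: coefficients of g, then l
        pts = list(itertools.product([0, 1], repeat=2 * n))

        A_eq, b_eq = [], []
        for x in itertools.product([0, 1], repeat=n):    # condition (1)
            diag = x + tuple(1 - t for t in x)
            A_eq.append(np.append(row(diag, mons), 0.0))
            b_eq.append(sum(c for m, c in f.items() if all(x[i] for i in m)))
        for p in pts:                                    # condition (3)
            mirror = tuple(1 - t for t in p[n:]) + tuple(1 - t for t in p[:n])
            A_eq.append(np.append(row(p, mons) - row(mirror, mons), 0.0))
            b_eq.append(0.0)

        A_ub, b_ub = [], []
        for p in pts:                                    # g(x, y) >= l
            A_ub.append(np.append(-row(p, mons), 1.0))
            b_ub.append(0.0)
        for j, m in enumerate(mons):                     # condition (2): for a
            if len(m) == 2:                              # quadratic g, submodular
                e = np.zeros(k + 1)                      # iff pairwise coeffs <= 0
                e[j] = 1.0
                A_ub.append(e)
                b_ub.append(0.0)

        obj = np.zeros(k + 1)
        obj[-1] = -1.0                                   # maximize l
        res = linprog(obj, A_ub=np.vstack(A_ub), b_ub=b_ub,
                      A_eq=np.vstack(A_eq), b_eq=b_eq,
                      bounds=[(None, None)] * (k + 1))
        return -res.fun

    # f(x) = x1*x2 - 2*x1, written with 0-based indices:
    print(grd_bound({(0, 1): 1.0, (0,): -2.0}, n=2))

By the equivalence above, for a quadratic {f} the returned value should be the roof duality bound (assuming, as holds for {m=2}, that a quadratic {g} suffices).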

2. Persistency

A minimizer of the relaxation has the persistency property. Let {(\boldsymbol{x}^*,\boldsymbol{y}^*)} be a minimizer of {g}. Further, let {I} be the set of indices {i} satisfying {x_i^* = 1-y_i^*}. Then there exists a minimizer {\boldsymbol{x}} of the original problem such that {x_i = x_i^*} for all {i\in I}.

Actually, persistency holds for all relaxations {g} satisfying (1)–(3), not just the optimal one. This is central to the derivation of an efficient algorithm in the paper.
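Extracting the persistent part of a minimizer is then a one-liner; a minimal sketch (the helper name is hypothetical):

    def persistent_part(x_star, y_star):
        """Indices i with x*_i = 1 - y*_i; by persistency, some global
        minimizer of f agrees with x* on all of these indices."""
        return {i: xi for i, (xi, yi) in enumerate(zip(x_star, y_star))
                if xi == 1 - yi}

    # Example: x* = (1, 0, 1), y* = (0, 0, 0) fixes variables 0 and 2 to 1.
    print(persistent_part((1, 0, 1), (0, 0, 0)))   # {0: 1, 2: 1}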

3. Efficient formulation

For a pseudo-Boolean function {f} in {n} variables, solving the optimization problem (4) is not tractable, since the required number of constraints is exponential in {n}. Two obvious heuristic alternatives are:

  1. Decompose {f} into a sum of the form {f(\boldsymbol{x})=\sum_{i<j<k} f_{ijk}(x_i,x_j,x_k)} and compute an optimal relaxation for each term {f_{ijk}}. However, the sum of optimal relaxations is generally not optimal.
  2. Use only a subset of the points in {\mathbf{B}^{2n}} for the constraints {g(\boldsymbol{x},\boldsymbol{y})\ge \ell} to get an approximately optimal relaxation.

Neither of these approaches is satisfactory. One may even wonder whether the optimal relaxation {g} can be computed in polynomial time at all.

Recall, however, that persistency holds for any relaxation {g} satisfying (1)–(3), optimal or not. Instead of solving (4), which, although possible, may require a large number of constraints, one can therefore consider a simpler problem:

\displaystyle  \max_g \quad g(\boldsymbol{0},\boldsymbol{0}) \ \ \ \ \ (5)

\displaystyle  \text{such that }\quad g \text{ satisfies (1)--(3)}.
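The objective of (5) is particularly simple: in a multilinear representation of {g}, every non-constant monomial vanishes at the origin, so

\displaystyle  g(\boldsymbol{0},\boldsymbol{0}) = a_0,

where {a_0} is the constant term of {g}. Problem (5) thus maximizes a single coefficient subject to the linear constraints (1)–(3), with no inner minimization over {\mathbf{B}^{2n}}.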

Instead of maximizing {\min_{\boldsymbol{x},\boldsymbol{y}} g(\boldsymbol{x},\boldsymbol{y})}, we are only maximizing {g(\boldsymbol{0},\boldsymbol{0})}. This problem is considerably easier and can be solved in polynomial time (provided submodularity can be enforced with a polynomial number of constraints). Given a minimizer {(\boldsymbol{x}^*,\boldsymbol{y}^*) \in \mathrm{argmin}(g)}, which is also computable in polynomial time via graph cuts, we can make the following important observations:

  • If {(\boldsymbol{x}^*,\boldsymbol{y}^*)} is non-zero, then we can use persistency to get a partial solution and reduce the number of variables in {f}.
  • Otherwise, the minimum of {g} is attained at the trivial solution {(\boldsymbol{0},\boldsymbol{0})}. Since {g(\boldsymbol{0},\boldsymbol{0})} was maximized in the construction of {g}, we can conclude that {g} is an optimal relaxation and that we have obtained the generalized roof duality bound.

These observations lead to the following algorithm that computes the generalized roof duality bound:

  1. Construct {g} by solving (5).
  2. Compute {(\boldsymbol{x}^*,\boldsymbol{y}^*) \in \mathrm{argmin}(g)}.
  3. If {(\boldsymbol{x}^*,\boldsymbol{y}^*)} is non-zero, use persistency to simplify {f} and go back to step 1. Otherwise, stop.
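In code, the loop looks roughly as follows. This is a schematic sketch only; solve_relaxation (problem (5)), minimize_g (submodular minimization via graph cuts), g_value, and restrict are hypothetical helpers, not the paper's API:

    # Schematic sketch of the iteration; all four helpers are hypothetical
    # stand-ins for solving (5), graph-cut minimization, evaluating g, and
    # fixing variables of f, respectively.
    def generalized_roof_duality(f):
        fixed = {}                                   # persistent assignments
        while True:
            g = solve_relaxation(f)                  # step 1: problem (5)
            x_star, y_star = minimize_g(g)           # step 2: via graph cuts
            if not any(x_star) and not any(y_star):  # minimizer is (0, 0):
                return g_value(g, x_star, y_star), fixed   # bound attained
            for i, (xi, yi) in enumerate(zip(x_star, y_star)):
                if xi == 1 - yi:                     # step 3: persistency
                    fixed[i] = xi
            f = restrict(f, fixed)                   # simplify f and repeat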

Theorem 1 A solution that attains the generalized roof duality bound for a pseudo-Boolean function can be computed in polynomial time.

Proof: If {(\boldsymbol{x}^*,\boldsymbol{y}^*)} is non-zero, persistency can be used to simplify the original function {f}. This is equivalent to adding constraints of the type {x_i = c} to {\max_g \min_{\boldsymbol{x},\boldsymbol{y}} g(\boldsymbol{x},\boldsymbol{y})}, which can only increase the optimal value. If, on the other hand, {(\boldsymbol{x}^*,\boldsymbol{y}^*)=(\boldsymbol{0},\boldsymbol{0})}, the current relaxation has been solved to optimality and the iteration stops. Therefore, the final bound is at least the optimal value of (4). \Box

Remark. As suggested by the proof, the iterative approach can obtain better bounds than the “optimal” value defined by (4). We have observed this in practice for small problems where solving (4) is feasible. One example is

\displaystyle f(\boldsymbol{x}) = 14x_1 + 15x_2 - 6x_3 + 9x_4

\displaystyle - 5x_1x_2 + 6x_1x_3 + 3x_1x_4 + 13x_2x_3 + 13x_2x_4 - 6x_3x_4

\displaystyle  + 20x_1x_2x_3 + 9x_1x_2x_4 + 17x_1x_3x_4 + 2x_2x_3x_4, 	\ \ \ \ \ (6)

for which (4) gives the lower bound {-8} and the iterative method above gives {-6}, which is optimal.
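The minimum of (6) is easy to verify by brute force over all {2^4} assignments; the coefficient table below is just (6) transcribed with 0-based indices:

    import itertools

    # Coefficients of example (6), keyed by 0-based variable index tuples.
    TERMS = {
        (0,): 14, (1,): 15, (2,): -6, (3,): 9,
        (0, 1): -5, (0, 2): 6, (0, 3): 3,
        (1, 2): 13, (1, 3): 13, (2, 3): -6,
        (0, 1, 2): 20, (0, 1, 3): 9, (0, 2, 3): 17, (1, 2, 3): 2,
    }

    def f(x):
        """Evaluate (6) at a 0/1 assignment x = (x1, x2, x3, x4)."""
        return sum(c for idx, c in TERMS.items() if all(x[i] for i in idx))

    x_min = min(itertools.product([0, 1], repeat=4), key=f)
    print(x_min, f(x_min))   # (0, 0, 1, 0) -> -6, matching the bound above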

4. Full paper and source code

The paper can be downloaded from our web page.

The source code will be made available in the next month or so.

This post will be followed up by a post containing the experimental results.
