Olivier Fercoq

AnR project: Primal-Dual Optimal Algorithms (APDO)

Team

Principal investigator: Olivier Fercoq
Preliminary works: internship of Oskar Rynkiewicz from April 2020 to September 2020
PhD students:
- Ololade Sowunmi from March 2021 to August 2021
- Iyad Walwil from April 2022
External experts
- Ion Necoara (Bucharest)
- Volkan Cevher (Lausanne)
- Adrien Taylor (Paris)
- Zheng Qu (Hong Kong)

Context

Optimising optimisation algorithms is a natural concern of the community. The problem is as follows: for a given function class, what algorithm has the best performance to minimise these functions? Conventionally, we try to bound the number of iterations required to get an approximated solution as a function of the desired accuracy. This provides a guarantee (called upper bound) on the worst case possible among the class. On the other hand, we are also interested in the minimum number of iterations needed to minimize the most recalcitrant functions of the class. This number is called a lower bound. When the upper and lower bounds are equal (or more generally of the same order of magnitude), we say that the algorithm found is optimal. This type of consideration was the basis of the research effort on the accelerated gradient method and on the method of interior points.

In this project, we are particularly interested in the problem of a optimising a convex function under affine constraints This problem is found in many fields (statistical learning, signal processing, shape optimization, operational research\ldots) and corresponds to the prototype of the problem requiring operator splitting. Proximal methods, such as the inexact augmented Lagrangian, the primal-dual hybrid gradient and the ADMM, are commonly used to solve such problems because they solve large problems in a reasonable time.

It has been shown that if we compute the average of the iterates that they produce, these methods reach the optimal convergence rate O(1 / k) on the class of convex functions f. This is remarkable but does not necessarily correlate with the user experience. In practice, we will instead favour the final iterate to the average of iterates and we will observe a linear convergence, ie an exponential decay of the measure of optimality. This gap between theory and practice happens because the user does not try to solve the most difficult problem among the convex problems under affine constraints. To obtain finer results, the class of functions studied must be restricted.

For example, if the Lagrangian is strongly convex-concave, then Chambolle and Pock proposed a primal-dual algorithm with linear convergence. Similarly, if f is quadratic, then the exact augmented Lagrangian method converges linearly. These hypotheses are restrictive: for example, the Lagrangian of a contrained optimization problem is never strongly convex-concave. Nevertheless, the metric sub-regularity of the generalized gradient of the Lagrangian is enough to have the linear convergence. This hypothesis covers the two previously cited cases. Moreover, in the case where there is no constraint, it is equivalent to the quadratic error bound property.

Main conjecture of the project

Let's start by describing the situation in the unconstrained case min_x f (x) with f convex, which is better understood. We say that f satisfies the quadratic error bound, if there exists µ > 0$ such that f (x) - min f > µ dist(x, argmin f)^2. Let us denote this function class F_µ$. The gradient method converges at the speed O ((1 - µ)^k) on F_µ and at the speed O(1 / k) on F_0. As it is the same algorithm whatever the value of µ, the gradient method is adaptive with respect to this parameter.

Moreover, the restarted accelerated gradient method converges at the speed O(( 1- µ^0.5)^k), which is optimal. The restart frequency depends on µ and for µ=0, we obtain the accelerated gradient method in O(1 / k^2), which is also optimal. At the cost of a logarithmic term, we can implement a method that automatically adjusts the restart frequency to $\mu$ and is at the same time optimal on all classes Fµ.

On the side of the convex optimization under affine constraints, if we have the metric sub-regularity with parameter m > 0, which means that the gradient G of the Lagrangian satisfies dist(0, G(z)) > m dist (z, G^{-1} (0)), the hybrid primal-dual gradient (PDHG) converges at the speed O((1- m^2)^k). If m=0, we have a convergence in O(1 / k^0.5). Here too, the algorithm is written in the same way regardless of the value of m. As explained previously, in the case m = 0, averaging the iterates of PDHG gives an optimal algorithm for m = 0 with the O (1 / k) complexity, but deteriorates the performances when m > 0.

The conjecture we would like to solve is based on an analogy with the case without constraint.

There exists an algorithm A_m that solves convex optimization with contraints at the speed O(1-m)^k under the assumption of metric sub-regularity with parameter m>0 of the generalised gradient of the Lagrangian.
This speed is optimal for this class of functions.
Moreover, the limiting algorithm A_0 has the speed O(1/k) for convex functions.

The main contribution is to replace m^2 with m in the convergence rate. This corresponds paradigm shift in the speed of resolution of the problem and therefore a major advance in the field.

Publications

arXiv:2403.19202 Monitoring the Convergence Speed of PDHG to Find Better Primal and Dual Step Sizes. O. Fercoq, 2024

arXiv:2403.12579 The Smoothed Duality Gap as a Stopping Criterion. I. Walwil and O. Fercoq, 2024

journal paper Quadratic error bound of the smoothed gap and the restarted averaged primal-dual hybrid gradient. O. Fercoq, Open Journal of Mathematical Optimization 4(6), 2023.

conference paper Solving stochastic weak Minty variational inequalities without increasing batch size. T. Pethick, O. Fercoq, P. Latafat, P. Patrinos and V. Cevher, Proceedings of the International Conference on Learning Representations, 2023.

Design by Marion Chagne-Fercoq