Optimization#
This package contains a collection of routines that optimize
where the first term \(\datafidname:\xset\times\yset \mapsto \mathbb{R}_{+}\) enforces data-fidelity, the second term \(\regname:\xset\mapsto \mathbb{R}_{+}\) acts as a regularization and \(\lambda > 0\) is a regularization parameter. More precisely, the data-fidelity term penalizes the discrepancy between the data \(y\) and the forward operator \(A\) applied to the variable \(x\), as
where \(\distance{\cdot}{\cdot}\) is a distance function, and where \(A:\xset\mapsto \yset\) is the forward
operator (see deepinv.physics.Physics
)
Note
The regularization term often (but not always) depends on a hyperparameter \(\sigma\) that can be either fixed or estimated. For example, if the regularization is implicitly defined by a denoiser, the hyperparameter is the noise level.
A typical example of optimization problem is the \(\ell_1\)-regularized least squares problem, where the data-fidelity term is the squared \(\ell_2\)-norm and the regularization term is the \(\ell_1\)-norm. In this case, a possible algorithm to solve the problem is the Proximal Gradient Descent (PGD) algorithm writing as
where \(\operatorname{prox}_{\lambda \regname}\) is the proximity operator of the regularization term, \(\gamma\) is the step size of the algorithm, and \(\nabla \datafidname\) is the gradient of the data-fidelity term.
The following example illustrates the implementation of the PGD algorithm with DeepInverse to solve the \(\ell_1\)-regularized least squares problem.
>>> import torch
>>> import deepinv as dinv
>>> from deepinv.optim import L2, TVPrior
>>>
>>> # Forward operator, here inpainting
>>> mask = torch.ones((1, 2, 2))
>>> mask[0, 0, 0] = 0
>>> physics = dinv.physics.Inpainting(tensor_size=mask.shape, mask=mask)
>>> # Generate data
>>> x = torch.ones((1, 1, 2, 2))
>>> y = physics(x)
>>> data_fidelity = L2() # The data fidelity term
>>> prior = TVPrior() # The prior term
>>> lambd = 0.1 # Regularization parameter
>>> # Compute the squared norm of the operator A
>>> norm_A2 = physics.compute_norm(y, tol=1e-4, verbose=False).item()
>>> stepsize = 1/norm_A2 # stepsize for the PGD algorithm
>>>
>>> # PGD algorithm
>>> max_iter = 20 # number of iterations
>>> x_k = torch.zeros_like(x) # initial guess
>>>
>>> for it in range(max_iter):
... u = x_k - stepsize*data_fidelity.grad(x_k, y, physics) # Gradient step
... x_k = prior.prox(u, gamma=lambd*stepsize) # Proximal step
... cost = data_fidelity(x_k, y, physics) + lambd*prior(x_k) # Compute the cost
...
>>> print(cost < 1e-5)
tensor([True])
>>> print('Estimated solution: ', x_k.flatten())
Estimated solution: tensor([1.0000, 1.0000, 1.0000, 1.0000])
Potentials#
The class deepinv.optim.Potential
implements potential scalar functions \(h : \xset \to \mathbb{R}\)
used to define an optimization problems. For example, both \(f\) and \(\regname\) are potentials.
This class comes with methods for computing operators useful for optimization,
such as its proximal operator \(\operatorname{prox}_{h}\), its gradient \(\nabla h\), its convex conjugate \(h^*\), etc.
The following classes inherit from deepinv.optim.Potential
Class |
\(h(x)\) |
Requires |
\(\phi(x)\) with \(\phi\) convex |
None |
|
\(d(x,y)\) |
\(y\) |
|
\(d(A(x),y)\) where \(d\) is a distance. |
\(y\) & operator \(A\) |
|
\(g_{\sigma}(x)\) |
optional denoising level \(\sigma\) |
Bregman#
Bregman potentials are defined as \(\phi(x)\) where \(x\in\xset\) is a variable and
where \(\phi\) is a convex scalar function, and are defined via the base class deepinv.optim.Bregman
.
In addition to the methods inherited from deepinv.optim.Potential
(gradient
\(\nabla \phi\), conjugate \(\phi^*\) and its gradient \(\nabla \phi^*\)),
this class provides the Bregman divergence \(D(x,y) = \phi(x) - \phi^*(y) - x^{\top} y\),
and is well suited for performing Mirror Descent.
Class |
Bregman potential \(\phi(x)\) |
---|---|
\(\|x\|_2^2\) |
|
\(- \sum_i \log x_i\) |
|
\(\sum_i x_i \log x_i\) |
|
Data Fidelity#
The base class deepinv.optim.DataFidelity
implements data fidelity terms \(\distance{A(x)}{y}\)
where \(A\) is the forward operator, \(x\in\xset\) is a variable and \(y\in\yset\) is the data,
and where \(d\) is a distance function from the class deepinv.optim.Distance
.
The class deepinv.optim.Distance
is implemented as a child class from deepinv.optim.Potential
.
This data-fidelity class thus comes with useful methods, such as \(\operatorname{prox}_{\distancename\circ A}\) and \(\nabla (\distancename \circ A)\) (among others) which are used by most optimization algorithms.
Data Fidelity |
\(d(A(x), y)\) |
---|---|
\(\|A(x) - y\|_1\) |
|
\(\|A(x) - y\|_2^2\) |
|
Indicator function of \(\|A(x) - y\|_2 \leq \epsilon\) |
|
\(\datafid{A(x)}{y} = -y^{\top} \log(A(x)+\beta)+1^{\top}A(x)\) |
|
\(N_0 (1^{\top} \exp(-\mu A(x))+ \mu \exp(-\mu y)^{\top}A(x))\) |
|
\(\sum_{i=1}^{m}{(\sqrt{|b_i^{\top} x|^2}-\sqrt{y_i})^2}\) |
Priors#
Prior functions are defined as \(\reg{x}\) where \(x\in\xset\) is a variable and where \(\regname\) is a function.
The base class is deepinv.optim.Prior
implemented as a child class from deepinv.optim.Potential
and therefore it comes with methods for computing operators such as \(\operatorname{prox}_{\regname}\) and \(\nabla \regname\). This base class is used to implement user-defined differentiable
priors (eg. Tikhonov regularization) but also implicit priors (eg. plug-and-play methods).
Prior |
\(\reg{x}\) |
Explicit \(\regname\) |
---|---|---|
\(\operatorname{prox}_{\gamma \regname}(x) = \operatorname{D}_{\sigma}(x)\) |
No |
|
\(\nabla \reg{x} = x - \operatorname{D}_{\sigma}(x)\) |
No |
|
\(\nabla \reg{x}=\left(x-\operatorname{D}_{\sigma}(x)\right)/\sigma^2\) |
No |
|
\(\reg{x}=\|x\|_2^2\) |
Yes |
|
\(\reg{x}=\|x\|_1\) |
Yes |
|
\(\reg{x} = \|\Psi x\|_{p}\) where \(\Psi\) is a wavelet transform |
Yes |
|
\(\reg{x}=\|Dx\|_{1,2}\) where \(D\) is a finite difference operator |
Yes |
|
\(\reg{x} = \sum_i h(P_i x)\) for some prior \(h(x)\) on the space of patches |
Yes |
|
Patch prior via normalizing flows. |
Yes |
|
\(\reg{x} = \sum_i\| x_i \|_2\) |
Yes |
Predefined Algorithms#
Optimization algorithm inherit from the base class deepinv.optim.BaseOptim
, which serves as a common interface
for all predefined optimization algorithms.
The function deepinv.optim.optim_builder()
returns an instance of deepinv.optim.BaseOptim
with the
optimization algorithm of choice, either a predefined one ("PGD"
, "ADMM"
, "HQS"
, etc.),
or with a user-defined one. For example, we can create the same proximal gradient algorithm as the one
at the beginning of this page, in one line of code:
>>> model = dinv.optim.optim_builder(iteration="PGD", prior=prior, data_fidelity=data_fidelity,
... params_algo={"stepsize": stepsize, "lambda": lambd}, max_iter=max_iter)
>>> x_hat = model(y, physics)
>>> dinv.utils.plot([x, y, x_hat], ["signal", "measurement", "estimate"], rescale_mode='clip')
Some predefined optimizers are provided:
Algorithm |
Iteration |
Parameters |
---|---|---|
\(v_{k} = \nabla f(x_k) + \lambda \nabla \reg{x_k}\)
\(x_{k+1} = x_k-\gamma v_{k}\)
|
|
|
\(u_{k} = x_k - \gamma \nabla f(x_k)\)
\(x_{k+1} = \operatorname{prox}_{\gamma \lambda \regname}(u_k)\)
|
|
|
\(u_{k} = z_k - \gamma \nabla f(z_k)\)
\(x_{k+1} = \operatorname{prox}_{\gamma \lambda \regname}(u_k)\)
\(z_{k+1} = x_{k+1} + \alpha_k (x_{k+1} - x_k)\)
|
|
|
\(u_{k} = \operatorname{prox}_{\gamma f}(x_k)\)
\(x_{k+1} = \operatorname{prox}_{\sigma \lambda \regname}(u_k)\)
|
|
|
\(u_{k+1} = \operatorname{prox}_{\gamma f}(x_k - z_k)\)
\(x_{k+1} = \operatorname{prox}_{\gamma \lambda \regname}(u_{k+1} + z_k)\)
\(z_{k+1} = z_k + \beta (u_{k+1} - x_{k+1})\)
|
|
|
\(u_{k+1} = \operatorname{prox}_{\gamma f}(z_k)\)
\(x_{k+1} = \operatorname{prox}_{\gamma \lambda \regname}(2*u_{k+1}-z_k)\)
\(z_{k+1} = z_k + \beta (x_{k+1} - u_{k+1})\)
|
|
|
\(u_{k+1} = \operatorname{prox}_{\sigma F^*}(u_k + \sigma K z_k)\)
\(x_{k+1} = \operatorname{prox}_{\tau \lambda G}(x_k-\tau K^\top u_{k+1})\)
\(z_{k+1} = x_{k+1} + \beta(x_{k+1}-x_k)\)
|
|
|
\(v_{k} = \nabla f(x_k) + \lambda \nabla \reg{x_k}\)
\(x_{k+1} = \nabla h^*(\nabla h(x_k) - \gamma v_{k})\)
|
|
|
\(M = \conj{B} \text{diag}(T(y)) B + \lambda I\) |
(phase-retrieval only) |
Parameters#
The parameters of generic optimization algorithms, such as
stepsize, regularization parameter, standard deviation of denoiser prior, etc.
are stored in a dictionary "params_algo"
, whose typical entries are:
Key |
Meaning |
Recommended Values |
---|---|---|
|
Step size of the optimization algorithm. |
Should be positive. Depending on the algorithm,
needs to be small enough for convergence;
e.g. for PGD with
g_first=False ,should be smaller than \(1/(\|A\|_2^2)\).
|
|
Regularization parameter \(\lambda\)
multiplying the regularization term.
|
Should be positive. |
|
Optional parameter \(\sigma\) which \(\regname\) depends on.
For priors based on denoisers,
corresponds to the noise level.
|
Should be positive. |
|
Relaxation parameter used in
ADMM, DRS, CP.
|
Should be positive. |
|
Step size in the dual update in the
Primal Dual algorithm (only required by CP).
|
Should be positive. |
Each value of the dictionary can be either an iterable (i.e., a list with a distinct value for each iteration) or a single float (same value for each iteration).
Iterators#
An optim iterator is an object that implements a fixed point iteration for minimizing the sum of two functions \(F = \datafidname + \lambda \regname\) where \(\datafidname\) is a data-fidelity term that will be modeled by an instance of physics and \(\regname\) is a regularizer. The fixed point iteration takes the form
where \(x\) is a variable converging to the solution of the minimization problem, and \(z\) is an additional variable that may be required in the computation of the fixed point operator.
The implementation of the fixed point algorithm in deepinv.optim
,
following standard optimization theory, is split in two steps:
where \(\operatorname{step}_{\datafidname}\) and \(\operatorname{step}_{\regname}\) are gradient and/or proximal steps on \(\datafidname\) and \(\regname\), while using additional inputs, such as \(A\) and \(y\), but also stepsizes, relaxation parameters, etc…
The deepinv.optim.optim_iterators.fStep
and deepinv.optim.optim_iterators.gStep
classes
precisely implement these steps.
Utils#
We provide some useful routines for optimization algorithms.
deepinv.optim.utils.conjugate_gradient
implements the conjugate gradient algorithm for solving linear systems.deepinv.optim.utils.gradient_descent
implements the gradient descent algorithm.