Diffusion and MCMC Algorithms#

This package contains posterior sampling algorithms based on diffusion models and Markov Chain Monte Carlo (MCMC) methods.

These methods build a Markov chain

\[x_{t+1} \sim p(x_{t+1} | x_t, y)\]

such that, for large \(t\), the samples \(x_t\) are approximately distributed according to the posterior distribution \(p(x|y)\).

Diffusion models#

We provide a unified framework for image generation using diffusion models. Diffusion models for posterior sampling are defined using deepinv.sampling.PosteriorDiffusion, which is a subclass of deepinv.models.base.Reconstructor. Below, we explain the main components of these diffusion models; see Building your diffusion posterior sampling method using SDEs for example usage and visualizations.
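For orientation, here is a minimal, hedged sketch of how these pieces fit together. The keyword arguments of deepinv.sampling.PosteriorDiffusion shown below are assumptions (they may differ across versions); refer to the linked example for the exact signature.

```python
import torch
import deepinv as dinv

device = "cuda" if torch.cuda.is_available() else "cpu"

# Pretrained denoiser used as the learned prior of the diffusion model.
denoiser = dinv.models.DRUNet(pretrained="download").to(device)

# Illustrative inverse problem: inpainting with 50% observed pixels.
# (Argument names of Inpainting may differ between library versions.)
physics = dinv.physics.Inpainting((3, 64, 64), mask=0.5, device=device)
physics.noise_model = dinv.physics.GaussianNoise(sigma=0.05)
y = physics(torch.rand(1, 3, 64, 64, device=device))  # synthetic measurement

# Hypothetical wiring of PosteriorDiffusion: the keyword arguments below are
# assumptions, not the documented signature.
model = dinv.sampling.PosteriorDiffusion(
    data_fidelity=dinv.sampling.DPSDataFidelity(denoiser=denoiser),
    denoiser=denoiser,
    device=device,
)

# PosteriorDiffusion is a Reconstructor: it maps (y, physics) to an estimate of x.
x_hat = model(y, physics)
```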

Stochastic Differential Equations#

We define diffusion models as Stochastic Differential Equations (SDEs).

The forward-time SDE is defined as follows, from time \(0\) to \(T\):

\[d x_t = f(t) x_t dt + g(t) d w_t.\]

where \(w_t\) is a Brownian process. Let \(p_t\) denote the distribution of the random vector \(x_t\). Under this forward process, we have that:

\[x_t \vert x_0 \sim \mathcal{N} \left( s(t) x_0, s(t)^2 \sigma(t)^2 \mathrm{I} \right),\]

where the scaling over time is \(s(t) = \exp\left( \int_0^t f(r) \, dr \right)\) and the normalized noise level is \(\sigma(t) = \sqrt{\int_0^t \frac{g(r)^2}{s(r)^2} \, dr}\).
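To make these definitions concrete, the short sketch below evaluates both integrals numerically for the Variance Preserving coefficients and compares them against their closed forms (the grid size and the values of \(\beta_{\mathrm{min}}\) and \(\beta_d\) are illustrative choices):

```python
import numpy as np

def scaling_and_noise_level(f, g, t, n=100_000):
    """Numerically evaluate s(t) = exp(int_0^t f(r) dr) and
    sigma(t) = sqrt(int_0^t g(r)^2 / s(r)^2 dr) with trapezoidal quadrature."""
    r = np.linspace(0.0, t, n)
    # cumulative trapezoidal integral of f, giving s(r) on the whole grid
    int_f = np.concatenate(([0.0], np.cumsum((f(r[1:]) + f(r[:-1])) / 2 * np.diff(r))))
    s_r = np.exp(int_f)
    integrand = g(r) ** 2 / s_r ** 2
    sigma_t = np.sqrt(np.sum((integrand[1:] + integrand[:-1]) / 2 * np.diff(r)))
    return s_r[-1], sigma_t

# Variance Preserving coefficients (see the table below); illustrative values.
beta_min, beta_d = 0.1, 19.9
f = lambda r: -0.5 * (beta_min + beta_d * r)
g = lambda r: np.sqrt(beta_min + beta_d * r)

s_num, sigma_num = scaling_and_noise_level(f, g, t=1.0)
s_exact = 1.0 / np.sqrt(np.exp(0.5 * beta_d + beta_min))      # closed-form s(1)
sigma_exact = np.sqrt(np.exp(0.5 * beta_d + beta_min) - 1.0)  # closed-form sigma(1)
print(s_num, s_exact)        # the two values agree up to quadrature error
print(sigma_num, sigma_exact)
```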

The reverse-time SDE is defined as follows, running backwards in time (from \(T\) to \(0\)):

\[d x_{t} = \left( f(t) x_t - \frac{1 + \alpha}{2} g(t)^2 \nabla \log p_{t}(x_t) \right) dt + g(t) \sqrt{\alpha} d w_{t}.\]

where \(\alpha \in [0,1]\) is a scalar weighting the diffusion term (\(\alpha = 0\) corresponds to ordinary differential equation (ODE) sampling and \(\alpha > 0\) to SDE sampling), and \(\nabla \log p_{t}(x_t)\) is the score function, which can be approximated by (a properly scaled version of) Tweedie’s formula:

\[\nabla \log p_t(x_t) = \frac{\mathbb{E}\left[ s(t)x_0|x_t \right] - x_t }{s(t)^2\sigma(t)^2} \approx \frac{s(t) \denoiser{\frac{x_t}{s(t)}}{\sigma(t)} - x_t }{s(t)^2\sigma(t)^2}.\]

where \(\denoiser{\cdot}{\sigma}\) is a denoiser trained to remove Gaussian noise of level \(\sigma\), that is, \(\denoiser{x+\sigma\omega}{\sigma} \approx \mathbb{E} [ x|x+\sigma\omega ]\) with \(\omega\sim\mathcal{N}(0,\mathrm{I})\).
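In code, this approximation amounts to a single denoiser evaluation. Below is a minimal sketch assuming a deepinv denoiser with the usual denoiser(x, sigma) calling convention (DRUNet here; the values of \(s(t)\) and \(\sigma(t)\) are placeholders):

```python
import torch
import deepinv as dinv

device = "cuda" if torch.cuda.is_available() else "cpu"
denoiser = dinv.models.DRUNet(pretrained="download").to(device)

def approx_score(x_t, s_t, sigma_t):
    """Tweedie-based approximation of the score grad log p_t(x_t)."""
    with torch.no_grad():
        x0_hat = denoiser(x_t / s_t, sigma_t)  # estimate of E[x_0 | x_t / s(t)]
    return (s_t * x0_hat - x_t) / (s_t**2 * sigma_t**2)

x_t = torch.rand(1, 3, 64, 64, device=device)    # placeholder image in [0, 1]
score = approx_score(x_t, s_t=1.0, sigma_t=0.1)  # e.g. a VE SDE, where s(t) = 1
```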

Note

Using normalized noise levels \(\sigma(t)\) and scalings \(s(t)\) lets us use any denoiser in the library trained for multiple noise levels, assuming pixel values are in the range \([0,1]\).

Starting from a sample of the end-point distribution \(p_T\) of the forward process, solving the reverse-time SDE gives us a sample from the data distribution \(p_0\).

The base classes for defining SDEs are deepinv.sampling.BaseSDE and deepinv.sampling.DiffusionSDE.

Table 11 Stochastic Differential Equations#

| SDE | \(f(t)\) | \(g(t)\) | Scaling \(s(t)\) | Noise level \(\sigma(t)\) |
| --- | --- | --- | --- | --- |
| Variance Exploding | \(0\) | \(\sigma_{\mathrm{min}}\left(\frac{\sigma_{\mathrm{max}}}{\sigma_{\mathrm{min}}}\right)^t\) | \(1\) | \(\sigma_{\mathrm{min}}\left(\frac{\sigma_{\mathrm{max}}}{\sigma_{\mathrm{min}}}\right)^t\) |
| Variance Preserving | \(-\frac{1}{2}\left(\beta_{\mathrm{min}} + t \beta_d \right)\) | \(\sqrt{\beta_{\mathrm{min}} + t \beta_{d}}\) | \(1/\sqrt{e^{\frac{1}{2}\beta_{d}t^2+\beta_{\mathrm{min}}t}}\) | \(\sqrt{e^{\frac{1}{2}\beta_{d}t^2+\beta_{\mathrm{min}}t}-1}\) |

Solvers#

Once the SDE is defined, we can obtain an approximate sample with any of the numerical solvers provided in the library.

The base class for solvers is deepinv.sampling.BaseSDESolver, and deepinv.sampling.SDEOutput provides a container for storing the output of the solver.
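To illustrate what a solver does under the hood, here is a single Euler-Maruyama step of the reverse-time SDE above, written in plain PyTorch (the score callable and the step size dt are placeholders; the library solvers take care of the time discretization and of the \(\alpha = 0\) ODE case):

```python
import torch

def reverse_sde_step(x, t, dt, f, g, score, alpha=1.0):
    """One Euler-Maruyama step of the reverse-time SDE, going from time t to t - dt.

    f, g  : drift and diffusion coefficients of the forward SDE.
    score : callable (x, t) -> approximation of grad log p_t(x).
    alpha : 0 gives the deterministic (ODE) update, 1 the standard SDE update.
    """
    drift = f(t) * x - 0.5 * (1.0 + alpha) * g(t) ** 2 * score(x, t)
    noise = (alpha ** 0.5) * g(t) * (dt ** 0.5) * torch.randn_like(x)
    return x - drift * dt + noise  # minus sign: we integrate backwards in time
```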

Posterior sampling#

In the case of posterior sampling, we simply need to replace the (unconditional) score function \(\nabla \log p_t(x_t)\) with the conditional score function \(\nabla \log p_t(x_t|y)\). The conditional score can be decomposed using Bayes’ rule:

\[\nabla \log p_t(x_t | y) = \nabla \log p_t(x_t) + \nabla \log p_t \left(y \vert \frac{x_t}{s(t)} = x_0 + \sigma(t)\omega\right).\]

The first term is the unconditional score function, which can be approximated with a denoiser as explained previously. The second term is the score of the noisy likelihood, which can be approximated by a (noisy) data-fidelity term. We implement various data-fidelity terms in deepinv.sampling.NoisyDataFidelity.
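To make the decomposition concrete, the sketch below assembles an approximate conditional score in plain PyTorch: the prior score comes from the denoiser as before, and the likelihood term is a DPS-style gradient of the data fit \(\|y - A(\widehat{x}_0)\|^2\) taken through the denoiser with autograd. This is a simplified illustration, not the library's NoisyDataFidelity implementation; physics is assumed to be a deepinv physics operator exposing the forward map as physics.A.

```python
import torch

def conditional_score(x_t, y, physics, denoiser, s_t, sigma_t, guidance=1.0):
    """Approximate grad log p_t(x_t | y) = prior score + weighted likelihood score."""
    x_t = x_t.detach().requires_grad_(True)
    x0_hat = denoiser(x_t / s_t, sigma_t)                      # estimate of E[x_0 | x_t]
    prior_score = (s_t * x0_hat - x_t) / (s_t**2 * sigma_t**2)
    # DPS-style likelihood term: gradient of the data fit through the denoiser.
    data_fit = (physics.A(x0_hat) - y).pow(2).sum()
    likelihood_score = -torch.autograd.grad(data_fit, x_t)[0]
    return (prior_score + guidance * likelihood_score).detach()
```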

Table 13 Noisy data-fidelity terms#

| Method | Description |
| --- | --- |
| deepinv.sampling.NoisyDataFidelity | The base class for defining the noisy data-fidelity term, used to estimate the conditional score in posterior sampling with SDEs. |
| deepinv.sampling.DPSDataFidelity | The noisy data-fidelity term for the Diffusion Posterior Sampling (DPS) method. See also deepinv.sampling.DPS. |

Uncertainty quantification#

Diffusion methods produce a single sample per call. If multiple samples are required, deepinv.sampling.DiffusionSampler can be used to turn a diffusion method into a sampler that draws multiple samples and computes posterior statistics such as the mean or variance.
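A hedged usage sketch: here deepinv.sampling.DPS is used as the underlying single-sample method and wrapped with DiffusionSampler to estimate pixelwise posterior statistics. The constructor arguments below, and the assumption that the sampler returns a mean/variance pair, should be checked against the class documentation.

```python
import torch
import deepinv as dinv

device = "cuda" if torch.cuda.is_available() else "cpu"
denoiser = dinv.models.DRUNet(pretrained="download").to(device)

# Illustrative inverse problem (argument names may differ across versions).
physics = dinv.physics.Inpainting((3, 64, 64), mask=0.5, device=device)
y = physics(torch.rand(1, 3, 64, 64, device=device))

# Diffusion Posterior Sampling as the single-sample reconstruction method
# (constructor arguments are assumptions; check the DPS documentation).
dps = dinv.sampling.DPS(denoiser, data_fidelity=dinv.optim.data_fidelity.L2(), device=device)

# Wrap it to draw several posterior samples and aggregate them.
sampler = dinv.sampling.DiffusionSampler(dps, max_iter=10)  # 10 posterior samples
mean, var = sampler(y, physics)  # pixelwise posterior mean and variance
```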

Markov Chain Monte Carlo#

Markov Chain Monte Carlo (MCMC) methods build a chain of samples that aims at sampling from the posterior distribution, whose negative log-density satisfies

\[-\log p(x|y,A) \propto d(Ax,y) + \reg{x},\]

where \(x\) is the image to be reconstructed, \(y\) are the measurements, \(d(Ax,y) \propto - \log p(y|x,A)\) is the negative log-likelihood and \(\reg{x} \propto - \log p_{\sigma}(x)\) is the negative log-prior.

The negative log-likelihood can be chosen from the available data-fidelity terms, and the negative log-prior can be approximated using deepinv.optim.ScorePrior with a pretrained denoiser, which leverages Tweedie’s formula, where \(\sigma\) is typically set to a small value. Unlike diffusion sampling methods, MCMC methods generally use a fixed noise level \(\sigma\) during the sampling process, i.e., \(\nabla \log p_{\sigma}(x) = \frac{\denoiser{x}{\sigma} - x}{\sigma^2}\).

Note

The approximation of the prior obtained via deepinv.optim.ScorePrior is also valid for maximum-a-posteriori (MAP) denoisers, but in that case \(p_{\sigma}(x)\) is not given by the convolution of \(p(x)\) with a Gaussian kernel, but rather by its Moreau-Yosida envelope, i.e.,

\[p_{\sigma}(x)=e^{- \inf_z \left(-\log p(z) + \frac{1}{2\sigma}\|x-z\|^2 \right)}.\]

All MCMC methods inherit from deepinv.sampling.MonteCarlo. The methods provided below sample from the posterior distribution using variants of the unadjusted Langevin algorithm.

Table 15 MCMC methods#

| Method | Description |
| --- | --- |
| deepinv.sampling.ULA | Unadjusted Langevin algorithm. |
| deepinv.sampling.SKRock | Runge-Kutta-Chebyshev stochastic approximation to accelerate the standard Unadjusted Langevin Algorithm. |
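For illustration, here is a minimal ULA sketch combining deepinv.optim.ScorePrior (built from a pretrained denoiser) with an L2 data-fidelity term. Hyperparameter values are illustrative only, and some argument names may differ between versions.

```python
import torch
import deepinv as dinv

device = "cuda" if torch.cuda.is_available() else "cpu"

# Forward operator and noisy measurement (illustrative inpainting problem;
# argument names may differ across versions).
physics = dinv.physics.Inpainting((3, 64, 64), mask=0.5, device=device)
physics.noise_model = dinv.physics.GaussianNoise(sigma=0.05)
y = physics(torch.rand(1, 3, 64, 64, device=device))

# Negative log-likelihood d(Ax, y) and denoiser-based score prior.
data_fidelity = dinv.optim.data_fidelity.L2()
prior = dinv.optim.ScorePrior(denoiser=dinv.models.DRUNet(pretrained="download").to(device))

# Unadjusted Langevin algorithm; hyperparameters below are illustrative only.
mcmc = dinv.sampling.ULA(
    prior=prior,
    data_fidelity=data_fidelity,
    step_size=1e-4,
    sigma=0.01,        # fixed noise level used inside the score prior
    max_iter=5000,
    burnin_ratio=0.2,
)
mean, var = mcmc(y, physics)  # posterior mean and variance estimated from the chain
```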