DiffUNet

class deepinv.models.DiffUNet(in_channels=3, out_channels=3, large_model=False, use_fp16=False, pretrained='download')[source]

Bases: Module

Diffusion UNet model.

This is the model with attention and timestep embeddings from Ho et al.; code is adapted from https://github.com/jychoi118/ilvr_adm.

Two variants are available: the standard model with 128 hidden channels per layer (trained on FFHQ) and a larger model with 256 hidden channels per layer (trained on ImageNet128).

A pretrained network (for in_channels=out_channels=3) can be downloaded by setting pretrained='download'.

The network can handle images of size \(2^{n_1}\times 2^{n_2}\) with \(n_1,n_2 \geq 5\).
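The size constraint above can be checked before calling the model. The following is a minimal sketch, not part of the library: the helper name is_supported_size is hypothetical, and it simply tests that both sides are powers of two with exponent at least 5.

```python
def is_supported_size(height: int, width: int, min_exponent: int = 5) -> bool:
    """Return True if both sides are powers of two no smaller than 2**min_exponent.

    Hypothetical helper, not a deepinv function.
    """
    def ok(n: int) -> bool:
        # For n > 0, n & (n - 1) == 0 holds exactly when n is a power of two.
        return n >= 2 ** min_exponent and (n & (n - 1)) == 0

    return ok(height) and ok(width)


print(is_supported_size(32, 64))   # both sides are powers of two >= 32
print(is_supported_size(48, 64))   # 48 is not a power of two
print(is_supported_size(16, 32))   # 16 is below 2**5
```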

Warning

This model has two forward modes:

forward_diffusion: in the first mode, the model takes a noisy image and a timestep as input and estimates the noise map in the input image. This mode is consistent with the original implementation from the authors, i.e. it assumes the same image normalization.
forward_denoise: in the second mode, the model takes a noisy image and a noise level as input and estimates the underlying noiseless image. In this case, we assume that images have values in [0, 1] and a rescaling is performed under the hood.

Parameters:

in_channels (int) – channels in the input Tensor.
out_channels (int) – channels in the output Tensor.
large_model (bool) – if True, use the large model with 256 hidden channels per layer trained on ImageNet128 (weights size: 2.1 GB). Otherwise, use a smaller model with 128 hidden channels per layer trained on FFHQ (weights size: 357 MB).
pretrained (str, None) – use a pretrained network. If pretrained=None, the weights will be initialized at random using PyTorch’s default initialization. If pretrained='download', the weights will be downloaded from an online repository (only available for 3 input and output channels). Finally, pretrained can also be set to a path to the user’s own pretrained weights. See pretrained-weights for more details.

convert_to_fp16()[source]: Convert the torso of the model to float16.

convert_to_fp32()[source]: Convert the torso of the model to float32.

find_nearest(array, value)[source]: Find the index of the element of array that is nearest to value.
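As an illustration of the nearest-value lookup, here is a minimal pure-Python sketch. It is an assumption about the behaviour rather than the library's code: the actual method presumably operates on tensors, and the tie-breaking rule shown here (first index wins) is a choice made for this example.

```python
def find_nearest(array, value):
    """Return the index of the entry of `array` closest to `value`.

    Illustrative re-implementation on plain sequences; on ties the
    first matching index is returned.
    """
    return min(range(len(array)), key=lambda i: abs(array[i] - value))


idx = find_nearest([0.1, 0.5, 0.9], 0.6)
print(idx)
```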

forward(x, t, y=None, type_t='noise_level')[source]

Apply the model to an input batch.

This function takes a noisy image and either a timestep or a noise level as input. Depending on the nature of t, the model returns either a noise map (if type_t='timestep') or a denoised image (if type_t='noise_level').

Parameters:

x – an [N x C x …] Tensor of inputs.
t – a 1-D batch of timesteps or noise levels.
y – an [N] Tensor of labels, if class-conditional. Default=None.
type_t – Nature of the embedding t. In traditional diffusion models, and in the authors’ code, t is a timestep linked to a noise level; in this case, set type_t='timestep'. We can also choose t to be a noise level directly and use the model as a denoiser; in this case, set type_t='noise_level'. Default: 'noise_level'.

Returns:

an [N x C x …] Tensor of outputs. Either a noise map (if type_t='timestep') or a denoised image (if type_t='noise_level').

forward_denoise(x, sigma, y=None)[source]

Apply the denoising model to an input batch.

This function takes a noisy image and a noise level as input (and not a timestep) and estimates the underlying noiseless image. The input image is assumed to be in range [0, 1] (up to noise) and to have a height and width divisible by a power of 2.

Note

The DiffUNet assumes that images are scaled as \(\sqrt{\alpha_t}\, x + \sqrt{1-\alpha_t}\, n\); this function therefore rescales the input by \(\sqrt{\alpha_t}\) and applies a mean-shift correction of \(0.5 - 0.5\sqrt{\alpha_t}\).
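The rescaling described in this note can be sketched on a single pixel value. The relation \(\alpha = 1/(1+4\sigma^2)\) used below is an assumption, not something stated on this page: it follows from mapping a [0, 1] image with noise level \(\sigma\) to the [-1, 1] range (noise level \(2\sigma\)) and requiring the variance-preserving scaling of Ho et al., where the noise is weighted by \(\sqrt{1-\alpha}\). The function name is hypothetical.

```python
import math


def rescale_for_diffusion(pixel: float, sigma: float):
    """Rescale a [0, 1] pixel value as described in the note.

    Assumption: alpha = 1 / (1 + 4 * sigma**2), chosen so that
    sqrt(alpha) * 2 * sigma == sqrt(1 - alpha).
    """
    alpha = 1.0 / (1.0 + 4.0 * sigma ** 2)
    sqrt_alpha = math.sqrt(alpha)
    # Scale by sqrt(alpha) and shift the mean by 0.5 - 0.5 * sqrt(alpha),
    # which keeps mid-gray (0.5) fixed.
    rescaled = sqrt_alpha * pixel + (0.5 - 0.5 * sqrt_alpha)
    return rescaled, alpha


rescaled, alpha = rescale_for_diffusion(pixel=0.8, sigma=0.1)

# In the [-1, 1] range the same operation is a pure scaling by sqrt(alpha).
lhs = 2.0 * rescaled - 1.0
rhs = math.sqrt(alpha) * (2.0 * 0.8 - 1.0)
print(math.isclose(lhs, rhs))
```

Under this assumption the shift is exactly what makes the [0, 1] rescaling equivalent to multiplying the [-1, 1] image by \(\sqrt{\alpha}\).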

Parameters:

x – an [N x C x …] Tensor of inputs.
sigma – a 1-D batch of noise levels.
y – an [N] Tensor of labels, if class-conditional. Default=None.

Returns:

an [N x C x …] Tensor of outputs.

forward_diffusion(x, timesteps, y=None)[source]

Apply the model to an input batch.

This function takes a noisy image and a timestep as input (and not a noise level) and estimates the noise map in the input image. The image is assumed to be in range [-1, 1] and to have a height and width divisible by a power of 2.

Parameters:

x – an [N x C x …] Tensor of inputs.
timesteps – a 1-D batch of timesteps.
y – an [N] Tensor of labels, if class-conditional. Default=None.

Returns:

an [N x C x …] Tensor of outputs.

get_alpha_prod(beta_start=0.0001, beta_end=0.02, num_train_timesteps=1000)[source]: Get the alpha sequences; this is necessary for mapping noise levels to timesteps when performing pure denoising.
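To make the role of the alpha sequences concrete, the sketch below builds them from the default arguments and maps a noise level to a timestep. It rests on several assumptions not confirmed by this page: a linear beta schedule as in Ho et al., an effective noise level of \(\sqrt{(1-\bar\alpha_t)/\bar\alpha_t}\) in the [-1, 1] range, and a factor of 2 to convert a [0, 1] noise level to that range. Both function names are hypothetical, and the library method may return additional quantities.

```python
import math


def linear_alpha_prod(beta_start=1e-4, beta_end=0.02, num_train_timesteps=1000):
    """Cumulative products of alpha_t = 1 - beta_t for a linear beta schedule (assumed)."""
    step = (beta_end - beta_start) / (num_train_timesteps - 1)
    betas = [beta_start + i * step for i in range(num_train_timesteps)]
    alphas_cumprod = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta
        alphas_cumprod.append(running)
    return betas, alphas_cumprod


def noise_level_to_timestep(sigma, alphas_cumprod):
    """Pick the timestep whose effective noise level is closest to 2 * sigma.

    Assumption: sigma refers to images in [0, 1], hence the factor 2
    when comparing against the [-1, 1] diffusion scale.
    """
    target = 2.0 * sigma
    levels = [math.sqrt((1.0 - a) / a) for a in alphas_cumprod]
    return min(range(len(levels)), key=lambda t: abs(levels[t] - target))


betas, alphas_cumprod = linear_alpha_prod()
t_small = noise_level_to_timestep(0.05, alphas_cumprod)
t_large = noise_level_to_timestep(0.2, alphas_cumprod)
print(t_small, t_large)
```

Because the cumulative product decreases with the timestep, larger noise levels map to later timesteps.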

Examples using `DiffUNet`:

Implementing DPS

Implementing DiffPIR
