DiffUNet

class deepinv.models.DiffUNet(in_channels=3, out_channels=3, large_model=False, use_fp16=False, pretrained='download')[source]

Bases: Module

Diffusion UNet model.

This is the model with attention and timestep embeddings from Ho et al.; code is adapted from https://github.com/jychoi118/ilvr_adm.

Two variants are available: a standard model with 128 hidden channels per layer (trained on FFHQ) and a larger model with 256 hidden channels per layer (trained on ImageNet128).

A pretrained network (for in_channels=out_channels=3) can be downloaded by setting pretrained='download'.

The network can handle images of size \(2^{n_1}\times 2^{n_2}\) with \(n_1,n_2 \geq 5\).
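As a quick illustration of the size constraint above, the following minimal sketch checks whether a height and width are both powers of two no smaller than \(2^5 = 32\). The helper name is_supported_size and its min_exponent argument are hypothetical and not part of deepinv.

```python
def is_supported_size(height, width, min_exponent=5):
    """Return True if both sides are powers of two with exponent >= min_exponent.

    Hypothetical helper, not part of deepinv: it only restates the size
    condition given in the documentation.
    """
    def is_valid_side(n):
        # n & (n - 1) == 0 holds exactly for positive powers of two.
        return n >= 2 ** min_exponent and (n & (n - 1)) == 0

    return is_valid_side(height) and is_valid_side(width)


print(is_supported_size(32, 64))   # both sides are powers of two >= 32
print(is_supported_size(48, 64))   # 48 is not a power of two
print(is_supported_size(16, 32))   # 16 is below the minimum of 32
```

Images that fail this check would need to be cropped, padded or resized before being passed to the network.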

Warning

This model has two forward modes:

  • forward_diffusion: in the first mode, the model takes a noisy image and a timestep as input and estimates the noise map in the input image. This mode is consistent with the original implementation from the authors, i.e. it assumes the same image normalization.

  • forward_denoise: in the second mode, the model takes a noisy image and a noise level as input and estimates the underlying noiseless image. In this case, images are assumed to have values in [0, 1] and a rescaling is performed under the hood.

Parameters:
  • in_channels (int) – number of channels in the input Tensor.

  • out_channels (int) – number of channels in the output Tensor.

  • large_model (bool) – if True, use the large model with 256 hidden channels per layer trained on ImageNet128 (weights size: 2.1 GB). Otherwise, use a smaller model with 128 hidden channels per layer trained on FFHQ (weights size: 357 MB).

  • use_fp16 (bool) – if True, the torso of the model is run in float16 (see convert_to_fp16).

  • pretrained (str, None) – use a pretrained network. If pretrained=None, the weights are initialized at random using PyTorch’s default initialization. If pretrained='download', the weights are downloaded from an online repository (only available for 3 input and output channels). Finally, pretrained can also be set to a path to the user’s own pretrained weights. See pretrained-weights for more details.

convert_to_fp16()[source]

Convert the torso of the model to float16.

convert_to_fp32()[source]

Convert the torso of the model to float32.

find_nearest(array, value)[source]

Find the index of the entry in an array that is nearest to a given value.
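A nearest-value lookup of this kind can be sketched in plain Python as below. This is an illustrative stand-in, not the library's implementation; the function name nearest_index is hypothetical, and the actual method may operate on tensors and handle batches differently.

```python
def nearest_index(array, value):
    """Return the index of the entry of `array` closest to `value`.

    Hypothetical stand-in for a nearest-value lookup; ties are resolved
    in favour of the smallest index.
    """
    return min(range(len(array)), key=lambda i: abs(array[i] - value))


levels = [0.1, 0.5, 0.9]
print(nearest_index(levels, 0.6))  # index of 0.5, the closest entry
```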

forward(x, t, y=None, type_t='noise_level')[source]

Apply the model to an input batch.

This function takes a noisy image and either a timestep or a noise level as input. Depending on the nature of t, the model returns either a noise map (if type_t='timestep') or a denoised image (if type_t='noise_level').

Parameters:
  • x – an [N x C x …] Tensor of inputs.

  • t – a 1-D batch of timesteps or noise levels.

  • y – an [N] Tensor of labels, if class-conditional. Default=None.

  • type_t – nature of the embedding t. In traditional diffusion models, and in the authors’ code, t is a timestep linked to a noise level; in this case, set type_t='timestep'. t can also be a noise level directly, in which case the model is used as a denoiser; in this case, set type_t='noise_level'. Default: 'noise_level'.

Returns:

an [N x C x …] Tensor of outputs. Either a noise map (if type_t='timestep') or a denoised image (if type_t='noise_level').

forward_denoise(x, sigma, y=None)[source]

Apply the denoising model to an input batch.

This function takes a noisy image and a noise level as input (and not a timestep) and estimates the underlying noiseless image. The input image is assumed to be in range [0, 1] (up to noise) and to have a width and height divisible by a power of 2.

Note

The DiffUNet assumes that images are scaled as \(\sqrt{\alpha_t} x + \sqrt{1-\alpha_t} n\); thus an additional rescaling by \(\sqrt{\alpha_t}\) is performed within this function, along with a mean shift correction of \(0.5 - \sqrt{\alpha_t} 0.5\).
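To make the rescaling and mean shift in the note above concrete, here is a minimal scalar sketch. The function name rescale_with_shift is hypothetical, and the value 0.8 for \(\sqrt{\alpha_t}\) is an arbitrary example; how and where the library applies this operation internally is not shown here.

```python
def rescale_with_shift(pixel, sqrt_alpha):
    """Scale a [0, 1] pixel value by sqrt_alpha and add 0.5 - sqrt_alpha * 0.5.

    Hypothetical helper: it only evaluates the expression from the note.
    Equivalent to sqrt_alpha * (pixel - 0.5) + 0.5, so mid-gray (0.5)
    is left unchanged and the range shrinks symmetrically around it.
    """
    return sqrt_alpha * pixel + (0.5 - sqrt_alpha * 0.5)


sqrt_alpha = 0.8  # arbitrary example value
for pixel in (0.0, 0.5, 1.0):
    print(pixel, rescale_with_shift(pixel, sqrt_alpha))
```

With this shift, the interval [0, 1] is mapped to an interval of width \(\sqrt{\alpha_t}\) centered at 0.5, rather than being pulled toward 0.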

Parameters:
  • x – an [N x C x …] Tensor of inputs.

  • sigma – a 1-D batch of noise levels.

  • y – an [N] Tensor of labels, if class-conditional. Default=None.

Returns:

an [N x C x …] Tensor of outputs.

forward_diffusion(x, timesteps, y=None)[source]

Apply the model to an input batch.

This function takes a noisy image and a timestep as input (and not a noise level) and estimates the noise map in the input image. The image is assumed to be in range [-1, 1] and to have dimensions with width and height divisible by a power of 2.

Parameters:
  • x – an [N x C x …] Tensor of inputs.

  • timesteps – a 1-D batch of timesteps.

  • y – an [N] Tensor of labels, if class-conditional. Default=None.

Returns:

an [N x C x …] Tensor of outputs.

get_alpha_prod(beta_start=0.0001, beta_end=0.02, num_train_timesteps=1000)[source]

Get the alpha sequences; this is necessary for mapping noise levels to timesteps when performing pure denoising.
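The sketch below illustrates one way such a sequence could be built and used to map a noise level to a timestep. It assumes a linear beta schedule between beta_start and beta_end, as in Ho et al., and assumes that the noise level is matched against the ratio \(\sqrt{1-\bar\alpha_t}/\sqrt{\bar\alpha_t}\), ignoring any change of image range. The function names and the choice of ratio are assumptions for illustration; the quantities actually returned by get_alpha_prod may differ.

```python
import math


def alpha_cumprod(beta_start=0.0001, beta_end=0.02, num_train_timesteps=1000):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed)."""
    if num_train_timesteps > 1:
        step = (beta_end - beta_start) / (num_train_timesteps - 1)
    else:
        step = 0.0
    products = []
    running = 1.0
    for t in range(num_train_timesteps):
        beta_t = beta_start + t * step
        running *= 1.0 - beta_t
        products.append(running)
    return products


def noise_level_to_timestep(sigma, products):
    """Index of the timestep whose noise-to-signal ratio is closest to sigma.

    Assumption: the ratio is sqrt(1 - a) / sqrt(a) for cumulative product a.
    """
    ratios = [math.sqrt((1.0 - a) / a) for a in products]
    return min(range(len(ratios)), key=lambda t: abs(ratios[t] - sigma))


products = alpha_cumprod()
print(len(products), products[0])
print(noise_level_to_timestep(0.05, products))
```

Because the cumulative product decreases with the timestep, the ratio increases with it, so larger noise levels map to later timesteps.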

Examples using DiffUNet:

Implementing DPS

Implementing DiffPIR
