DiffUNet
- class deepinv.models.DiffUNet(in_channels=3, out_channels=3, large_model=False, use_fp16=False, pretrained='download')[source]
Bases:
Module
Diffusion UNet model.
This is the model with attention and timestep embeddings from Ho et al.; code is adapted from https://github.com/jychoi118/ilvr_adm.
Two variants are available: a standard model with 128 hidden channels per layer (trained on FFHQ) and a larger model with 256 hidden channels per layer (trained on ImageNet128).
A pretrained network (for in_channels=out_channels=3) can be downloaded by setting pretrained='download'.
The network can handle images of size \(2^{n_1}\times 2^{n_2}\) with \(n_1,n_2 \geq 5\).
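The size constraint above can be checked before calling the model. The following is a minimal sketch in plain Python; the helper name `is_supported_size` and its `min_exponent` argument are hypothetical and not part of the library.

```python
def is_supported_size(height: int, width: int, min_exponent: int = 5) -> bool:
    """Return True if both sides equal 2**n for some n >= min_exponent.

    Hypothetical helper mirroring the documented constraint
    (image size 2^{n_1} x 2^{n_2} with n_1, n_2 >= 5).
    """

    def side_ok(side: int) -> bool:
        # A positive integer is a power of two iff it has exactly one set bit.
        is_power_of_two = side > 0 and (side & (side - 1)) == 0
        return is_power_of_two and side >= 2 ** min_exponent

    return side_ok(height) and side_ok(width)


print(is_supported_size(64, 128))  # both sides are powers of two >= 32
print(is_supported_size(16, 64))   # 16 = 2**4 is below the minimum
print(is_supported_size(96, 64))   # 96 is not a power of two
```

Under this check, 64 x 128 passes while 16 x 64 and 96 x 64 do not.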
Warning
This model has 2 forward modes:
forward_diffusion: in the first mode, the model takes a noisy image and a timestep as input and estimates the noise map in the input image. This mode is consistent with the original implementation from the authors, i.e. it assumes the same image normalization.
forward_denoise: in the second mode, the model takes a noisy image and a noise level as input and estimates the noiseless underlying image in the input image. In this case, we assume that images have values in [0, 1] and a rescaling is performed under the hood.
- Parameters:
in_channels (int) – channels in the input Tensor.
out_channels (int) – channels in the output Tensor.
large_model (bool) – if True, use the large model with 256 hidden channels per layer trained on ImageNet128 (weights size: 2.1 GB). Otherwise, use a smaller model with 128 hidden channels per layer trained on FFHQ (weights size: 357 MB).
pretrained (str, None) – use a pretrained network. If pretrained=None, the weights will be initialized at random using PyTorch's default initialization. If pretrained='download', the weights will be downloaded from an online repository (only available for 3 input and output channels). Finally, pretrained can also be set as a path to the user's own pretrained weights. See pretrained-weights for more details.
- forward(x, t, y=None, type_t='noise_level')[source]
Apply the model to an input batch.
This function takes a noisy image and either a timestep or a noise level as input. Depending on the nature of t, the model returns either a noise map (if type_t='timestep') or a denoised image (if type_t='noise_level').
- Parameters:
x – an [N x C x …] Tensor of inputs.
t – a 1-D batch of timesteps or noise levels.
y – an [N] Tensor of labels, if class-conditional. Default=None.
type_t – nature of the embedding t. In traditional diffusion models, and in the authors' code, t is a timestep linked to a noise level; in this case, set type_t='timestep'. Alternatively, t can be a noise level directly, in which case the model is used as a denoiser; set type_t='noise_level'. Default: 'noise_level' (as in the signature above).
- Returns:
an [N x C x …] Tensor of outputs. Either a noise map (if type_t='timestep') or a denoised image (if type_t='noise_level').
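The relationship between forward and the two mode-specific methods can be pictured as a simple dispatch on type_t. The sketch below uses a hypothetical stand-in class (`TwoModeModel`) that only reuses the method names and argument names documented on this page. It is not the library's implementation, and raising `ValueError` for other values of type_t is an assumption made for the sketch.

```python
class TwoModeModel:
    """Hypothetical stand-in that mimics the documented method names only."""

    def forward_diffusion(self, x, timesteps, y=None):
        # Placeholder: a real model would return an estimated noise map.
        return ("noise_map", x, timesteps)

    def forward_denoise(self, x, sigma, y=None):
        # Placeholder: a real model would return a denoised image.
        return ("denoised", x, sigma)

    def forward(self, x, t, y=None, type_t="noise_level"):
        # Route the call according to how `t` should be interpreted.
        if type_t == "timestep":
            return self.forward_diffusion(x, t, y=y)
        if type_t == "noise_level":
            return self.forward_denoise(x, t, y=y)
        # Assumption for this sketch: reject any other value explicitly.
        raise ValueError(f"unknown type_t: {type_t!r}")


model = TwoModeModel()
print(model.forward("x", 0.1)[0])                     # denoising mode
print(model.forward("x", 10, type_t="timestep")[0])   # diffusion mode
```

Passing type_t explicitly at every call site keeps the interpretation of t unambiguous.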
- forward_denoise(x, sigma, y=None)[source]
Apply the denoising model to an input batch.
This function takes a noisy image and a noise level as input (and not a timestep) and estimates the noiseless underlying image in the input image. The input image is assumed to be in range [0, 1] (up to noise) and to have dimensions with width and height divisible by a power of 2.
Note
The DiffUNet assumes that images are scaled as \(\sqrt{\alpha_t} x + (1-\alpha_t) n\); an additional rescaling by \(\sqrt{\alpha_t}\) is therefore performed within this function, along with a mean shift correction of \(0.5 - \sqrt{\alpha_t} \cdot 0.5\).
- Parameters:
x – an [N x C x …] Tensor of inputs.
sigma – a 1-D batch of noise levels.
y – an [N] Tensor of labels, if class-conditional. Default=None.
- Returns:
an [N x C x …] Tensor of outputs.
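One reading of the rescaling in the note above is a scaling of pixel values by \(\sqrt{\alpha_t}\) followed by the stated shift, which amounts to scaling about the midpoint 0.5 of the [0, 1] range. The sketch below checks this on scalar values. The function names are hypothetical, and `sqrt_alpha_t` is taken as a free input because this page does not say how it is derived from sigma.

```python
import math


def rescale_with_shift(x: float, sqrt_alpha_t: float) -> float:
    """Scale by sqrt(alpha_t), then add the shift 0.5 - sqrt(alpha_t) * 0.5."""
    return sqrt_alpha_t * x + (0.5 - sqrt_alpha_t * 0.5)


def rescale_about_midpoint(x: float, sqrt_alpha_t: float) -> float:
    """Equivalent form: scale the deviation from 0.5, then add 0.5 back."""
    return sqrt_alpha_t * (x - 0.5) + 0.5


# The two forms agree, and the midpoint 0.5 is left unchanged.
for value in (0.0, 0.25, 0.5, 1.0):
    a = rescale_with_shift(value, 0.8)
    b = rescale_about_midpoint(value, 0.8)
    assert math.isclose(a, b)
assert math.isclose(rescale_with_shift(0.5, 0.8), 0.5)
```

In this form, the shift keeps mid-gray at 0.5 while contrast around it shrinks by the factor \(\sqrt{\alpha_t}\).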
- forward_diffusion(x, timesteps, y=None)[source]
Apply the model to an input batch.
This function takes a noisy image and a timestep as input (and not a noise level) and estimates the noise map in the input image. The image is assumed to be in range [-1, 1] and to have dimensions with width and height divisible by a power of 2.
- Parameters:
x – an [N x C x …] Tensor of inputs.
timesteps – a 1-D batch of timesteps.
y – an [N] Tensor of labels, if class-conditional. Default=None.
- Returns:
an [N x C x …] Tensor of outputs.
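Because forward_diffusion expects values in [-1, 1] while forward_denoise expects values in [0, 1], data may need to be moved between the two ranges. The sketch below shows the usual affine map on scalar values. The helper names are hypothetical, and whether any further normalization is required is not stated on this page.

```python
def to_diffusion_range(x: float) -> float:
    """Map a value from [0, 1] to [-1, 1]."""
    return 2.0 * x - 1.0


def from_diffusion_range(x: float) -> float:
    """Map a value from [-1, 1] back to [0, 1]."""
    return (x + 1.0) / 2.0


# The end points of [0, 1] map to the end points of [-1, 1].
assert to_diffusion_range(0.0) == -1.0
assert to_diffusion_range(1.0) == 1.0
# The two maps are inverses of each other.
assert from_diffusion_range(to_diffusion_range(0.25)) == 0.25
```

The same expressions apply elementwise to tensors, since they use only arithmetic operators.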