HDF5Dataset

class deepinv.datasets.HDF5Dataset(path, train=None, split=None, transform=None, load_physics_generator_params=False, dtype=torch.float, complex_dtype=torch.cfloat)

Bases: ImageDataset

DeepInverse HDF5 dataset

DeepInverse features its own file format for imaging datasets, designed as a subset of the HDF5 file format. A dataset in this format is typically obtained by measuring a base dataset of ground-truth images through a forward operator with the function deepinv.datasets.generate_dataset(). This class provides the code to load such datasets.


Basics:

The file containing the dataset is opened in the constructor and remains open until the method close() is called.

Parameters:

path (str) – Path to the HDF5 file containing the dataset.


Splits:

The dataset is structured in splits, which are either freely named or specifically understood as training and testing splits. By convention, the training split is named train and the testing split test. In both cases, the parameter split can be used to select one of the splits available in the dataset. Training and testing splits can also be selected with the boolean parameter train: if train=True, the training split is loaded, otherwise the testing split is loaded.

By default, if neither split nor train is provided, the class attempts to load the training split. We do not recommend relying on this behaviour, which is likely to change in future versions of the library.

Warning

If both split and train are provided, then split takes precedence and train is ignored. We recommend that you only specify one of the two parameters to avoid errors.

Note

A single instance of the class holds a single split of the dataset. If you wish to load multiple splits, you must instantiate the class once per split. See for instance Training a reconstruction model.

Parameters:
  • split (str, None) – The name of the split to load, for instance "train", "test" or "val". It can be left unspecified if train is used instead.

  • train (bool) – If split is left unspecified, uses "train" as the split name if set to True and "test" otherwise. Note that if split is specified, this parameter is ignored with a warning.
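The selection rules above can be summarised by a small function. The sketch below is a hypothetical helper, `resolve_split_name`, written only to illustrate the documented precedence (split wins over train, with a warning; no argument falls back to "train"); it is not part of the deepinv API and the warning text is an assumption.

```python
import warnings


def resolve_split_name(split=None, train=None):
    """Hypothetical helper mirroring the documented split-selection rules."""
    if split is not None:
        if train is not None:
            # Documented behaviour: split takes precedence and train is ignored.
            warnings.warn("Both split and train were given; train is ignored.")
        return split
    if train is None:
        # Current default: fall back to the training split (may change).
        return "train"
    return "train" if train else "test"


assert resolve_split_name(split="val") == "val"
assert resolve_split_name(train=True) == "train"
assert resolve_split_name(train=False) == "test"
assert resolve_split_name() == "train"
```

Passing only one of the two arguments keeps the call unambiguous and avoids depending on the default, which the text above flags as likely to change.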


Entries:

HDF5 datasets adhere to our conventions for datasets. In particular, their entries are either pairs of ground-truth images and measurements (x, y) or triplets with additional physics parameters (x, y, params). The dataset may also contain no ground truth at all, in which case the ground truth is replaced by a scalar NaN tensor.

Physics parameters represent additional information about the measurement process, for instance the mask for inpainting or the blur kernel for deblurring. HDF5 datasets can contain the physics parameters used to generate each set of measurements; in that case, they are returned with each entry as a dictionary, provided that the parameter load_physics_generator_params is set to True. Note that if this parameter is set and the dataset does not contain any physics parameters, an empty dictionary is returned nonetheless.
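To make the entry layout concrete, the sketch below assembles an entry following the rules above: a pair by default, a triplet when physics parameters are requested, a scalar NaN tensor in place of missing ground truth, and an empty dictionary when no parameters are stored. The helper `assemble_entry` is hypothetical and only mirrors the documented conventions; the tensor shapes and the "mask" key are illustrative assumptions.

```python
import torch


def assemble_entry(x, y, params=None, load_physics_generator_params=False):
    """Hypothetical sketch of the documented entry layout.

    x is None when the split has no ground truth; params is None when the
    dataset stores no physics parameters.
    """
    if x is None:
        # Missing ground truth is replaced by a scalar NaN tensor.
        x = torch.tensor(float("nan"))
    if not load_physics_generator_params:
        return x, y
    # An empty dict is returned when no parameters are stored.
    return x, y, dict(params or {})


y = torch.randn(1, 16, 16)
x_gt = torch.randn(1, 16, 16)

pair = assemble_entry(x_gt, y)
triplet = assemble_entry(x_gt, y, {"mask": torch.ones(1, 16, 16)}, True)
x_nan, _, empty = assemble_entry(None, y, load_physics_generator_params=True)
```

With the real class, entries are unpacked accordingly, e.g. `x, y = dataset[0]`, or `x, y, params = dataset[0]` when load_physics_generator_params=True.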

Measurements intended to be used with deepinv.physics.StackedPhysics are stored across multiple members of the HDF5 file, one per operator. In that case, member names follow the format y{i}_{split_name} where i denotes the stack index (starting at 0) and they are loaded as a deepinv.utils.tensorlist.TensorList.

Note

Physics parameters are identified using a fallback logic: every member with a name of the form {prefix}_{split_name} that is interpreted neither as ground truths nor as measurements (including stacked measurements) is interpreted as containing a physics parameter, with prefix denoting the parameter name. In particular, physics parameters and stacked measurements can generally be combined, as long as custom parameter names cannot be mistaken for ground-truth or measurement names; for instance, x, y and y0 are unsupported parameter names.
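The naming convention and its fallback logic can be illustrated on a plain list of member names. The sketch below is a hypothetical classifier, `classify_members`, that groups names of the form {prefix}_{split_name} into ground truth (x), single measurements (y), stacked measurements (y0, y1, ...) and, by default, physics parameters. It is an illustration of the rules stated above under these assumptions, not the code deepinv actually runs.

```python
import re


def classify_members(names, split):
    """Hypothetical sketch of the documented member-naming fallback logic."""
    suffix = f"_{split}"
    groups = {"x": None, "y": None, "stacked_y": {}, "params": []}
    for name in names:
        if not name.endswith(suffix):
            continue  # member belongs to another split
        prefix = name[: -len(suffix)]
        if prefix == "x":
            groups["x"] = name
        elif prefix == "y":
            groups["y"] = name
        elif re.fullmatch(r"y\d+", prefix):
            # Stacked measurements: y{i}_{split_name}, i starting at 0.
            groups["stacked_y"][int(prefix[1:])] = name
        else:
            # Anything else falls back to being a physics parameter.
            groups["params"].append(prefix)
    # Order stacked measurements by their stack index.
    groups["stacked_y"] = [groups["stacked_y"][i] for i in sorted(groups["stacked_y"])]
    return groups


members = ["x_train", "y0_train", "y1_train", "mask_train", "y0_test", "y1_test"]
train_groups = classify_members(members, "train")
test_groups = classify_members(members, "test")
```

In this sketch, a split whose x entry stays None (here, test) is the situation the unsupervised property reports. It also shows why a parameter named y0 would be misread as a stacked measurement.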

Note

HDF5 datasets always contain measurements even though our conventions permit datasets with only ground truths (with or without physics parameters).

Parameters:

load_physics_generator_params (bool) – Return the physics parameters with each entry. If no physics parameter is featured in the dataset, an empty dictionary is returned nonetheless.


Pre-processing:

The data loaded from disk is not necessarily returned as is. The pre-processing pipeline contains two steps. First, real and complex numbers are cast to the user-provided dtypes given by the parameters dtype and complex_dtype. Then, an optional transform, provided by the user through the parameter transform, is applied to the ground-truth image.

Note

The user-provided transformation is only applied to the ground truth image. It does not affect the measurements or the physics parameters.

Parameters:
  • dtype (torch.dtype, str) – The dtype for real-valued numbers, by default torch.float.

  • complex_dtype (torch.dtype, str) – The dtype for complex-valued numbers, by default torch.cfloat.

  • transform (Transform, Callable, None) – An optional transformation applied to the ground truth.
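The two pre-processing steps can be sketched with plain PyTorch. The hypothetical function `preprocess` below casts real tensors to dtype and complex tensors to complex_dtype, then applies the transform to the ground truth only. For brevity it handles just x and y (not physics parameters) and only accepts torch.dtype objects, whereas the class also accepts strings; the flip transform and tensor shapes are arbitrary examples.

```python
import torch


def preprocess(x, y, dtype=torch.float, complex_dtype=torch.cfloat, transform=None):
    """Hypothetical sketch of the two documented pre-processing steps."""

    def cast(t):
        # Step 1: complex tensors go to complex_dtype, real ones to dtype.
        return t.to(complex_dtype) if t.is_complex() else t.to(dtype)

    x, y = cast(x), cast(y)
    if transform is not None:
        # Step 2: the transform is applied to the ground truth only.
        x = transform(x)
    return x, y


x = torch.rand(1, 8, 8, dtype=torch.float64)
y = torch.randn(1, 8, 8, dtype=torch.float64).to(torch.complex128)

x_out, y_out = preprocess(x, y, transform=lambda t: t.flip(-1))
```

Here x_out is a flipped float32 tensor, while y_out is cast to complex64 but otherwise left untouched by the transform.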

close()

Closes the HDF5 dataset. Use when you are finished with the dataset.

property unsupervised: bool

Whether the loaded split is unsupervised (i.e. contains no ground truth).

Examples using HDF5Dataset:

Imaging inverse problems with adversarial networks

Bring your own dataset

5 minute quickstart tutorial

Training a reconstruction model

Image deblurring with custom deep explicit prior.

Tour of MRI functionality in DeepInverse

DPIR method for PnP image deblurring.

Regularization by Denoising (RED) for Super-Resolution.

Self-supervised learning with Equivariant Imaging for MRI.

Self-supervised learning from incomplete measurements of multiple operators.

Self-supervised denoising with the Neighbor2Neighbor loss.

Self-supervised denoising with the Generalized R2R loss.

Self-supervised learning with measurement splitting

Self-supervised denoising with the SURE loss.

Self-supervised denoising with the UNSURE loss.

Deep Equilibrium (DEQ) algorithms for image deblurring

Learned Iterative Soft-Thresholding Algorithm (LISTA) for compressed sensing

Learned iterative custom prior

Unfolded Chambolle-Pock for constrained image inpainting

Vanilla Unfolded algorithm for super-resolution