generate_dataset

deepinv.datasets.generate_dataset(train_dataset: Dataset, physics: Physics, save_dir: str, test_dataset: Dataset = None, dataset_filename: str = 'dinv_dataset', overwrite_existing: bool = True, train_datapoints: int = None, test_datapoints: int = None, physics_generator: PhysicsGenerator = None, save_physics_generator_params: bool = True, batch_size: int = 4, num_workers: int = 0, supervised: bool = True, verbose: bool = True, show_progress_bar: bool = False, device: torch.device | str = 'cpu')

Generates a dataset of signal/measurement pairs from a base dataset.

The measurement data is generated by applying the forward operator provided by the user to the base dataset. The dataset is saved in HDF5 format and can be loaded using the deepinv.datasets.HD5Dataset class. The generated dataset contains a train split and, if test_dataset is provided, a test split.

Optionally, if a random physics generator is used to generate the data, the physics generator params are saved as well. This is useful, for example, if you are performing a parameter estimation task and want to evaluate the learnt parameters, or if a measurement consistency/data fidelity loss requires knowledge of the params.

Note: We support all dtypes supported by ``h5py``, including complex numbers, which will be stored as a complex dtype.
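To illustrate the dtype note, here is a small sketch that uses ``h5py`` directly rather than ``generate_dataset``. It writes a complex-valued array to a file and reads it back with the complex dtype preserved. The dataset name ``"y"``, the file name and the array shape are hypothetical choices for this example and do not reflect the internal layout of the generated files.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical complex-valued measurements: 4 samples of length 16.
rng = np.random.default_rng(0)
y = (rng.standard_normal((4, 16)) + 1j * rng.standard_normal((4, 16))).astype(
    np.complex64
)

path = os.path.join(tempfile.mkdtemp(), "complex_demo.h5")

# Write the complex array; h5py stores it without splitting real/imaginary parts
# into separate datasets.
with h5py.File(path, "w") as f:
    f.create_dataset("y", data=y)

# Read it back and inspect the dtype reported by h5py.
with h5py.File(path, "r") as f:
    y_back = f["y"][:]
    stored_dtype = f["y"].dtype

print(stored_dtype)
```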

Info: By default, we overwrite existing datasets if they have been previously created. To avoid this, set ``overwrite_existing=False``.

Parameters:
  • train_dataset (torch.utils.data.Dataset) – base dataset (e.g., MNIST or CelebA) of images from which the associated measurements are generated via the chosen forward operator. The generated dataset is saved in HDF5 format and can be loaded using the HD5Dataset class.

  • physics (deepinv.physics.Physics) – Forward operator used to generate the measurement data. It can be either a single operator or a list of forward operators. In the latter case, the datapoints are split evenly across the operators.

  • save_dir (str) – folder where the dataset and forward operator will be saved.

  • test_dataset (torch.utils.data.Dataset) – if included, the function will also generate measurements associated with the test dataset.

  • dataset_filename (str) – desired filename of the dataset (without extension).

  • overwrite_existing (bool) – if True, creates a new dataset file, overwriting any existing dataset with the same dataset_filename. If False and the dataset file already exists, no new dataset is created.

  • train_datapoints (int, None) – Desired number of datapoints in the training dataset. If set to None, it will use the number of datapoints in the base dataset. This is useful for generating a larger train dataset via data augmentation (the augmentation itself should be defined in train_dataset).

  • test_datapoints (int, None) – Desired number of datapoints in the test dataset. If set to None, it will use the number of datapoints in the base test dataset.

  • physics_generator (None, deepinv.physics.generator.PhysicsGenerator) – Optional physics generator for generating the physics operators. If not None, the physics operators are randomly sampled at each iteration using the generator.

  • save_physics_generator_params (bool) – whether to also save the physics generator params. Ignored if physics_generator is not used.

  • batch_size (int) – batch size for generating the measurement data (it affects the speed of the generation process and the physics generator batch size).

  • num_workers (int) – number of workers for generating the measurement data (it only affects the speed of the generation process).

  • supervised (bool) – if True, generates supervised pairs (x, y) of signals and measurements. If set to False, it will generate a training dataset with measurements only (y) and a test dataset with pairs (x, y).

  • verbose (bool) – Output progress information in the console.

  • show_progress_bar (bool) – Show progress bar during the generation of the dataset (if verbose is set to True).

  • device (torch.device, str) – device (e.g. CPU or GPU) on which to generate measurements. All data is moved back to the CPU before saving.

Examples using generate_dataset:

Creating your own dataset

Training a reconstruction network.

Image deblurring with custom deep explicit prior.

DPIR method for PnP image deblurring.

Regularization by Denoising (RED) for Super-Resolution.

Learned Iterative Soft-Thresholding Algorithm (LISTA) for compressed sensing

Vanilla Unfolded algorithm for super-resolution

Learned iterative custom prior

Deep Equilibrium (DEQ) algorithms for image deblurring

Unfolded Chambolle-Pock for constrained image inpainting

Self-supervised learning with measurement splitting

Self-supervised denoising with the UNSURE loss.

Self-supervised denoising with the SURE loss.

Self-supervised denoising with the Neighbor2Neighbor loss.

Self-supervised learning with Equivariant Imaging for MRI.

Self-supervised learning from incomplete measurements of multiple operators.

Imaging inverse problems with adversarial networks