generate_dataset
- function deepinv.datasets.generate_dataset(train_dataset: Dataset, physics: Physics, save_dir: str, test_dataset: Dataset = None, dataset_filename: str = 'dinv_dataset', overwrite_existing: bool = True, train_datapoints: int = None, test_datapoints: int = None, physics_generator: PhysicsGenerator = None, save_physics_generator_params: bool = True, batch_size: int = 4, num_workers: int = 0, supervised: bool = True, verbose: bool = True, show_progress_bar: bool = False, device: torch.device | str = 'cpu')
Generates a dataset of signal/measurement pairs from a base dataset.
It generates the measurement data using the forward operator provided by the user. The dataset is saved in HDF5 format and can be easily loaded using the
deepinv.datasets.HD5Dataset
class. The generated dataset contains a train split and, if test_dataset is provided, a test split. Optionally, if a random physics generator is used to generate the data, the physics generator parameters are saved as well. This is useful, e.g., if you are performing a parameter estimation task and want to evaluate the learned parameters, or if you need the parameters to construct a measurement-consistency or data-fidelity loss.
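To make the generation loop concrete, here is a minimal, self-contained sketch in plain Python. It is not deepinv code: `toy_forward` is a hypothetical noiseless operator y = scale * x standing in for a `deepinv.physics.Physics` object, `generate_pairs` is a hypothetical helper, and the keys `x_train`, `y_train` and `params` are names chosen for this sketch only. It shows the base data being processed in batches, one random operator parameter being drawn per sample (as a physics generator would), and those parameters being kept next to the measurements.

```python
import random
from typing import Dict, List, Sequence


def toy_forward(x: Sequence[float], scale: float) -> List[float]:
    # Hypothetical noiseless operator y = scale * x; a stand-in for a
    # deepinv.physics.Physics object, not deepinv code.
    return [scale * xi for xi in x]


def generate_pairs(
    signals: Sequence[Sequence[float]],
    batch_size: int = 4,
    use_generator: bool = True,
    supervised: bool = True,
    seed: int = 0,
) -> Dict[str, list]:
    """Batch the base data, optionally draw one operator parameter per sample
    (as a physics generator would), apply the operator and store the results."""
    rng = random.Random(seed)
    out: Dict[str, list] = {"y_train": []}
    if supervised:
        out["x_train"] = []  # signals are kept only for supervised pairs
    if use_generator:
        out["params"] = []  # kept so the true parameters can be used later
    for start in range(0, len(signals), batch_size):
        batch = signals[start:start + batch_size]
        # One draw per sample; the number of draws follows the batch size.
        scales = [rng.uniform(0.5, 1.5) if use_generator else 1.0 for _ in batch]
        for x, s in zip(batch, scales):
            out["y_train"].append(toy_forward(x, s))
            if supervised:
                out["x_train"].append(list(x))
            if use_generator:
                out["params"].append(s)
    return out


data = generate_pairs([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], batch_size=2)
print(sorted(data))  # ['params', 'x_train', 'y_train']
```

Keeping the sampled parameters next to each measurement is what makes it possible to score a parameter estimator against the ground truth later, which is the use case described above.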
Note: all dtypes supported by h5py are supported, including complex numbers, which are stored with a complex dtype.
Note: by default, an existing dataset is overwritten if it has previously been created. To avoid this, set overwrite_existing=False.
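The overwrite rule can be summarized by a small decision function. The sketch below is a hypothetical helper, not part of deepinv, and it assumes the output file is named `<dataset_filename>.h5` inside `save_dir`; the actual file naming used by generate_dataset may differ.

```python
import os
import tempfile


def should_generate(save_dir: str, dataset_filename: str, overwrite_existing: bool = True) -> bool:
    """Hypothetical helper: decide whether a new dataset file must be written.

    Follows the documented rule: always (re)generate when overwrite_existing
    is True, otherwise skip generation if the file already exists.
    The '<dataset_filename>.h5' naming is an assumption of this sketch.
    """
    path = os.path.join(save_dir, dataset_filename + ".h5")
    if overwrite_existing:
        return True
    return not os.path.exists(path)


with tempfile.TemporaryDirectory() as d:
    print(should_generate(d, "dinv_dataset", overwrite_existing=False))  # True: nothing on disk yet
    open(os.path.join(d, "dinv_dataset.h5"), "wb").close()
    print(should_generate(d, "dinv_dataset", overwrite_existing=False))  # False: file exists, keep it
    print(should_generate(d, "dinv_dataset", overwrite_existing=True))   # True: overwrite it
```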
- Parameters:
train_dataset (torch.utils.data.Dataset) – base dataset (e.g., MNIST, CelebA) whose images are used to generate the associated measurements via the chosen forward operator.
physics (deepinv.physics.Physics) – Forward operator used to generate the measurement data. It can be either a single operator or a list of forward operators; in the latter case, the dataset is split evenly across the operators.
save_dir (str) – folder where the dataset and forward operator will be saved.
test_dataset (torch.utils.data.Dataset) – if provided, the function also generates measurements associated with the test dataset.
dataset_filename (str) – desired filename of the dataset (without extension).
overwrite_existing (bool) – if True, creates a new dataset file, overwriting any existing dataset with the same dataset_filename. If False and the dataset file already exists, no new dataset is created.
train_datapoints (int, None) – Desired number of datapoints in the training dataset. If set to None, the number of datapoints in the base dataset is used. Requesting more datapoints than the base dataset contains is useful for generating a larger training dataset via data augmentation (the augmentation should be applied within train_dataset).
test_datapoints (int, None) – Desired number of datapoints in the test dataset. If set to None, the number of datapoints in the base test dataset is used.
physics_generator (None, deepinv.physics.generator.PhysicsGenerator) – Optional physics generator for generating the physics operators. If not None, the physics operators are randomly sampled at each iteration using the generator.
save_physics_generator_params (bool) – also save the physics generator parameters; ignored if physics_generator is not used.
batch_size (int) – batch size used to generate the measurement data (it affects the speed of the generation process and the batch size of the physics generator).
num_workers (int) – number of workers for generating the measurement data (it only affects the speed of the generating process)
supervised (bool) – Generates supervised pairs (x,y) of signals and measurements. If set to False, it generates a training dataset with measurements only (y) and a test dataset with pairs (x,y).
verbose (bool) – Output progress information in the console.
show_progress_bar (bool) – Show a progress bar during the generation of the dataset (if verbose is set to True).
device (torch.device, str) – device, e.g. 'cpu' or 'cuda', on which to generate the measurements. All data is moved back to the CPU before saving.
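The interplay between train_datapoints and a list of operators can be illustrated with plain index bookkeeping. The sketch below uses hypothetical helpers (`resolve_datapoints`, `sample_indices`, `split_across_operators`) and is not deepinv's implementation: it assumes the base dataset is revisited cyclically when more datapoints are requested than it contains (so a random augmentation inside train_dataset yields new signals on each pass), and that samples are assigned to operators in equal contiguous chunks, with any remainder dropped as a choice of this sketch.

```python
from typing import List, Optional


def resolve_datapoints(base_len: int, requested: Optional[int]) -> int:
    # None means "use every sample of the base dataset once".
    return base_len if requested is None else requested


def sample_indices(base_len: int, n_points: int) -> List[int]:
    # Cycle through the base dataset when more points than base samples are
    # requested; with a random augmentation in the base dataset, each revisit
    # yields a different signal.
    return [i % base_len for i in range(n_points)]


def split_across_operators(indices: List[int], n_operators: int) -> List[List[int]]:
    # Give each operator an equal, contiguous share of the samples; in this
    # sketch, samples left over after the even split are dropped.
    per_op = len(indices) // n_operators
    return [indices[g * per_op:(g + 1) * per_op] for g in range(n_operators)]


n_train = resolve_datapoints(base_len=5, requested=12)
indices = sample_indices(5, n_train)
print(indices)                             # [0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1]
print(split_across_operators(indices, 3))  # [[0, 1, 2, 3], [4, 0, 1, 2], [3, 4, 0, 1]]
```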
Examples using generate_dataset
Training a reconstruction network.
Image deblurring with custom deep explicit prior.
DPIR method for PnP image deblurring.
Regularization by Denoising (RED) for Super-Resolution.
Learned Iterative Soft-Thresholding Algorithm (LISTA) for compressed sensing
Vanilla Unfolded algorithm for super-resolution
Learned iterative custom prior
Deep Equilibrium (DEQ) algorithms for image deblurring
Unfolded Chambolle-Pock for constrained image inpainting
Self-supervised learning with measurement splitting
Self-supervised denoising with the UNSURE loss.
Self-supervised denoising with the SURE loss.
Self-supervised denoising with the Neighbor2Neighbor loss.
Self-supervised learning with Equivariant Imaging for MRI.
Self-supervised learning from incomplete measurements of multiple operators.
Imaging inverse problems with adversarial networks