generate_dataset

deepinv.datasets.generate_dataset(train_dataset, physics, save_dir, test_dataset=None, val_dataset=None, dataset_filename='dinv_dataset', overwrite_existing=True, train_datapoints=None, test_datapoints=None, val_datapoints=None, physics_generator=None, save_physics_generator_params=True, batch_size=4, num_workers=0, supervised=True, verbose=True, show_progress_bar=False, device='cpu')

Generates a dataset of signal/measurement pairs from a base dataset.

The measurement data is generated with the forward operator provided by the user. The dataset is saved in HDF5 format and can be loaded with the deepinv.datasets.HDF5Dataset class. The generated dataset contains a train split and, if test_dataset or val_dataset is provided, the corresponding test or validation split.

The base dataset of ground-truth images must return tensors x or tuples (x, ...). We provide a large library of predefined popular imaging datasets. See the datasets user guide for more information.

Optionally, if a random physics generator is used to generate the data, the physics generator params are saved as well. This is useful, for example, when performing a parameter estimation task and evaluating the learnt parameters, or when a measurement consistency/data fidelity loss requires knowledge of the params.

Note

We support all dtypes supported by h5py including complex numbers, which will be stored as complex dtype.

Note

By default, we overwrite existing datasets if they have been previously created. To avoid this, set overwrite_existing=False.

Parameters:

train_dataset (torch.utils.data.Dataset) – base dataset of ground-truth images. Must return tensors x or tuples (x, ...).
physics (deepinv.physics.Physics) – Forward operator used to generate the measurement data. It can be either a single operator or a list of forward operators. In the latter case, the dataset will be assigned evenly across operators.
save_dir (str) – folder where the dataset and forward operator will be saved.
test_dataset (torch.utils.data.Dataset) – if included, the function will also generate measurements associated to the test dataset.
val_dataset (torch.utils.data.Dataset) – if included, the function will also generate measurements associated to the validation dataset.
dataset_filename (str) – desired filename of the dataset (without extension).
overwrite_existing (bool) – if True, create a new dataset file, overwriting any existing dataset with the same dataset_filename. If False and the dataset file already exists, no new dataset is created.
train_datapoints (int, None) – Desired number of datapoints in the training dataset. If set to None, it will use the number of datapoints in the base dataset. This is useful for generating a larger train dataset via data augmentation (which should be applied within the train_dataset).
test_datapoints (int, None) – Desired number of datapoints in the test dataset. If set to None, it will use the number of datapoints in the base test dataset.
val_datapoints (int, None) – Desired number of datapoints in the val dataset.
physics_generator (None, deepinv.physics.generator.PhysicsGenerator) – Optional physics generator for generating the physics operators. If not None, the physics operators are randomly sampled at each iteration using the generator.
save_physics_generator_params (bool) – if True, also save the physics generator params. Ignored if physics_generator is not used.
batch_size (int) – batch size for generating the measurement data. It affects the speed of the generation process and sets the physics generator batch size.
num_workers (int) – number of workers for generating the measurement data. It only affects the speed of the generation process.
supervised (bool) – Generates supervised pairs (x, y) of signals and measurements. If set to False, it will generate a training dataset with measurements only (y) and a test dataset with pairs (x, y).
verbose (bool) – Output progress information in the console.
show_progress_bar (bool) – Show progress bar during the generation of the dataset (if verbose is set to True).
device (torch.device, str) – device on which to generate measurements, e.g. 'cpu' or 'cuda'. All data is moved back to cpu before saving.
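When physics is a list of operators, the dataset is assigned evenly across them. The sketch below shows one plausible reading of that rule in plain Python: each operator receives a contiguous, equally sized block of indices. The helper `split_evenly` is hypothetical, and dropping the remainder is an assumption made for this illustration; the library's actual partitioning may differ.

```python
def split_evenly(n_datapoints, n_operators):
    """Return one contiguous index range per operator, all of equal size."""
    # Assumption for this sketch: leftover datapoints are dropped.
    per_operator = n_datapoints // n_operators
    return [
        range(g * per_operator, (g + 1) * per_operator)
        for g in range(n_operators)
    ]


# Ten datapoints shared between three operators: three indices each.
ranges = split_evenly(n_datapoints=10, n_operators=3)
for g, indices in enumerate(ranges):
    print(f"operator {g}: indices {list(indices)}")
```

Under this reading, no datapoint is measured by more than one operator, so each operator ends up with its own disjoint subset of the base dataset.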