generate_dataset

deepinv.datasets.generate_dataset(train_dataset: Dataset, physics: Physics, save_dir: str, test_dataset: Dataset = None, device: torch.device | str = 'cpu', train_datapoints: int = None, test_datapoints: int = None, physics_generator: PhysicsGenerator = None, save_physics_generator_params: bool = True, dataset_filename: str = 'dinv_dataset', batch_size: int = 4, num_workers: int = 0, supervised: bool = True, verbose: bool = True, show_progress_bar: bool = False)

Generates a dataset of signal/measurement pairs from a base dataset.

It generates the measurement data using the forward operator provided by the user. The dataset is saved in HDF5 format and can be easily loaded using the deepinv.datasets.HDF5Dataset class. The generated dataset contains a train split and, if test_dataset is provided, a test split.

Optionally, if a random physics generator is used to generate the data, the physics generator parameters are saved as well. This is useful, for example, when performing a parameter estimation task and evaluating the learnt parameters, or when a measurement consistency (data fidelity) loss requires knowledge of the parameters.

Note

We support all dtypes supported by h5py, including complex numbers, which will be stored as a complex dtype.
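To illustrate the point about complex numbers, here is a small sketch that uses h5py directly, independent of deepinv. The file name complex_demo.h5 and the dataset key y_train are chosen for illustration only and are not claimed to match the layout written by generate_dataset.

```python
import os
import tempfile

import h5py
import numpy as np

# A small complex-valued array standing in for complex measurements.
data = (np.arange(6) + 1j * np.arange(6)).astype(np.complex64).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "complex_demo.h5")  # illustrative file name

    # Write the array; h5py accepts complex NumPy dtypes as-is.
    with h5py.File(path, "w") as f:
        f.create_dataset("y_train", data=data)  # key name is illustrative

    # Read it back and inspect the dtype seen by NumPy.
    with h5py.File(path, "r") as f:
        stored_dtype = f["y_train"].dtype
        loaded = f["y_train"][...]

print(stored_dtype, np.array_equal(loaded, data))
```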

Parameters:

train_dataset (torch.utils.data.Dataset) – base dataset (e.g., MNIST or CelebA) with images used for generating associated measurements via the chosen forward operator. The generated dataset is saved in HDF5 format and can be easily loaded using the HDF5Dataset class.
physics (deepinv.physics.Physics) – Forward operator used to generate the measurement data. It can be either a single operator or a list of forward operators. In the latter case, the dataset will be assigned evenly across operators.
save_dir (str) – folder where the dataset and forward operator will be saved.
test_dataset (torch.utils.data.Dataset) – if included, the function will also generate measurements associated with the test dataset.
device (torch.device, str) – device to use, either CPU or GPU.
train_datapoints (int, None) – Desired number of datapoints in the training dataset. If set to None, it will use the number of datapoints in the base dataset. This is useful for generating a larger train dataset via data augmentation (the augmentation itself should be defined in train_dataset).
test_datapoints (int, None) – Desired number of datapoints in the test dataset. If set to None, it will use the number of datapoints in the base test dataset.
physics_generator (None, deepinv.physics.generator.PhysicsGenerator) – Optional physics generator for generating the physics operators. If not None, the physics operators are randomly sampled at each iteration using the generator.
save_physics_generator_params (bool) – whether to also save the physics generator parameters; ignored if physics_generator is not used.
dataset_filename (str) – desired filename of the dataset.
batch_size (int) – batch size for generating the measurement data (it affects the speed of the generation process and the physics generator batch size).
num_workers (int) – number of workers for generating the measurement data (it only affects the speed of the generation process).
supervised (bool) – Generates supervised pairs (x,y) of signals and measurements. If set to False, it will generate a training dataset with measurements only (y) and a test dataset with pairs (x,y).
verbose (bool) – Output progress information in the console.
show_progress_bar (bool) – Show progress bar during the generation of the dataset (if verbose is set to True).
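When physics is a list, the description above says the dataset is assigned evenly across operators. The pure-Python sketch below shows one way such a split could work. The helper split_evenly is hypothetical, and both the contiguous index ranges and the dropping of leftover datapoints are assumptions made for illustration, not behaviour confirmed by this page.

```python
def split_evenly(num_datapoints: int, num_operators: int) -> list:
    """Return one contiguous index range per operator (illustrative only)."""
    if num_operators < 1:
        raise ValueError("need at least one operator")
    # Assumption: every operator gets the same share and any
    # remainder is simply left unused.
    per_operator = num_datapoints // num_operators
    return [
        range(g * per_operator, (g + 1) * per_operator)
        for g in range(num_operators)
    ]


# 10 datapoints shared between 3 operators: 3 each, 1 left over.
chunks = split_evenly(num_datapoints=10, num_operators=3)
for g, idx in enumerate(chunks):
    print(f"operator {g}: indices {list(idx)}")
```

Under these assumptions, choosing train_datapoints as a multiple of the number of operators avoids leaving any datapoint unused.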