LidcIdriSliceDataset
- class deepinv.datasets.LidcIdriSliceDataset(root: str, transform: Callable | None = None)
Bases: Dataset
Dataset for LIDC-IDRI that provides access to CT image slices.
The Lung Image Database Consortium image collection (LIDC-IDRI) consists of diagnostic and lung cancer screening thoracic computed tomography (CT) scans with marked-up annotated lesions.
Warning
To download the raw dataset, you will need to install the NBIA Data Retriever, then download the manifest file (.tcia file) from https://www.cancerimagingarchive.net/collection/lidc-idri/ and open it by double-clicking.
Raw data file structure:
    self.root --- LIDC-IDRI --- LICENCE
               |             -- LIDC-IDRI-0001 --- `STUDY_UID` --- `SERIES_UID` --- xxx.xml
               |             |                                                   -- 1-001.dcm
               |             -- LIDC-IDRI-1010                                   |
               |                                                                 -- 1-xxx.dcm
               -- metadata.csv
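The layout above can be explored with a short script. This is a minimal sketch, not the library's own loading code: it assumes that `root` contains a `LIDC-IDRI` folder with one patient / study / series level each, as drawn in the tree, and the helper name `list_slice_files` is hypothetical. It builds a tiny mock tree in a temporary directory so it runs without the real data.

```python
from pathlib import Path
import tempfile


def list_slice_files(root):
    """Return sorted paths of every .dcm file found at
    root/LIDC-IDRI/<patient>/<study>/<series>/ (hypothetical helper)."""
    base = Path(root) / "LIDC-IDRI"
    return sorted(base.glob("LIDC-IDRI-*/*/*/*.dcm"))


# Build a small mock of the raw data layout (placeholder names, empty files).
with tempfile.TemporaryDirectory() as tmp:
    series = Path(tmp) / "LIDC-IDRI" / "LIDC-IDRI-0001" / "STUDY_UID" / "SERIES_UID"
    series.mkdir(parents=True)
    for name in ("1-001.dcm", "1-002.dcm", "annotations.xml"):
        (series / name).touch()
    (Path(tmp) / "metadata.csv").touch()

    # Only the DICOM slices are listed; the .xml and .csv files are skipped.
    slice_names = [p.name for p in list_slice_files(tmp)]
    print(slice_names)
```

Each listed file corresponds to one slice, which is the granularity at which this dataset class serves samples.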
0) There are 1010 patients and a total of 1018 CT scans.
1) Each CT scan is composed of 2D slices.
2) Each slice is stored as a .dcm file.
3) This class gives access to one slice of a CT scan per data sample.
4) Each slice is represented as a (512, 512) array.
- Parameters:
root (str) – Root directory of dataset. Directory path from where we load and save the dataset.
transform (callable, optional) – A function/transform that takes in a data sample and returns a transformed version.
- Examples:
Instantiate dataset
    import torch
    from deepinv.datasets import LidcIdriSliceDataset

    # Root directory containing the downloaded raw data
    root = "/path/to/dataset/LIDC-IDRI"
    dataset = LidcIdriSliceDataset(root=root)

    # Batch two slices at a time and inspect the first batch
    dataloader = torch.utils.data.DataLoader(dataset, batch_size=2, shuffle=True)
    batch = next(iter(dataloader))
    print(batch.shape)
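The `transform` parameter accepts any callable that maps one data sample to a transformed one. The following is a minimal sketch of such a callable, assuming each sample arrives as a (512, 512) array that NumPy can convert; the function name `normalize_slice` and the min-max rescaling are illustrative choices, not part of deepinv.

```python
import numpy as np


def normalize_slice(slice_array):
    """Hypothetical transform: cast a slice to float32 and rescale it to [0, 1]."""
    x = np.asarray(slice_array, dtype=np.float32)
    lo, hi = x.min(), x.max()
    if hi == lo:
        # Constant slice: avoid dividing by zero.
        return np.zeros_like(x)
    return (x - lo) / (hi - lo)


# Stand-in for one (512, 512) slice; the values are random placeholders.
rng = np.random.default_rng(0)
dummy_slice = rng.integers(-1000, 1000, size=(512, 512), dtype=np.int16)

normalized = normalize_slice(dummy_slice)
print(normalized.shape, normalized.dtype)

# With the real data available, the callable would be passed as:
# dataset = LidcIdriSliceDataset(root=root, transform=normalize_slice)
```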
- class SliceSampleIdentifier(slice_fname: str, scan_folder: str, patient_id: str)
Bases: NamedTuple
Data structure for identifying slices.
In LIDC-IDRI, there are 1010 patients, 8 of whom have 2 CT scans each.
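To make the shape of this identifier concrete, here is a standalone sketch that re-declares a NamedTuple with the same three fields. It is an illustration only, not the library's definition: the example values are hypothetical, and whether the fields hold bare names or full paths is an assumption.

```python
from typing import NamedTuple


class SliceSampleIdentifier(NamedTuple):
    """Illustrative re-declaration with the fields named in the signature."""
    slice_fname: str
    scan_folder: str
    patient_id: str


# Hypothetical values following the raw data layout.
sample_id = SliceSampleIdentifier(
    slice_fname="1-001.dcm",
    scan_folder="/path/to/dataset/LIDC-IDRI/LIDC-IDRI-0001/STUDY_UID/SERIES_UID",
    patient_id="LIDC-IDRI-0001",
)
print(sample_id.patient_id)

# Scan count implied by the patient statistics:
# every patient has one scan, and 8 patients have a second one.
n_patients = 1010
n_patients_with_two_scans = 8
n_scans = n_patients + n_patients_with_two_scans
print(n_scans)
```

Because a patient can have more than one scan, the patient ID alone does not pin down a slice, which is presumably why the scan folder is carried alongside it.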
- Parameters: