LidcIdriSliceDataset

class deepinv.datasets.LidcIdriSliceDataset(root: str, transform: Callable | None = None)

Bases: Dataset

Dataset for LIDC-IDRI that provides access to CT image slices.

The Lung Image Database Consortium image collection (LIDC-IDRI) consists
of diagnostic and lung cancer screening thoracic computed tomography (CT)
scans with marked-up annotated lesions.

Warning

To download the raw dataset, you will need to install the NBIA Data Retriever, then download the manifest file (.tcia file) from https://www.cancerimagingarchive.net/collection/lidc-idri/ and open it by double-clicking.

Raw data file structure:

self.root --- LIDC-IDRI --- LICENCE
           |             -- LIDC-IDRI-0001 --- `STUDY_UID` --- `SERIES_UID` --- xxx.xml
           |             |                                                   -- 1-001.dcm
           |             -- LIDC-IDRI-1010                                   |
           |                                                                 -- 1-xxx.dcm
           -- metadata.csv
0) There are 1010 patients and a total of 1018 CT scans.
1) Each CT scan is composed of 2D slices.
2) Each slice is stored as a .dcm file.
3) This class gives access to one slice of a CT scan per data sample.
4) Each slice is represented as a (512, 512) array.
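
The layout above can be explored with the standard library alone. The following is a minimal sketch, not the class's actual indexing code: `list_dicom_slices` is a hypothetical helper, and it assumes only the `root/LIDC-IDRI/<patient>/<study>/<series>/<slice>.dcm` structure shown in the tree. It runs on a small fake tree, so no real DICOM data is needed.

```python
import os
import tempfile
from typing import List, Tuple


def list_dicom_slices(root: str) -> List[Tuple[str, str, str]]:
    """Return (slice_fname, scan_folder, patient_id) for every .dcm file.

    Hypothetical helper: assumes the layout
    root/LIDC-IDRI/<patient>/<study>/<series>/<slice>.dcm shown above.
    """
    base = os.path.join(root, "LIDC-IDRI")
    found = []
    for patient_id in sorted(os.listdir(base)):
        patient_dir = os.path.join(base, patient_id)
        if not os.path.isdir(patient_dir):
            continue  # skip plain files such as LICENCE
        for scan_folder, _dirs, files in os.walk(patient_dir):
            for fname in sorted(files):
                if fname.endswith(".dcm"):
                    found.append((fname, scan_folder, patient_id))
    return found


# Build a tiny fake tree to exercise the helper (empty placeholder files only).
demo_root = tempfile.mkdtemp()
series_dir = os.path.join(
    demo_root, "LIDC-IDRI", "LIDC-IDRI-0001", "study", "series"
)
os.makedirs(series_dir)
for name in ("1-001.dcm", "1-002.dcm", "annotations.xml"):
    open(os.path.join(series_dir, name), "w").close()
open(os.path.join(demo_root, "LIDC-IDRI", "LICENCE"), "w").close()

slices = list_dicom_slices(demo_root)
print(len(slices))
```

Each returned tuple corresponds to one data sample in the sense of item 3: one .dcm file, the folder of the scan it belongs to, and the patient folder name.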
Parameters:
  • root (str) – Root directory of the dataset, i.e. the directory path from which the dataset is loaded and to which it is saved.

  • transform (callable, optional) – A function/transform that takes in a data sample and returns a transformed version.
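
As an illustration of the `transform` parameter, here is a minimal sketch of a callable that clips a slice to an intensity window and rescales it to [0, 1]. It assumes that a data sample is a NumPy-compatible array of intensities; the name `window_and_normalize` and the window bounds are illustrative choices, not part of the library.

```python
import numpy as np


def window_and_normalize(slice_array, lower=-1000.0, upper=400.0):
    """Clip intensities to [lower, upper], then rescale to [0, 1].

    Hypothetical transform: the default bounds are arbitrary example
    values, and the input is assumed to be array-like.
    """
    clipped = np.clip(np.asarray(slice_array, dtype=np.float32), lower, upper)
    return (clipped - lower) / (upper - lower)


# Synthetic stand-in for one (512, 512) slice; no real CT data involved.
fake_slice = np.full((512, 512), -2000.0, dtype=np.float32)
fake_slice[256, 256] = 3000.0

out = window_and_normalize(fake_slice)
print(out.shape, float(out.min()), float(out.max()))
```

Under the same assumption, such a function could be passed as `LidcIdriSliceDataset(root=root, transform=window_and_normalize)`.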


Examples:

Instantiate the dataset and load one batch of slices

import torch
from deepinv.datasets import LidcIdriSliceDataset
# Path to the directory that holds the downloaded raw data
root = "/path/to/dataset/LIDC-IDRI"
dataset = LidcIdriSliceDataset(root=root)
# Each data sample is one CT slice; batch two slices together
dataloader = torch.utils.data.DataLoader(dataset, batch_size=2, shuffle=True)
batch = next(iter(dataloader))
print(batch.shape)

class SliceSampleIdentifier(slice_fname: str, scan_folder: str, patient_id: str)

Bases: NamedTuple

Data structure for identifying slices.

LIDC-IDRI contains 1010 patients, 8 of whom have 2 CT scans each, which gives 1018 scans in total.

Parameters:
  • slice_fname (str) – Filename of a DICOM file containing one slice of the scan.

  • scan_folder (str) – Path to the folder containing all DICOM files of the same scan.

  • patient_id (str) – Folder name of one patient among the 1010.

patient_id: str

Alias for field number 2

scan_folder: str

Alias for field number 1

slice_fname: str

Alias for field number 0
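
To make the field numbering concrete, the sketch below defines an illustrative stand-in with the same three fields in the same order, using `typing.NamedTuple`. It is not the library's own class, and the file and folder names are placeholder values.

```python
from typing import NamedTuple


class SliceSampleIdentifier(NamedTuple):
    """Illustrative stand-in with the same three fields, in the same order."""

    slice_fname: str
    scan_folder: str
    patient_id: str


# Hypothetical values; the study and series folder names are placeholders.
sample_id = SliceSampleIdentifier(
    slice_fname="1-001.dcm",
    scan_folder="LIDC-IDRI/LIDC-IDRI-0001/STUDY_UID/SERIES_UID",
    patient_id="LIDC-IDRI-0001",
)

# Fields can be read by name or by position (0, 1, 2).
print(sample_id.slice_fname == sample_id[0])
print(sample_id.scan_folder == sample_id[1])
print(sample_id.patient_id == sample_id[2])

# Like any tuple, an identifier can be unpacked in field order.
fname, folder, patient = sample_id
```

Because a named tuple is immutable and hashable, identifiers of this shape can also serve as dictionary keys or set members when grouping slices by scan or patient.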