.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/distributed/demo_denoiser_distributed.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        New to DeepInverse? Get started with the basics with the :ref:`5 minute quickstart tutorial`.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_distributed_demo_denoiser_distributed.py:


Distributed Denoiser with Image Tiling
------------------------------------------------

In many imaging problems, the data to be processed can be very large, making it
challenging to fit the denoising process into the memory of a single device.
For instance, medical imaging and satellite imagery often involve gigapixel
images that cannot be processed as a whole.

The distributed framework lets you parallelize the denoising of large images
across multiple devices using image tiling: each device processes different
image patches independently, and the results are merged to produce the final
denoised image.

This example demonstrates how to use the :func:`deepinv.distributed.distribute`
function to create a distributed denoiser that automatically handles patch
extraction, processing, and merging.

**Usage:**

.. code-block:: bash

    # Single process
    python examples/distributed/demo_denoiser_distributed.py

.. code-block:: bash

    # Multi-process with torchrun (2 GPUs/processes)
    python -m torch.distributed.run --nproc_per_node=2 examples/distributed/demo_denoiser_distributed.py

**Key Features:**

- Distribute denoising across processes/devices using image tiling
- Automatic patch extraction and reassembly
- Memory-efficient processing of large images

**Key Steps:**

1. Load a large test image
2. Add noise to create a noisy observation
3. Initialize the distributed context
4. Configure the tiling parameters
5. Distribute the denoiser with :func:`deepinv.distributed.distribute`
6. Apply the distributed denoising
7. Visualize the results and compute metrics

Import modules and define noisy image generation
---------------------------------------------------------

We start by importing ``torch`` and the deepinv modules used in this example.
We also define a helper function that generates a noisy test image on which to
evaluate the distributed framework.

.. GENERATED FROM PYTHON SOURCE LINES 51-93

.. code-block:: Python


    import torch

    from deepinv.models import DRUNet
    from deepinv.utils.demo import load_example
    from deepinv.utils.plotting import plot
    from deepinv.loss.metric import PSNR

    # Import distributed framework
    from deepinv.distributed import DistributedContext, distribute


    def create_noisy_image(device, img_size=1024, noise_sigma=0.1, seed=42):
        """
        Create a noisy test image.

        :param device: Device to create the image on
        :param int img_size: Target size of the image (the aspect ratio is preserved)
        :param float noise_sigma: Standard deviation of the Gaussian noise
        :param int seed: Random seed for reproducible noise
        :returns: Tuple of (clean_image, noisy_image, noise_sigma)
        """
        # Load example image in original size
        clean_image = load_example(
            "CBSD_0010.png",
            grayscale=False,
            device=device,
            img_size=img_size,
            resize_mode="resize",
        )

        # Set seed for reproducible noise
        torch.manual_seed(seed)

        # Add Gaussian noise
        noise = torch.randn_like(clean_image) * noise_sigma
        noisy_image = clean_image + noise

        # Clip to valid range
        noisy_image = torch.clamp(noisy_image, 0, 1)

        return clean_image, noisy_image, noise_sigma

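To make the tiling idea concrete, the sketch below shows roughly what overlap tiling
amounts to for a single process in plain PyTorch: overlapping patches are extracted,
denoised independently, and averaged back together. This is only a conceptual
illustration under simplifying assumptions; the helper name ``tile_denoise`` and its
exact splitting and blending logic are not part of the deepinv API, and
:func:`deepinv.distributed.distribute` additionally splits the patches across
processes and gathers the results for you.

.. code-block:: Python


    def tile_denoise(denoiser, x, patch_size=256, overlap=64, **kwargs):
        """Conceptual single-process overlap tiling: denoise patches, average overlaps."""
        h, w = x.shape[-2:]
        stride = patch_size - overlap
        out = torch.zeros_like(x)
        weight = torch.zeros_like(x)
        # Top-left corners of the patches, making sure the bottom/right borders are covered
        ys = list(range(0, max(h - patch_size, 0) + 1, stride))
        xs = list(range(0, max(w - patch_size, 0) + 1, stride))
        if ys[-1] + patch_size < h:
            ys.append(h - patch_size)
        if xs[-1] + patch_size < w:
            xs.append(w - patch_size)
        for i in ys:
            for j in xs:
                patch = x[..., i : i + patch_size, j : j + patch_size]
                out[..., i : i + patch_size, j : j + patch_size] += denoiser(patch, **kwargs)
                weight[..., i : i + patch_size, j : j + patch_size] += 1
        # Average the contributions in the overlapping regions
        return out / weight

In the rest of the example we rely on :func:`deepinv.distributed.distribute` instead,
which performs the overlap blending for us and handles the communication between
processes.
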
.. GENERATED FROM PYTHON SOURCE LINES 94-97

------------------------------------
Configuration of parallel denoising
------------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 97-103

.. code-block:: Python


    img_size = 512  # Large image for demonstrating tiling
    noise_sigma = 0.1
    patch_size = 256  # Size of each patch
    overlap = 64  # Overlap for smooth boundaries


.. GENERATED FROM PYTHON SOURCE LINES 104-107

---------------------------------------------
Define distributed context and run algorithm
---------------------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 107-257

.. code-block:: Python


    # Initialize distributed context (handles single and multi-process automatically)
    with DistributedContext(seed=42) as ctx:
        if ctx.rank == 0:
            print("=" * 70)
            print("Distributed Denoiser Demo")
            print("=" * 70)
            print(f"\nRunning on {ctx.world_size} process(es)")
            print(f" Device: {ctx.device}")

        # ---------------------------------------------------------------------------
        # Step 1: Create test image with noise
        # ---------------------------------------------------------------------------
        clean_image, noisy_image, sigma = create_noisy_image(
            ctx.device, img_size=img_size, noise_sigma=noise_sigma
        )

        # Compute input PSNR (create metric on all ranks for consistency)
        psnr_metric = PSNR()
        input_psnr = psnr_metric(noisy_image, clean_image).item()

        if ctx.rank == 0:
            print(f"\nCreated test image")
            print(f" Image shape: {clean_image.shape}")
            print(f" Noise sigma: {sigma}")
            print(f" Input PSNR: {input_psnr:.2f} dB")

        # ---------------------------------------------------------------------------
        # Step 2: Load denoiser model
        # ---------------------------------------------------------------------------
        if ctx.rank == 0:
            print(f"\nLoading DRUNet denoiser...")

        denoiser = DRUNet(pretrained="download").to(ctx.device)

        if ctx.rank == 0:
            print(f" Denoiser loaded")

        # ---------------------------------------------------------------------------
        # Step 3: Distribute denoiser with tiling configuration
        # ---------------------------------------------------------------------------
        if ctx.rank == 0:
            print(f"\nConfiguring distributed denoiser")
            print(f" Patch size: {patch_size}x{patch_size}")
            print(f" Receptive field radius: {overlap}")
            print(f" Tiling strategy: overlap_tiling")

        distributed_denoiser = distribute(
            denoiser,
            ctx,
            patch_size=patch_size,
            overlap=overlap,
        )

        if ctx.rank == 0:
            print(f" Distributed denoiser created")

        # ---------------------------------------------------------------------------
        # Step 4: Apply distributed denoising
        # ---------------------------------------------------------------------------
        if ctx.rank == 0:
            print(f"\nApplying distributed denoising...")

        with torch.no_grad():
            denoised_image = distributed_denoiser(noisy_image, sigma=sigma)

        if ctx.rank == 0:
            print(f" Denoising completed")
            print(f" Output shape: {denoised_image.shape}")

        # Compare with non-distributed result (only on rank 0)
        if ctx.rank == 0:
            print(f"\nComparing with non-distributed denoising...")

            with torch.no_grad():
                denoised_ref = denoiser(noisy_image, sigma=sigma)

            diff = torch.abs(denoised_image - denoised_ref)
            mean_diff = diff.mean().item()
            max_diff = diff.max().item()

            print(f" Mean absolute difference: {mean_diff:.2e}")
            print(f" Max absolute difference: {max_diff:.2e}")
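
            # Optional check (not part of the original demo): summarise the agreement
            # between the distributed and reference reconstructions as a PSNR, which
            # is often easier to interpret than raw pixel differences. This reuses
            # the `psnr_metric` instance created above.
            consistency_psnr = psnr_metric(denoised_image, denoised_ref).item()
            print(f" PSNR (distributed vs. reference): {consistency_psnr:.2f} dB")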

            # Check that differences are small (due to tiling boundary effects)
            # The distributed version uses tiling with overlapping patches and blending,
            # which can produce slightly different results at patch boundaries.
            # These differences are typically very small (< 0.01 mean, < 0.5 max).
            tolerance_mean = 0.01
            tolerance_max = 0.5
            assert (
                mean_diff < tolerance_mean
            ), f"Mean difference too large: {mean_diff:.4f} (tolerance: {tolerance_mean})"
            assert (
                max_diff < tolerance_max
            ), f"Max difference too large: {max_diff:.4f} (tolerance: {tolerance_max})"

            print(f" Results are very close (within tolerance)!")

        # ---------------------------------------------------------------------------
        # Step 5: Compute metrics and visualize results (only on rank 0)
        # ---------------------------------------------------------------------------
        if ctx.rank == 0:
            # Compute output PSNR
            output_psnr = psnr_metric(denoised_image, clean_image).item()
            psnr_improvement = output_psnr - input_psnr

            print(f"\nResults:")
            print(f" Input PSNR: {input_psnr:.2f} dB")
            print(f" Output PSNR: {output_psnr:.2f} dB")
            print(f" Improvement: {psnr_improvement:.2f} dB")

            # Plot results
            plot(
                [clean_image, noisy_image, denoised_image],
                titles=[
                    "Clean Image",
                    f"Noisy (PSNR: {input_psnr:.2f} dB)",
                    f"Denoised (PSNR: {output_psnr:.2f} dB)",
                ],
                save_fn="distributed_denoiser_result.png",
                figsize=(15, 4),
            )

            # Plot zoom on a region to see details
            # Extract a 256x256 patch from center
            h, w = clean_image.shape[-2:]
            y_start, x_start = h // 2 - 128, w // 2 - 128
            y_end, x_end = y_start + 256, x_start + 256

            clean_patch = clean_image[..., y_start:y_end, x_start:x_end]
            noisy_patch = noisy_image[..., y_start:y_end, x_start:x_end]
            denoised_patch = denoised_image[..., y_start:y_end, x_start:x_end]

            plot(
                [clean_patch, noisy_patch, denoised_patch],
                titles=["Clean (zoom)", "Noisy (zoom)", "Denoised (zoom)"],
                save_fn="distributed_denoiser_zoom.png",
                figsize=(15, 4),
            )

            print(f"\nDemo completed successfully!")
            print(f" Results saved to:")
            print(f" - distributed_denoiser_result.png")
            print(f" - distributed_denoiser_zoom.png")

            print("\n" + "=" * 70)




.. rst-class:: sphx-glr-horizontal


    *

      .. image-sg:: /auto_examples/distributed/images/sphx_glr_demo_denoiser_distributed_001.png
         :alt: Clean Image, Noisy (PSNR: 20.34 dB), Denoised (PSNR: 34.01 dB)
         :srcset: /auto_examples/distributed/images/sphx_glr_demo_denoiser_distributed_001.png
         :class: sphx-glr-multi-img

    *

      .. image-sg:: /auto_examples/distributed/images/sphx_glr_demo_denoiser_distributed_002.png
         :alt: Clean (zoom), Noisy (zoom), Denoised (zoom)
         :srcset: /auto_examples/distributed/images/sphx_glr_demo_denoiser_distributed_002.png
         :class: sphx-glr-multi-img


.. rst-class:: sphx-glr-script-out

.. code-block:: none

    ======================================================================
    Distributed Denoiser Demo
    ======================================================================

    Running on 1 process(es)
     Device: cuda:0

    Created test image
     Image shape: torch.Size([1, 3, 767, 512])
     Noise sigma: 0.1
     Input PSNR: 20.34 dB

    Loading DRUNet denoiser...
     Denoiser loaded

    Configuring distributed denoiser
     Patch size: 256x256
     Receptive field radius: 64
     Tiling strategy: overlap_tiling
     Distributed denoiser created

    Applying distributed denoising...
    /local/jtachell/deepinv/deepinv/deepinv/distributed/strategies.py:476: UserWarning: No tiling_dims provided. Assuming last 2 dimensions: (-2, -1). If your layout is different, please provide tiling_dims explicitly.
      warnings.warn(
     Denoising completed
     Output shape: torch.Size([1, 3, 767, 512])

    Comparing with non-distributed denoising...
     Mean absolute difference: 6.39e-04
     Max absolute difference: 9.65e-02
     Results are very close (within tolerance)!

    Results:
     Input PSNR: 20.34 dB
     Output PSNR: 34.01 dB
     Improvement: 13.67 dB

    Demo completed successfully!
     Results saved to:
     - distributed_denoiser_result.png
     - distributed_denoiser_zoom.png

    ======================================================================




.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 10.415 seconds)


.. _sphx_glr_download_auto_examples_distributed_demo_denoiser_distributed.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: demo_denoiser_distributed.ipynb <demo_denoiser_distributed.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: demo_denoiser_distributed.py <demo_denoiser_distributed.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: demo_denoiser_distributed.zip <demo_denoiser_distributed.zip>`

.. only:: html

  .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_