
Preprocessing

Preprocessing module:

Converting images to NumPy format and resizing them

This module consists of two main functions:
  • convert_images_to_numpy_format
  • data_preprocessing_wrapper

convert_images_to_numpy_format(input_directory)

Wrapper function that downsizes the images in the input folder and converts them to NumPy format, processing the files in parallel for speed.

Parameters

input_directory: (pathlib.PosixPath object) relative path to the images which should be preprocessed

Returns

output_directory: (pathlib.PosixPath object) relative path of the folder with the preprocessed results (resized and converted images)

Example Usage

>>> from pathlib import Path
>>> from src.preprocessing import convert_images_to_numpy_format
>>> tumor_folder = Path('ppdm/data/5IT_DUMMY_STUDY/source/raw/tumor')
>>> convert_images_to_numpy_format(tumor_folder)
PosixPath('ppdm/data/5IT_DUMMY_STUDY/source/transformed/np_and_resized/tumor')
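The example implies a fixed mapping from each raw channel folder to its output folder. A minimal sketch of that mapping, assuming the input folder always sits at `source/raw/<channel>`; `preprocessed_dir_for` is a hypothetical stand-in for `create_preprocessing_relat_directory`, whose actual implementation is not shown on this page.

```python
from pathlib import Path


def preprocessed_dir_for(input_directory: Path) -> Path:
    """Hypothetical helper: .../source/raw/<channel> -> .../source/transformed/np_and_resized/<channel>."""
    # Assumption: raw images always live directly inside a folder named "raw"
    if input_directory.parent.name != "raw":
        raise ValueError(f"expected a folder inside 'raw/', got {input_directory}")
    channel = input_directory.name                    # e.g. "tumor"
    source_directory = input_directory.parent.parent  # drop "raw/<channel>", keep ".../source"
    return source_directory / "transformed" / "np_and_resized" / channel


print(preprocessed_dir_for(Path("ppdm/data/5IT_DUMMY_STUDY/source/raw/tumor")))
# ppdm/data/5IT_DUMMY_STUDY/source/transformed/np_and_resized/tumor  (on POSIX systems)
```

Keeping the channel name as the last path component is what lets the three channels share one `np_and_resized` folder without clashing.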
Source code in src/preprocessing.py
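import pathlib
from functools import partial

from fastcore.parallel import parallel  # assumed: fastcore's `parallel` (matches the keywords used below)

# create_preprocessing_relat_directory, check_content_of_two_directories,
# remove_content and _create_numpy_formats_of_input_img are package helpers
# not shown on this page.


def convert_images_to_numpy_format(
    input_directory: pathlib.PosixPath,
) -> pathlib.PosixPath:
    """
    Wrapper function that downsizes the images in the input folder and converts
    them to NumPy format, processing the files in parallel for speed.

    Parameters
    ----------

    **input_directory**: *(pathlib.PosixPath object)* relative path to the images which should be preprocessed

    Returns
    -------

    **output_directory**: *(pathlib.PosixPath object)* relative path of the folder with the preprocessed results (resized and converted images)

    Example Usage
    -------------
    ```python
    >>> from pathlib import Path
    >>> from src.preprocessing import convert_images_to_numpy_format
    >>> tumor_folder = Path('ppdm/data/5IT_DUMMY_STUDY/source/raw/tumor')
    >>> convert_images_to_numpy_format(tumor_folder)
    PosixPath('ppdm/data/5IT_DUMMY_STUDY/source/transformed/np_and_resized/tumor')
    ```

    """
    # Derive the output folder (.../transformed/np_and_resized/<channel>) and create it
    output_directory = create_preprocessing_relat_directory(input_directory)
    output_directory.mkdir(parents=True, exist_ok=True)

    # Compare input and output folders; regenerate only if they do not match
    directory_ok = check_content_of_two_directories(input_directory, output_directory)

    if not directory_ok:
        # Remove stale or partial results before regenerating them
        if output_directory.exists():
            remove_content(output_directory)

        # Bind the output folder so each worker only receives an image path
        parallel_create_numpy_formats_of_input_img = partial(
            _create_numpy_formats_of_input_img,
            output_directory=output_directory,
        )

        # Convert every .tiff file, in sorted order, on a pool of 12 threads
        parallel(
            parallel_create_numpy_formats_of_input_img,
            sorted(input_directory.glob("*.tiff")),
            n_workers=12,
            progress=True,
            threadpool=True,
        )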

    return output_directory
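The function skips the work when the output folder is already complete and otherwise clears it and converts every `.tiff` in parallel. A minimal standard-library sketch of that control flow, assuming "complete" means one output file per input file stem (the real `check_content_of_two_directories` may apply a different rule) and swapping `concurrent.futures.ThreadPoolExecutor` in for the `parallel` helper used above. `outputs_complete`, `clear_directory`, `convert_one` and `convert_directory` are hypothetical names, and `convert_one` only writes an empty placeholder file.

```python
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from pathlib import Path


def outputs_complete(input_dir: Path, output_dir: Path, suffix: str = ".npy") -> bool:
    """Hypothetical check: one output file per input .tiff, matched by file stem."""
    expected = {p.stem for p in input_dir.glob("*.tiff")}
    produced = {p.stem for p in output_dir.glob(f"*{suffix}")}
    return expected == produced


def clear_directory(directory: Path) -> None:
    """Delete everything inside `directory` but keep the directory itself."""
    for entry in directory.iterdir():
        if entry.is_dir():
            shutil.rmtree(entry)
        else:
            entry.unlink()


def convert_one(image_path: Path, output_directory: Path) -> Path:
    """Placeholder per-image step; a real one would read, resize and save the image."""
    target = output_directory / f"{image_path.stem}.npy"
    target.write_bytes(b"")  # stand-in for numpy.save(target, resized_array)
    return target


def convert_directory(input_dir: Path, output_dir: Path, n_workers: int = 12) -> Path:
    """Convert every .tiff in `input_dir` unless `output_dir` is already complete."""
    output_dir.mkdir(parents=True, exist_ok=True)
    if not outputs_complete(input_dir, output_dir):
        clear_directory(output_dir)  # drop stale or partial results
        worker = partial(convert_one, output_directory=output_dir)
        with ThreadPoolExecutor(max_workers=n_workers) as pool:
            # list() consumes the iterator so any worker exception is re-raised here
            list(pool.map(worker, sorted(input_dir.glob("*.tiff"))))
    return output_dir


with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "raw" / "tumor"
    raw.mkdir(parents=True)
    for z in ("0300", "0301"):
        (raw / f"5IT-4X_Ch2_z{z}.tiff").touch()
    out = convert_directory(raw, Path(tmp) / "transformed" / "tumor")
    print(sorted(p.name for p in out.iterdir()))
    # ['5IT-4X_Ch2_z0300.npy', '5IT-4X_Ch2_z0301.npy']
```

A thread pool (like `threadpool=True` above) avoids pickling arguments for worker processes; whether it beats a process pool depends on how much of the per-image work is file I/O or runs in code that releases the GIL.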

data_preprocessing_wrapper(data)

High-level function that covers preprocessing for all data (blood vessels, tumors, virus). It calls the convert_images_to_numpy_format function once for each channel (blood vessels, tumors, virus). To preprocess just one channel (e.g. only tumors), use the convert_images_to_numpy_format function directly.

Parameters

data: (dict) maps channel names (keys) to the relative paths of their raw image folders (values). It must contain the keys "vessel", "tumor" and "virus".

Returns

preprocessed_data: (dict) the three relative paths of the folders with the preprocessed results (resized and converted images), keyed by channel name

Results are stored on the disk.

Example Usage

>>> from pathlib import Path
>>> from src.preprocessing import data_preprocessing_wrapper
>>> study_paths = {'vessel': Path('ppdm/data/5IT_DUMMY_STUDY/source/raw/vessel'),
...                'tumor': Path('ppdm/data/5IT_DUMMY_STUDY/source/raw/tumor'),
...                'virus': Path('ppdm/data/5IT_DUMMY_STUDY/source/raw/virus')}
>>> data_preprocessing_wrapper(study_paths)
defaultdict(<function src.utils.nested_dict()>,
            {'vessel': PosixPath('ppdm/data/5IT_DUMMY_STUDY/source/transformed/np_and_resized/vessel'),
             'tumor': PosixPath('ppdm/data/5IT_DUMMY_STUDY/source/transformed/np_and_resized/tumor'),
             'virus': PosixPath('ppdm/data/5IT_DUMMY_STUDY/source/transformed/np_and_resized/virus')})
Source code in src/preprocessing.py
from src.utils import nested_dict  # module path as shown in the example output

# log_step is a decorator defined elsewhere in the package (not shown on this page).


@log_step
def data_preprocessing_wrapper(data: dict) -> dict:
    """
    High-level function that covers preprocessing for all data (blood vessels, tumors, virus).
    It calls the convert_images_to_numpy_format function once for each channel (blood vessels, tumors, virus).
    To preprocess just one channel (e.g. only tumors), use the convert_images_to_numpy_format function directly.

    Parameters
    ----------

    **data**: *(dict)* maps channel names (keys) to the relative paths of their raw image folders (values).
    It must contain the keys "vessel", "tumor" and "virus".

    Returns
    -------
    **preprocessed_data**: *(dict)* the three relative paths of the folders with the preprocessed results
    (resized and converted images), keyed by channel name

    Results are stored on the disk.

    Example Usage
    -------------
    ```python
    >>> from pathlib import Path
    >>> from src.preprocessing import data_preprocessing_wrapper
    >>> study_paths = {'vessel': Path('ppdm/data/5IT_DUMMY_STUDY/source/raw/vessel'),
    ...                'tumor': Path('ppdm/data/5IT_DUMMY_STUDY/source/raw/tumor'),
    ...                'virus': Path('ppdm/data/5IT_DUMMY_STUDY/source/raw/virus')}
    >>> data_preprocessing_wrapper(study_paths)
    defaultdict(<function src.utils.nested_dict()>,
                {'vessel': PosixPath('ppdm/data/5IT_DUMMY_STUDY/source/transformed/np_and_resized/vessel'),
                 'tumor': PosixPath('ppdm/data/5IT_DUMMY_STUDY/source/transformed/np_and_resized/tumor'),
                 'virus': PosixPath('ppdm/data/5IT_DUMMY_STUDY/source/transformed/np_and_resized/virus')})
    ```
    """
    preprocessed_data = nested_dict()

    # vessel
    out_path = convert_images_to_numpy_format(data["vessel"])
    preprocessed_data["vessel"] = out_path

    # tumor
    out_path = convert_images_to_numpy_format(data["tumor"])
    preprocessed_data["tumor"] = out_path

    # virus
    out_path = convert_images_to_numpy_format(data["virus"])
    preprocessed_data["virus"] = out_path

    return preprocessed_data
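`data_preprocessing_wrapper` repeats the same two lines per channel and fails with a bare `KeyError` on the first missing channel. A sketch of a loop-based variant, assuming the same three channel names; `preprocess_channels` and `path_only` are hypothetical, `convert` stands in for `convert_images_to_numpy_format` so the sketch runs without the package, and it returns a plain `dict` rather than the `nested_dict` used above.

```python
from pathlib import Path
from typing import Callable, Dict, Tuple

CHANNELS: Tuple[str, ...] = ("vessel", "tumor", "virus")


def preprocess_channels(
    data: Dict[str, Path],
    convert: Callable[[Path], Path],
    channels: Tuple[str, ...] = CHANNELS,
) -> Dict[str, Path]:
    """Apply `convert` to each channel folder in a fixed order; hypothetical helper."""
    missing = [name for name in channels if name not in data]
    if missing:
        # Fail before any (slow) conversion starts, naming every absent channel
        raise KeyError(f"missing channel(s): {', '.join(missing)}")
    # Keys outside `channels` are ignored, as in data_preprocessing_wrapper
    return {name: convert(data[name]) for name in channels}


def path_only(raw_dir: Path) -> Path:
    """Dummy converter for illustration: computes the output path, converts nothing."""
    return raw_dir.parent.parent / "transformed" / "np_and_resized" / raw_dir.name


study_paths = {
    name: Path("ppdm/data/5IT_DUMMY_STUDY/source/raw") / name for name in CHANNELS
}
print(preprocess_channels(study_paths, path_only)["virus"])
# ppdm/data/5IT_DUMMY_STUDY/source/transformed/np_and_resized/virus  (on POSIX systems)
```

Checking all keys up front means a typo in one channel name is reported before any of the other channels spend time being converted.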

This step downsizes the input .tiff files of each channel (tumor, vessel, virus) and converts them into NumPy array files, which are saved to the output directory. The new folder structure looks as follows:

ppdm
└─ data
   └─ 5IT_STUDY
      ├─ config.json
      └─ source
         ├─ raw
         │  ├─ tumor
         │  │  ├─ 5IT-4X_Ch2_z0300.tiff
         │  │  ├─ ...
         │  │  └─ 5IT-4X_Ch2_z1300.tiff
         │  ├─ vessel
         │  │  ├─ 5IT-4X_Ch3_z0300.tiff
         │  │  ├─ ...
         │  │  └─ 5IT-4X_Ch3_z1300.tiff
         │  └─ virus
         │     ├─ 5IT-4X_Ch1_z0300.tiff
         │     ├─ ...
         │     └─ 5IT-4X_Ch1_z1300.tiff
---------│------------------------------------------------------
         └─ transformed
            └─ np_and_resized
               ├─ tumor
               │  ├─ 5IT-4X_Ch2_z0300.np
               │  ├─ ...
               │  └─ 5IT-4X_Ch2_z1300.np
               ├─ vessel
               │  ├─ 5IT-4X_Ch3_z0300.np
               │  ├─ ...
               │  └─ 5IT-4X_Ch3_z1300.np
               └─ virus
                  ├─ 5IT-4X_Ch1_z0300.np
                  ├─ ...
                  └─ 5IT-4X_Ch1_z1300.np
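This page does not show how `_create_numpy_formats_of_input_img` resizes each slice. One common approach is block averaging by an integer factor; the sketch below assumes 2-D grayscale slices already loaded as arrays (reading the `.tiff` itself would need a library such as `tifffile` or Pillow, omitted here), and `downsample_mean` and the factor of 2 are hypothetical. Note that if the files are written with `numpy.save`, NumPy appends `.npy` to any file name that does not already end in `.npy`.

```python
import tempfile
from pathlib import Path

import numpy as np


def downsample_mean(image: np.ndarray, factor: int) -> np.ndarray:
    """Shrink a 2-D image by an integer `factor` using block averaging."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    height, width = image.shape
    # Crop the trailing rows/columns so both sides divide evenly by `factor`
    height, width = height - height % factor, width - width % factor
    # Split each axis into (blocks, factor) and average inside every block
    blocks = image[:height, :width].reshape(
        height // factor, factor, width // factor, factor
    )
    return blocks.mean(axis=(1, 3))


z_slice = np.arange(16, dtype=np.float32).reshape(4, 4)  # stand-in for one z-slice
small = downsample_mean(z_slice, 2)
print(small.shape)  # (2, 2)

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "5IT-4X_Ch2_z0300.npy"
    np.save(target, small)  # the name already ends in .npy, so nothing is appended
    print(np.array_equal(np.load(target), small))  # True
```

Block averaging keeps the mean intensity of each region, which tends to suit fluorescence channels better than simply keeping every n-th pixel.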