Preprocessing
Preprocessing module:
Converting images to NumPy format and resizing
This module consists of two main functions:
- convert_images_to_numpy_format
- data_preprocessing_wrapper
convert_images_to_numpy_format(input_directory)
Wrapper function that downsizes the images in the input folder and converts them to NumPy format in parallel (for speed).
Parameters
input_directory: (pathlib.PosixPath object) relative path to the images which should be preprocessed
Returns
output_directory: (pathlib.PosixPath object) relative path for the folder with preprocessed results (transferred and converted images)
Example Usage
>>> from pathlib import Path
>>> from src.preprocessing import convert_images_to_numpy_format
>>> tumor_folder = Path('ppdm/data/5IT_DUMMY_STUDY/source/raw/tumor')
>>> convert_images_to_numpy_format(tumor_folder)
PosixPath('ppdm/data/5IT_DUMMY_STUDY/source/transformed/np_and_resized/tumor')
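The input and output paths in this example differ in a single path component: `raw` becomes `transformed/np_and_resized`. The helper that performs this mapping, `create_preprocessing_relat_directory`, is not shown on this page, so the following is only a minimal sketch of how such a mapping could work. The function name `map_raw_to_transformed` and its logic are assumptions made for illustration, not the project's implementation.

```python
from pathlib import Path


def map_raw_to_transformed(input_directory: Path) -> Path:
    """Hypothetical stand-in: swap the 'raw' component for 'transformed/np_and_resized'."""
    parts = list(input_directory.parts)
    if "raw" not in parts:
        raise ValueError(f"expected a 'raw' component in {input_directory}")
    # Replace the last 'raw' component so that parent folders named 'raw' are left alone.
    idx = len(parts) - 1 - parts[::-1].index("raw")
    new_parts = parts[:idx] + ["transformed", "np_and_resized"] + parts[idx + 1:]
    return Path(*new_parts)


tumor_folder = Path("ppdm/data/5IT_DUMMY_STUDY/source/raw/tumor")
mapped_folder = map_raw_to_transformed(tumor_folder)
print(mapped_folder.as_posix())
```

The sketch only computes the path; creating the folder is left to the caller, as `convert_images_to_numpy_format` does with `mkdir(parents=True, exist_ok=True)`.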
Source code in src/preprocessing.py
def convert_images_to_numpy_format(
    input_directory: pathlib.PosixPath,
) -> pathlib.PosixPath:
    """
    Wrapper function that downsizes the images in the input folder and converts them to NumPy format in parallel (for speed).

    Parameters
    ----------
    **input_directory**: *(pathlib.PosixPath object)* relative path to the images which should be preprocessed

    Returns
    -------
    **output_directory**: *(pathlib.PosixPath object)* relative path for the folder with preprocessed results (transferred and converted images)

    Example Usage
    --------------
    ```python
    >>> from pathlib import Path
    >>> from src.preprocessing import convert_images_to_numpy_format
    >>> tumor_folder = Path('ppdm/data/5IT_DUMMY_STUDY/source/raw/tumor')
    >>> convert_images_to_numpy_format(tumor_folder)
    PosixPath('ppdm/data/5IT_DUMMY_STUDY/source/transformed/np_and_resized/tumor')
    ```
    """
    # Derive the output folder from the input folder and make sure it exists.
    output_directory = create_preprocessing_relat_directory(input_directory)
    output_directory.mkdir(parents=True, exist_ok=True)

    # Compare the input and output folders; reconvert only when the check fails.
    directory_ok = check_content_of_two_directories(
        input_directory, output_directory
    )

    if directory_ok is False:
        # Start from a clean output folder.
        if output_directory.exists():
            remove_content(output_directory)

        # Bind output_directory so each worker call only needs the image path.
        parallel_create_numpy_formats_of_input_img = partial(
            _create_numpy_formats_of_input_img,
            output_directory=output_directory,
        )

        # Convert all .tiff files in sorted order using 12 worker threads.
        parallel(
            parallel_create_numpy_formats_of_input_img,
            sorted(list(input_directory.glob("*.tiff"))),
            n_workers=12,
            progress=True,
            threadpool=True,
        )

    return output_directory
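The source above binds `output_directory` with `functools.partial` and hands the resulting one-argument callable to `parallel`, which runs it over every `.tiff` file. The same pattern can be sketched with the standard library alone. In this sketch `ThreadPoolExecutor` stands in for `parallel(..., threadpool=True)`, and `convert_one` is a hypothetical placeholder for `_create_numpy_formats_of_input_img` that only derives the target file name and reads no image. Both are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from pathlib import Path


def convert_one(image_path: Path, output_directory: Path) -> Path:
    """Hypothetical per-file worker: returns where the converted file would go."""
    # A real worker would read the image, downsize it and save the array here.
    return output_directory / (image_path.stem + ".np")


def convert_all(image_paths, output_directory: Path, n_workers: int = 12):
    # Fix the keyword argument once so the pool only has to supply the file path.
    worker = partial(convert_one, output_directory=output_directory)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() preserves input order, so results line up with the sorted inputs.
        return list(pool.map(worker, sorted(image_paths)))


inputs = [
    Path("raw/tumor/5IT-4X_Ch2_z0301.tiff"),
    Path("raw/tumor/5IT-4X_Ch2_z0300.tiff"),
]
results = convert_all(inputs, Path("transformed/np_and_resized/tumor"))
print([p.name for p in results])
```

Threads are a natural fit when most of the time is spent reading and writing files; whether that holds for the actual worker depends on how much of its time goes into resizing.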
data_preprocessing_wrapper(data)
High-level function that runs the preprocessing for all data (blood vessels, tumors, virus). It applies the convert_images_to_numpy_format function to each channel (blood vessels, tumors, virus). To preprocess a single channel (e.g. only tumors), call convert_images_to_numpy_format directly.
Parameters
data: (dict) mapping channel names (keys) to the relative paths of their image folders (values). The keys 'vessel', 'tumor' and 'virus' are required.
Returns
preprocessed_data: (dict) the relative paths of the three folders with the preprocessed results (transferred and converted images), keyed by channel name.
Results are stored on the disk.
Example Usage
>>> from pathlib import Path
>>> from src.preprocessing import data_preprocessing_wrapper
>>> study_paths = {'vessel': Path('ppdm/data/5IT_DUMMY_STUDY/source/raw/vessel'),
...                'tumor': Path('ppdm/data/5IT_DUMMY_STUDY/source/raw/tumor'),
...                'virus': Path('ppdm/data/5IT_DUMMY_STUDY/source/raw/virus')}
>>> data_preprocessing_wrapper(study_paths)
defaultdict(<function src.utils.nested_dict()>,
            {'vessel': PosixPath('ppdm/data/5IT_DUMMY_STUDY/source/transformed/np_and_resized/vessel'),
             'tumor': PosixPath('ppdm/data/5IT_DUMMY_STUDY/source/transformed/np_and_resized/tumor'),
             'virus': PosixPath('ppdm/data/5IT_DUMMY_STUDY/source/transformed/np_and_resized/virus')})
Source code in src/preprocessing.py
@log_step
def data_preprocessing_wrapper(data: dict) -> dict:
    """
    High-level function that runs the preprocessing for all data (blood vessels, tumors, virus).
    It applies the convert_images_to_numpy_format function to each channel (blood vessels, tumors, virus).
    To preprocess a single channel (e.g. only tumors), call convert_images_to_numpy_format directly.

    Parameters
    ----------
    **data**: *(dict)* mapping channel names (keys) to the relative paths of their image folders (values).

    Returns
    -------
    **preprocessed_data**: *(dict)* the relative paths of the three folders with the preprocessed results (transferred and converted images), keyed by channel name.
    Results are stored on the disk.

    Example Usage
    --------------
    ```python
    >>> from pathlib import Path
    >>> from src.preprocessing import data_preprocessing_wrapper
    >>> study_paths = {'vessel': Path('ppdm/data/5IT_DUMMY_STUDY/source/raw/vessel'),
    ...                'tumor': Path('ppdm/data/5IT_DUMMY_STUDY/source/raw/tumor'),
    ...                'virus': Path('ppdm/data/5IT_DUMMY_STUDY/source/raw/virus')}
    >>> data_preprocessing_wrapper(study_paths)
    defaultdict(<function src.utils.nested_dict()>,
                {'vessel': PosixPath('ppdm/data/5IT_DUMMY_STUDY/source/transformed/np_and_resized/vessel'),
                 'tumor': PosixPath('ppdm/data/5IT_DUMMY_STUDY/source/transformed/np_and_resized/tumor'),
                 'virus': PosixPath('ppdm/data/5IT_DUMMY_STUDY/source/transformed/np_and_resized/virus')})
    ```
    """
    preprocessed_data = nested_dict()

    # vessel
    out_path = convert_images_to_numpy_format(data["vessel"])
    preprocessed_data["vessel"] = out_path

    # tumor
    out_path = convert_images_to_numpy_format(data["tumor"])
    preprocessed_data["tumor"] = out_path

    # virus
    out_path = convert_images_to_numpy_format(data["virus"])
    preprocessed_data["virus"] = out_path

    return preprocessed_data
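Because the three channels are handled identically, the body above could also be expressed as a loop over the channel names. The sketch below is not the project's code: `preprocess_channels`, its `convert` parameter and `fake_convert` are hypothetical, and a plain `dict` replaces `nested_dict()`. Passing the converter as an argument keeps the example runnable without any image data.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable

CHANNELS = ("vessel", "tumor", "virus")


def preprocess_channels(
    data: Dict[str, Path],
    convert: Callable[[Path], Path],
    channels: Iterable[str] = CHANNELS,
) -> Dict[str, Path]:
    """Apply `convert` to each channel folder and collect the output folders."""
    preprocessed = {}
    for channel in channels:
        if channel not in data:
            raise KeyError(f"missing channel {channel!r} in input data")
        preprocessed[channel] = convert(data[channel])
    return preprocessed


# Stand-in converter for the demo; the real one would be convert_images_to_numpy_format.
def fake_convert(folder: Path) -> Path:
    return folder.parent.parent / "transformed" / "np_and_resized" / folder.name


raw_root = Path("ppdm/data/5IT_DUMMY_STUDY/source/raw")
study_paths = {name: raw_root / name for name in CHANNELS}
out = preprocess_channels(study_paths, fake_convert)
print(out["tumor"].as_posix())
```

Calling `preprocess_channels(study_paths, fake_convert, channels=("tumor",))` would process a single channel through the same code path.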
This step downsizes the input .tiff files of each channel (tumor, vessel, virus) and converts them to NumPy array files, which are saved in the output directory. Afterwards the folder structure looks as follows, with the newly created folders below the dashed line:
ppdm
└─ data
   └─ 5IT_STUDY
      ├─ config.json
      └─ source
         ├─ raw
         │  ├─ tumor
         │  │  ├─ 5IT-4X_Ch2_z0300.tiff
         │  │  ├─ ...
         │  │  └─ 5IT-4X_Ch2_z1300.tiff
         │  ├─ vessel
         │  │  ├─ 5IT-4X_Ch3_z0300.tiff
         │  │  ├─ ...
         │  │  └─ 5IT-4X_Ch3_z1300.tiff
         │  └─ virus
         │     ├─ 5IT-4X_Ch1_z0300.tiff
         │     ├─ ...
         │     └─ 5IT-4X_Ch1_z1300.tiff
---------│----------------------------------------------------------
         └─ transformed
            └─ np_and_resized
               ├─ tumor
               │  ├─ 5IT-4X_Ch2_z0300.np
               │  ├─ ...
               │  └─ 5IT-4X_Ch2_z1300.np
               ├─ vessel
               │  ├─ 5IT-4X_Ch3_z0300.np
               │  ├─ ...
               │  └─ 5IT-4X_Ch3_z1300.np
               └─ virus
                  ├─ 5IT-4X_Ch1_z0300.np
                  ├─ ...
                  └─ 5IT-4X_Ch1_z1300.np
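One way to confirm that a channel was fully converted is to compare file stems between its raw folder and its np_and_resized folder. The helper below is a hypothetical sketch: the name `missing_conversions` is an assumption, it is not the project's `check_content_of_two_directories`, and it assumes the `.tiff` and `.np` suffixes shown in the tree above.

```python
from pathlib import Path
import tempfile


def missing_conversions(raw_dir: Path, converted_dir: Path) -> list:
    """Return the sorted stems of .tiff files that have no matching .np file."""
    raw_stems = {p.stem for p in raw_dir.glob("*.tiff")}
    converted_stems = {p.stem for p in converted_dir.glob("*.np")}
    return sorted(raw_stems - converted_stems)


# Demo on a throwaway directory tree with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "raw" / "tumor"
    converted = Path(tmp) / "transformed" / "np_and_resized" / "tumor"
    raw.mkdir(parents=True)
    converted.mkdir(parents=True)
    for z in ("z0300", "z0301"):
        (raw / f"5IT-4X_Ch2_{z}.tiff").touch()
    (converted / "5IT-4X_Ch2_z0300.np").touch()
    demo_missing = missing_conversions(raw, converted)

print(demo_missing)
```

An empty list means every raw slice has a converted counterpart; the check looks only at file names, not at file contents.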