Project progress can be tracked here: To-Do-List
We perform segmentation of medical images to highlight the presence of breast cancer.

This project is part of a course on machine learning operations at the Technical University of Denmark (DTU). We work on segmentation of medical images to highlight the presence of breast cancer. To accomplish this we use the Breast Cancer Semantic Segmentation (BCSS) dataset provided on Kaggle. The model is trained with MONAI, with the intent of using a UNet architecture, which is popular for image segmentation in the medical domain.
We use the Breast Cancer Semantic Segmentation (BCSS) dataset, specifically the 224x224 image size.
The BCSS dataset, derived from TCGA, includes over 20,000 segmentation annotations of breast cancer tissue regions. The annotations are a collaborative effort of pathologists, residents, and medical students, created using the Digital Slide Archive.
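Since the dataset consists of image/mask pairs on disk, loading boils down to matching each image file with the mask that shares its filename. A minimal sketch of that pairing step is shown below; the directory names (`images`, `masks`) and the `.png` extension are assumptions for illustration, not the dataset's guaranteed layout.

```python
from pathlib import Path


def pair_images_and_masks(image_dir: Path, mask_dir: Path) -> list[tuple[Path, Path]]:
    """Pair each image with the mask that shares its filename.

    Images without a matching mask are skipped, so the result only
    contains complete (image, mask) training samples.
    """
    pairs = []
    for image_path in sorted(image_dir.glob("*.png")):
        mask_path = mask_dir / image_path.name
        if mask_path.exists():
            pairs.append((image_path, mask_path))
    return pairs
```

The sorted, filtered pair list can then be handed to a dataset class (e.g. a MONAI `Dataset` with loading transforms) so that images and masks stay aligned by construction.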
For remote data version control we use a GCP storage bucket as a data lake; since we work with image data, we need file storage rather than traditional table storage.
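Wiring DVC to a GCS bucket takes only a few commands. The sketch below is a hypothetical configuration: the remote name, bucket placeholder, and credential path are all stand-ins to be replaced with the project's actual values.

```shell
# Point DVC at a GCS bucket as the default remote (bucket name is a placeholder).
dvc remote add -d gcs_storage gs://<bucket-name>/dvc

# Optional: authenticate with a service-account key instead of gcloud defaults.
dvc remote modify gcs_storage credentialpath path/to/service-account.json

# Track the raw image data and push it to the bucket.
dvc add data/raw
dvc push
```

After this, `dvc pull` on any machine with access to the bucket restores the exact tracked version of the data.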
The framework used to train the model is MONAI, a PyTorch-based framework for medical image analysis that adds a level of abstraction on top of PyTorch. Instead of defining each layer of our own models, we can use entire networks (based on scientific papers from the medical research community) and only need to specify hyperparameters such as channel sizes, dimensions, and loss functions. One such architecture family is UNet. We intend to use a UNet architecture as it is popular for image segmentation. We plan to use the BasicUNet implementation first (based on CNN modules), and later potentially compare its performance to a vision-transformer-based UNet such as UNETR (this, however, is intended for 3D image data, so whether it applies here is yet to be clarified).
The training procedure is containerized with Docker, using a CUDA-specific base image to allow GPU-accelerated training.
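A training image of this kind typically starts from an NVIDIA CUDA base image and layers the Python environment on top. The Dockerfile below is a sketch under assumptions: the exact base-image tag, Python version, and entrypoint script path should be adjusted to the project's actual setup.

```dockerfile
# Sketch of a GPU-ready training image; base tag is an assumption.
FROM nvidia/cuda:12.1.0-runtime-ubuntu22.04

RUN apt-get update && \
    apt-get install -y --no-install-recommends python3 python3-pip && \
    rm -rf /var/lib/apt/lists/*

WORKDIR /app

# Install dependencies first so Docker caches this layer across code changes.
COPY requirements.txt .
RUN pip3 install --no-cache-dir -r requirements.txt

COPY project_name/ project_name/

ENTRYPOINT ["python3", "project_name/train_model.py"]
```

Running the container with `docker run --gpus all ...` exposes the host GPUs to the CUDA runtime inside the image.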
The project structure was initially created with the cookiecutter template mlops_template from the machine learning operations course.
The directory structure of the project looks like this:
├── Makefile <- Makefile with convenience commands like `make data` or `make train`
├── README.md <- The top-level README for developers using this project.
├── data
│ ├── processed <- The final, canonical data sets for modeling.
│ └── raw <- The original, immutable data dump.
│
├── docs <- Documentation folder
│ │
│ ├── index.md <- Homepage for your documentation
│ │
│ ├── mkdocs.yml <- Configuration file for mkdocs
│ │
│ └── source/ <- Source directory for documentation files
│
├── models <- Trained and serialized models, model predictions, or model summaries
│
├── notebooks <- Jupyter notebooks.
│
├── pyproject.toml <- Project configuration file
│
├── reports <- Generated analysis as HTML, PDF, LaTeX, etc.
│ └── figures <- Generated graphics and figures to be used in reporting
│
├── requirements.txt <- The requirements file for reproducing the analysis environment
│
├── requirements_dev.txt <- The requirements file for reproducing the development environment
│
├── tests <- Test files
│
├── project_name <- Source code for use in this project.
│ │
│ ├── __init__.py <- Makes folder a Python module
│ │
│ ├── data <- Scripts to download or generate data
│ │ ├── __init__.py
│ │ └── make_dataset.py
│ │
│ ├── models <- model implementations, training script and prediction script
│ │ ├── __init__.py
│ │ └── model.py
│ │
│ ├── visualization <- Scripts to create exploratory and results oriented visualizations
│ │ ├── __init__.py
│ │ └── visualize.py
│ ├── train_model.py <- script for training the model
│ └── predict_model.py <- script for predicting from a model
│
└── LICENSE <- Open-source license if one is chosen