-This is the the official repository for the paper: "Foundation Model for Cancer Imaging Biomarkers "
+This is the official repository for the paper
+
+
+
-
+
+
+
+
+
+
+
+
-Suraj Pai, Dennis Bontempi, Ibrahim Hadzic, Vasco Prudente, Mateo Sokač, Tafadzwa L. Chaunzwa, Simon Bernatz, Ahmed Hosny, Raymond H Mak, Nicolai J Birkbak, Hugo JWL Aerts
-
-
[![Build Status](https://github.com/AIM-Harvard/foundation-cancer-image-biomarker/actions/workflows/build.yml/badge.svg)](https://github.com/AIM-Harvard/foundation-cancer-image-biomarker/actions/workflows/build.yml)
[![Python Version](https://img.shields.io/pypi/pyversions/foundation-cancer-image-biomarker.svg)](https://pypi.org/project/foundation-cancer-image-biomarker/)
[![Dependencies Status](https://img.shields.io/badge/dependencies-up%20to%20date-brightgreen.svg)](https://github.com/AIM-Harvard/foundation-cancer-image-biomarker/pulls?utf8=%E2%9C%93&q=is%3Apr%20author%3Aapp%2Fdependabot)
@@ -23,7 +30,7 @@ This is the the official repository for the paper: "Foundation Model for Canc
---
-**NOTE: **
+**NOTE:**
For detailed documentation check our [website](https://aim-harvard.github.io/foundation-cancer-image-biomarker/)
---
\ No newline at end of file
diff --git a/docs/assets/Mhub_image.png b/docs/assets/Mhub_image.png
new file mode 100644
index 0000000..6cb0fcd
Binary files /dev/null and b/docs/assets/Mhub_image.png differ
diff --git a/docs/assets/Mhub_image2.png b/docs/assets/Mhub_image2.png
new file mode 100644
index 0000000..7ae09bc
Binary files /dev/null and b/docs/assets/Mhub_image2.png differ
diff --git a/docs/assets/readpaper_logo.png b/docs/assets/readpaper_logo.png
new file mode 100644
index 0000000..5385bab
Binary files /dev/null and b/docs/assets/readpaper_logo.png differ
diff --git a/docs/index.md b/docs/index.md
index 48a5677..04c6e7e 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -3,14 +3,13 @@ hide:
- title
---
#
-
-
-This is the the official documentation for the paper: "Foundation Model for Cancer Imaging Biomarkers "
-
-
-
-
-Suraj Pai, Dennis Bontempi, Ibrahim Hadzic, Vasco Prudente, Mateo Sokač, Tafadzwa L. Chaunzwa, Simon Bernatz, Ahmed Hosny, Raymond H Mak, Nicolai J Birkbak, Hugo JWL Aerts
+
+
+
+
+
+
+
## Documentation Walkthrough
@@ -20,11 +19,11 @@ This is the the official documentation for the paper: "Foundation Model for C
!!! note
[We also provide quickstart examples that run in a free, cloud-based environment](./getting-started/cloud-quick-start.md) (through Google Colab) so you can get familiar with our workflows without having to download anything to your local machine!
-[Replication Guide](./user-guide/data.md) If you would like to pre-train a foundation model on your own unannotated data or would like to replicate the training and evaluation from our study, see here.
+[Replication Guide](./replication-guide/data.md) If you would like to pre-train a foundation model on your own unannotated data or would like to replicate the training and evaluation from our study, see here.
[Tutorials](https://github.com/AIM-Harvard/foundation-cancer-image-biomarker/tree/master/tutorials) We provide comprehensive tutorials that use the foundation model for cancer imaging biomarkers and compare it against other commonly used methods. If you would like to build your own study using our foundation model, this set of tutorials is highly recommended as the starting point.
-[API Docs](./api_docs/fmcib/index.html) This is for the more advanced user who would like to deep-dive into different methods and classes provided by our package.
+[API Docs](./reference/run) This is for the more advanced user who would like to deep-dive into different methods and classes provided by our package.
## License
diff --git a/docs/user-guide/analysis.md b/docs/replication-guide/analysis.md
similarity index 100%
rename from docs/user-guide/analysis.md
rename to docs/replication-guide/analysis.md
diff --git a/docs/replication-guide/baselines.md b/docs/replication-guide/baselines.md
new file mode 100644
index 0000000..99d08f0
--- /dev/null
+++ b/docs/replication-guide/baselines.md
@@ -0,0 +1,2 @@
+# Reproduce Baselines
+:hourglass_flowing_sand: Coming soon! :hourglass_flowing_sand:
diff --git a/docs/user-guide/data.md b/docs/replication-guide/data.md
similarity index 98%
rename from docs/user-guide/data.md
rename to docs/replication-guide/data.md
index b509a3c..b220b59 100644
--- a/docs/user-guide/data.md
+++ b/docs/replication-guide/data.md
@@ -62,7 +62,7 @@ bash luna16.sh
The easiest way to download the LUNG1 and RADIO datasets is through s5cmd and [IDC manifests](https://learn.canceridc.dev/data/downloading-data)
For convenience, the manifests for each dataset have already been provided in `data/download` under `nsclc_radiomics.csv` for LUNG1 and `nsclc_radiogenomics.`
-First, you'll need to install `s5cmd`. Follow the instructions here: https://github.com/peak/s5cmd?tab=readme-ov-file#installation
+First, you'll need to install `s5cmd`. Follow the instructions [here](https://github.com/peak/s5cmd?tab=readme-ov-file#installation)
Once you have s5cmd installed, run
diff --git a/docs/user-guide/download_models.md b/docs/replication-guide/download_models.md
similarity index 100%
rename from docs/user-guide/download_models.md
rename to docs/replication-guide/download_models.md
diff --git a/docs/user-guide/fm_adaptation.md b/docs/replication-guide/fm_adaptation.md
similarity index 100%
rename from docs/user-guide/fm_adaptation.md
rename to docs/replication-guide/fm_adaptation.md
diff --git a/docs/user-guide/inference.md b/docs/replication-guide/inference.md
similarity index 100%
rename from docs/user-guide/inference.md
rename to docs/replication-guide/inference.md
diff --git a/docs/user-guide/reproduce_fm.md b/docs/replication-guide/reproduce_fm.md
similarity index 100%
rename from docs/user-guide/reproduce_fm.md
rename to docs/replication-guide/reproduce_fm.md
diff --git a/docs/user-guide/reproduce_baselines.md b/docs/user-guide/reproduce_baselines.md
deleted file mode 100644
index 1372cc3..0000000
--- a/docs/user-guide/reproduce_baselines.md
+++ /dev/null
@@ -1,34 +0,0 @@
-# Reproduce Baselines
-
-### Reproducing our baselines
-
-We have several different baselines that we compare against in this study.
-
-
-As mentioned in [section](#supervised-models), we have three different supervised training implementations. Similar to the foundation pre-training, we use YAML files to maintain the configurations of these implementations.
-
-
- Supervised model trained from random initialization
-
-In order to reproduce this training, you can inspect the YAML configuration at `experiments/baselines/supervised_training/supervised_random_init.yaml`. By default, we configure this for Task 1. You can adapt this for Task 2 and Task 3 by searching for 'Note: ' comments in the YAML that outline what must be changed.
-
-You can start training by running this in the root code folder,
-```bash
-lighter fit --config_file ./experiments/baselines/supervised_training/supervised_random_init.yaml
-```
-
-
-
- Fine-tuning a trained supervised model
-
-The YAML configuration at `experiments/baselines/supervised_training/supervised_finetune.yaml` describes how you can fine-tune an already trained supervised model. Note that this is possible only for Task 2 and Task 3 as we used the supervised model trained in Task 1 to load weights from. Make sure you download the weights for Task 1 supervised models. You can follow instructions [here](#model)
-
-
-You can start training by running this in the root code folder,
-```bash
-lighter fit --config_file ./experiments/baselines/supervised_training/supervised_finetune.yaml
-```
-
-
-
-### Reproducing our linear evaluation (Logistic Regression)
diff --git a/fmcib/visualization/verify_io.py b/fmcib/visualization/verify_io.py
index b5f799f..a45f9fe 100644
--- a/fmcib/visualization/verify_io.py
+++ b/fmcib/visualization/verify_io.py
@@ -17,14 +17,18 @@ def visualize_seed_point(row):
None
"""
# Define the transformation pipeline
+ is_label_provided = "label_path" in row
+ keys = ["image_path", "label_path"] if is_label_provided else ["image_path"]
+ all_keys = keys if is_label_provided else ["image_path", "coordX", "coordY", "coordZ"]
+
T = monai_transforms.Compose(
[
- monai_transforms.LoadImaged(keys=["image_path"], image_only=True, reader="ITKReader"),
- monai_transforms.EnsureChannelFirstd(keys=["image_path"]),
- monai_transforms.Spacingd(keys=["image_path"], pixdim=1, mode="bilinear", align_corners=True, diagonal=True),
+ monai_transforms.LoadImaged(keys=keys, image_only=True, reader="ITKReader"),
+ monai_transforms.EnsureChannelFirstd(keys=keys),
+ monai_transforms.Spacingd(keys=keys, pixdim=1, mode="bilinear", align_corners=True, diagonal=True),
monai_transforms.ScaleIntensityRanged(keys=["image_path"], a_min=-1024, a_max=3072, b_min=0, b_max=1, clip=True),
- monai_transforms.Orientationd(keys=["image_path"], axcodes="LPS"),
- monai_transforms.SelectItemsd(keys=["image_path", "coordX", "coordY", "coordZ"]),
+ monai_transforms.Orientationd(keys=keys, axcodes="LPS"),
+ monai_transforms.SelectItemsd(keys=all_keys),
]
)
@@ -32,30 +36,35 @@ def visualize_seed_point(row):
out = T(row)
# Calculate the center of the image
- center = (-out["coordX"], -out["coordY"], out["coordZ"])
- center = np.linalg.inv(np.array(out["image_path"].affine)) @ np.array(center + (1,))
- center = [int(x) for x in center[:3]]
-
- # Define the image and label
image = out["image_path"]
- label = torch.zeros_like(image)
-
- # Define the dimensions of the image and the patch
- C, H, W, D = image.shape
- Ph, Pw, Pd = 50, 50, 50
-
- # Calculate and clamp the ranges for cropping
- min_h, max_h = max(center[0] - Ph // 2, 0), min(center[0] + Ph // 2, H)
- min_w, max_w = max(center[1] - Pw // 2, 0), min(center[1] + Pw // 2, W)
- min_d, max_d = max(center[2] - Pd // 2, 0), min(center[2] + Pd // 2, D)
-
- # Check if coordinates are valid
- assert min_h < max_h, "Invalid coordinates: min_h >= max_h"
- assert min_w < max_w, "Invalid coordinates: min_w >= max_w"
- assert min_d < max_d, "Invalid coordinates: min_d >= max_d"
-
- # Define the label for the cropped region
- label[:, min_h:max_h, min_w:max_w, min_d:max_d] = 1
+ if not is_label_provided:
+ center = (-out["coordX"], -out["coordY"], out["coordZ"])
+ center = np.linalg.inv(np.array(out["image_path"].affine)) @ np.array(center + (1,))
+ center = [int(x) for x in center[:3]]
+
+ # Define an empty label with the same shape as the image
+ label = torch.zeros_like(image)
+
+ # Define the dimensions of the image and the patch
+ C, H, W, D = image.shape
+ Ph, Pw, Pd = 50, 50, 50
+
+ # Calculate and clamp the ranges for cropping
+ min_h, max_h = max(center[0] - Ph // 2, 0), min(center[0] + Ph // 2, H)
+ min_w, max_w = max(center[1] - Pw // 2, 0), min(center[1] + Pw // 2, W)
+ min_d, max_d = max(center[2] - Pd // 2, 0), min(center[2] + Pd // 2, D)
+
+ # Check if coordinates are valid
+ assert min_h < max_h, "Invalid coordinates: min_h >= max_h"
+ assert min_w < max_w, "Invalid coordinates: min_w >= max_w"
+ assert min_d < max_d, "Invalid coordinates: min_d >= max_d"
+
+ # Define the label for the cropped region
+ label[:, min_h:max_h, min_w:max_w, min_d:max_d] = 1
+ else:
+ label = out["label_path"]
+ center = torch.nonzero(label).float().mean(dim=0)
+ center = [int(x) for x in center][1:]
# Blend the image and the label
ret = blend_images(image=image, label=label, alpha=0.3, cmap="hsv", rescale_arrays=False)
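
Note on the hunk above: the new `else` branch derives the seed point from the mask itself, while the original path converts `coordX`/`coordY`/`coordZ` through the inverse image affine and then paints a clamped 50-voxel patch. The following is a rough standalone sketch of those three steps, not the repository's code. It uses plain NumPy rather than torch and MONAI, the helper names `world_to_voxel`, `mask_center_of_mass`, and `clamp_patch` are hypothetical, and the toy affine and mask are made up for illustration. The diff also negates X and Y before applying the inverse affine; that sign flip is left out here and the world point is assumed to already be in the affine's coordinate system.

```python
import numpy as np


def world_to_voxel(affine, world_point):
    """Map a world-space point to voxel indices via the inverse 4x4 affine.

    Indices are truncated with int(), mirroring the code in the diff.
    """
    homogeneous = np.append(np.asarray(world_point, dtype=float), 1.0)
    voxel = np.linalg.inv(np.asarray(affine, dtype=float)) @ homogeneous
    return [int(v) for v in voxel[:3]]


def mask_center_of_mass(mask):
    """Mean index of all nonzero voxels in a channel-first (C, H, W, D) mask."""
    coords = np.argwhere(np.asarray(mask) != 0)  # one row of (c, h, w, d) per voxel
    if coords.shape[0] == 0:
        raise ValueError("mask contains no nonzero voxels")
    center = coords.mean(axis=0)
    return [int(c) for c in center[1:]]  # drop the channel index


def clamp_patch(center, shape, size=50):
    """Per-axis (start, stop) ranges of a patch around center, clipped to shape."""
    half = size // 2
    return [(max(c - half, 0), min(c + half, s)) for c, s in zip(center, shape)]


# Toy example (hypothetical data): a small cuboid "tumour" inside a 10^3 volume.
toy_mask = np.zeros((1, 10, 10, 10))
toy_mask[0, 2:5, 4:7, 6:9] = 1
seed = mask_center_of_mass(toy_mask)
print("seed voxel:", seed)
print("patch ranges:", clamp_patch(seed, toy_mask.shape[1:], size=4))
```

One design point worth keeping in mind: a center of mass can fall outside a strongly non-convex mask, so a seed obtained this way is worth checking visually, which is what `visualize_seed_point` is for.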
diff --git a/mkdocs.yml b/mkdocs.yml
index 065c13f..c6b1668 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -39,7 +39,8 @@ plugins:
docstring_style: google
options:
# Removed the default filter that excludes private members (that is, members whose names start with a single underscore).
- filters: null
+ filters: null
+ show_source: true
nav:
- 'index.md'
@@ -48,12 +49,13 @@ nav:
- 'Cloud Quick Start': 'getting-started/cloud-quick-start.md'
- 'Quick Start': 'getting-started/quick-start.md'
- 'Replication Guide':
- - 'Data Download and Preprocessing': 'user-guide/data.md'
- - 'Pre-training the FM': 'user-guide/reproduce_fm.md'
- - 'Adapt the FM to downstream tasks': 'user-guide/fm_adaptation.md'
- - 'Extracting Features & Predictions': 'user-guide/inference.md'
- - 'Reproduce Analysis': 'user-guide/analysis.md'
- # - 'Training baselines': 'user-guide/reproduce_baselines.md'
+ - 'Data Download and Preprocessing': 'replication-guide/data.md'
+ - 'Pre-training the FM': 'replication-guide/reproduce_fm.md'
+ - 'Adapt the FM to downstream tasks': 'replication-guide/fm_adaptation.md'
+ - 'Baselines for downstream tasks': 'replication-guide/baselines.md'
+ - 'Extracting Features & Predictions': 'replication-guide/inference.md'
+ - 'Reproduce Analysis': 'replication-guide/analysis.md'
+ # - 'Training baselines': 'replication-guide/reproduce_baselines.md'
- 'Tutorials': https://github.com/AIM-Harvard/foundation-cancer-image-biomarker/tree/master/tutorials
- 'API Reference': 'reference/'
diff --git a/scripts/generate_api_reference_pages.py b/scripts/generate_api_reference_pages.py
index 98d6ec9..9e7f000 100644
--- a/scripts/generate_api_reference_pages.py
+++ b/scripts/generate_api_reference_pages.py
@@ -18,7 +18,10 @@
root = Path(__file__).parent.parent
src = root / PACKAGE
-for path in sorted(src.rglob("*.py")):
+# Sort files by depth
+paths = sorted(src.rglob("*.py"), key=lambda path: len(path.parts))
+
+for path in paths:
print(f"Processing {path}")
module_path = path.relative_to(src).with_suffix("")
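
Note on the hunk above: sorting by `len(path.parts)` means shallower modules are processed before deeper ones. `rglob` makes no ordering promise, and because Python's sort is stable, files at the same depth keep whatever order `rglob` yielded. The sketch below shows the same idea with an added tie-breaker on the path string for a reproducible order. The tie-breaker and the `sort_by_depth` helper are suggestions, not part of the change, and the file names are hypothetical.

```python
from pathlib import PurePosixPath


def sort_by_depth(paths):
    """Order paths by number of components, then alphabetically within a depth."""
    return sorted(paths, key=lambda p: (len(p.parts), p.as_posix()))


# Hypothetical package layout, listed in an arbitrary order.
example = [
    PurePosixPath("pkg/a/b/deep.py"),
    PurePosixPath("pkg/top.py"),
    PurePosixPath("pkg/a/mid.py"),
    PurePosixPath("pkg/alpha.py"),
]

for path in sort_by_depth(example):
    print(len(path.parts), path)
```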
diff --git a/tutorials/get_seed_from_mask.ipynb b/tutorials/get_seed_from_mask.ipynb
new file mode 100644
index 0000000..90166fe
--- /dev/null
+++ b/tutorials/get_seed_from_mask.ipynb
@@ -0,0 +1,233 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Extract CoM seed point from Segmentation Masks\n",
+ "\n",
+ "The FMCIB `get_features` function expects image paths and seed points. If you have segmentation masks that you would like to convert to center-of-mass (CoM) seed points, this notebook shows how to do so. \n",
+ "\n",
+ "Alternatively, you can use our MHub implementation (https://mhub.ai/models/fmcib_radiomics) with the `nrrd_mask_workflow`, as described here: https://github.com/MHubAI/documentation/blob/main/documentation/mhub/run_mhub.md#specify-the-workflow\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import SimpleITK as sitk\n",
+ "from fmcib.utils import download_LUNG1, build_image_seed_dict\n",
+ "from fmcib.visualization import visualize_seed_point"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "First, we download a sample from LUNG1 to demonstrate centroid extraction. If you are using your own data, skip this step. \n",
+ "\n",
+ "The download and conversion will take about a minute. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "download_LUNG1(\"dummy\", samples=1)\n",
+ "build_image_seed_dict(\"dummy\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Now we get the paths to the image and mask. These can be nii.gz, nrrd, mha, or other formats supported by MONAI's ITKReader."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import pathlib\n",
+ "\n",
+ "dummy_path = pathlib.Path(\"dummy\")\n",
+ "image_path = list(dummy_path.rglob(\"image.nii.gz\"))[0]\n",
+ "mask_path = list(dummy_path.rglob(\"*GTV-1.nii.gz\"))[0]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "row = {\"image_path\": image_path, \"label_path\": mask_path}"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The `visualize_seed_point` utility function also visualizes masks when `label_path` is provided as a key in the dict. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ "