Merge dev to main #2

Merged on Aug 16, 2024 (30 commits)

Commits
5c97648
remove unused code; add a test
afrendeiro Mar 15, 2024
b4b94a8
add missing system-level dependency to github action
afrendeiro Mar 15, 2024
0df6810
avoid use internal Path for deprecation; set gh-action to 3.10 only
afrendeiro Mar 15, 2024
26a6cbb
add missing requests dependency
afrendeiro Mar 15, 2024
465542b
add missing scikit-image dependency
afrendeiro Mar 15, 2024
1563e09
add missing tqdm dependency
afrendeiro Mar 15, 2024
36403f2
add missing dependencies
afrendeiro Mar 15, 2024
176877a
coalesce all utils in wsi/utils.py; less exposed methods; more docstr…
afrendeiro Mar 15, 2024
813e6ff
remove internal utils.Path
afrendeiro Mar 29, 2024
f8fbf1d
better interface for WholeSlideImage.save_tile_images
afrendeiro Mar 29, 2024
c46f586
better devel
afrendeiro Mar 29, 2024
cbacdfe
fix bug using holes_tissue not paired with contours_tissue
afrendeiro Mar 29, 2024
778409d
relax test tolerance
afrendeiro Mar 29, 2024
9688405
rename Whole_Slide_Bag_FP class to WholeSlideBag
afrendeiro Apr 16, 2024
21e0c44
more docstrings
afrendeiro Apr 16, 2024
d491058
add internal _get_best_level
afrendeiro Apr 16, 2024
4b3c547
save segmentation to hdf5_file rather than pickler; remove functions …
afrendeiro Apr 16, 2024
6d8d5e7
expose parameters of _segment_tissue_manual to user; set broader defa…
afrendeiro Apr 29, 2024
0153240
handle edge cases of tissue foreground touching edges in _segment_tis…
afrendeiro Apr 29, 2024
64dbcb9
revert to use of skimage.measure.find_contours instead of through ski…
afrendeiro Apr 30, 2024
3143e27
make sure contours of all holes are correctly extracted
afrendeiro Apr 30, 2024
55d9f84
Add functionality to retrieve tile polygons and graphs
afrendeiro Jun 5, 2024
88bc342
Add function to get a slide thumbnail
afrendeiro Jun 5, 2024
c97d9b7
fix bug getting contours
afrendeiro Jun 5, 2024
e714b9f
improve docs; more customization of kwargs in inference
afrendeiro Jun 5, 2024
71961a7
add option to plot each tissue piece separately in plot_segmentation
afrendeiro Jun 28, 2024
262616e
enable loading of legacy tissue contours saved as pickle
afrendeiro Jul 4, 2024
9e3b866
fix wrong WholeSlideBag init
afrendeiro Jul 4, 2024
0a220aa
simplify signature of .inference()
afrendeiro Jul 8, 2024
33facab
fix tests
afrendeiro Jul 8, 2024
27 changes: 27 additions & 0 deletions .github/workflows/pytest_workflow.yml
@@ -0,0 +1,27 @@
name: Pytest testing

on: [push]

jobs:
  test:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - name: Install system dependencies
        run: |
          sudo apt-get update
          sudo apt-get install -y openslide-tools

      - name: Install Python dependencies
        run: |
          python -m pip install --upgrade pip
          pip install .
      - name: Test with pytest
        run: |
          pip install pytest pytest-cov
          pytest wsi --doctest-modules --junitxml=junit/test-results.xml --cov=wsi --cov-report=xml --cov-report=html
10 changes: 10 additions & 0 deletions .gitignore
@@ -2,3 +2,13 @@
__pycache__/
*.egg-info
build/
dist
.coverage
cache
junit
joblib
__pycache__
.mypy_cache
coverage.xml
_version.py
*.sublime-*
24 changes: 24 additions & 0 deletions Makefile
@@ -0,0 +1,24 @@
clean:
	-rm -rf build
	-rm -rf dist
	-rm -rf *.egg-info
	-rm -rf .coverage
	-rm -rf cache
	-rm -rf junit
	-rm -rf joblib
	-rm -rf __pycache__
	-rm -rf .mypy_cache
	-rm -rf htmlcov
	-rm coverage.xml
	# -rm -rf .pytest_cache

test: clean
	pytest wsi \
		--doctest-modules \
		--junitxml=junit/test-results.xml \
		--cov=wsi \
		--cov-report=xml \
		--cov-report=html

install: clean
	python -m pip install -e .
53 changes: 41 additions & 12 deletions README.md
@@ -1,4 +1,4 @@
CLAM
WSI
====
This is a fork of the repository from [Mahmood lab's CLAM repository](https://github.com/mahmoodlab/CLAM).
It is made available under the GPLv3 License and is available for non-commercial academic purposes.
@@ -8,7 +8,7 @@ It is made available under the GPLv3 License and is available for non-commercial

The purpose of the fork is to compartmentalize the features related to processing of whole-slide images (WSI) from the CLAM model.

The package has been renamed to `wsi_core` as that was the name of the module related with whole slide image processing.
The package has been renamed to `wsi`.


## Installation
@@ -17,24 +17,40 @@ While the repository is private, make sure you [exchange SSH keys of the machine

Then simply install with `pip`:
```bash
git clone git@github.com:rendeirolab/CLAM.git
cd CLAM
# pip install git+ssh://git@github.com:rendeirolab/wsi.git
git clone git@github.com:rendeirolab/wsi.git
cd wsi
pip install .
```

Note that the package uses setuptools-scm to derive its version, so the installation source must be a git repository (a zip file of the source code won't work).

## Usage

The only exposed class is `WholeSlideImage`, which provides all the functionality of the package.

### Quick start - segmentation, tiling and feature extraction
```python
from wsi import WholeSlideImage

url = "https://brd.nci.nih.gov/brd/imagedownload/GTEX-O5YU-1426"
slide = WholeSlideImage(url)
slide.segment()
slide.tile()
feats, coords = slide.inference("resnet18")
```

### Full example

This package is meant for both interactive use and for use in a pipeline at scale.
By default actions do not return anything, but instead save the results to disk in files relative to the slide file.

All major functions have sensible defaults but allow for customization.
Please check the docstring of each function for more information.

```python
from wsi_core import WholeSlideImage
from wsi_core.utils import Path
from wsi import WholeSlideImage
from wsi.utils import Path

# Get example slide image
slide_file = Path("GTEX-12ZZW-2726.svs")
@@ -48,7 +64,7 @@ if not slide_file.exists():
# Instantiate slide object
slide = WholeSlideImage(slide_file)

# Instantiate slide object
# Instantiation can be done with custom attributes
slide = WholeSlideImage(slide_file, attributes=dict(donor="GTEX-12ZZW"))

# Segment tissue (segmentation mask is stored as polygons in slide.contours_tissue)
@@ -75,15 +91,28 @@ for img in images:
slide.save_tile_images(output_dir=slide_file.parent / (slide_file.stem + "_tiles"))

# Use in a torch dataloader
loader = slide.as_data_loader()
loader = slide.as_data_loader(with_coords=True)

# Extract features
# Extract features "manually"
import numpy as np
import torch
from tqdm import tqdm
model = torch.hub.load("pytorch/vision", "resnet50", pretrained=True)
for count, (batch, coords) in tqdm(enumerate(loader), total=len(loader)):
model = torch.hub.load("pytorch/vision", "resnet18", weights="DEFAULT")
feats = list()
coords = list()
for count, (batch, yx) in tqdm(enumerate(loader), total=len(loader)):
    with torch.no_grad():
        features = model(batch).numpy()
        f = model(batch).numpy()
    feats.append(f)
    coords.append(yx)

feats = np.concatenate(feats, axis=0)
coords = np.concatenate(coords, axis=0)

# Extract features "automatically"
feats, coords = slide.inference('resnet18')

# Additional parameters can also be specified
feats, coords = slide.inference('resnet18', device='cuda', data_loader_kws=dict(batch_size=512))
```
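
The README notes that results are saved to files placed relative to the slide file. As a minimal sketch of that convention, the helper below derives the tile directory the same way the `save_tile_images` call in the example does. `tiles_dir_for` is a hypothetical name introduced here for illustration and is not part of the package; `PurePosixPath` is used only so the result is identical on every platform.

```python
from pathlib import PurePosixPath


def tiles_dir_for(slide_file: PurePosixPath) -> PurePosixPath:
    """Return a sibling directory named '<slide stem>_tiles' (hypothetical helper)."""
    return slide_file.parent / (slide_file.stem + "_tiles")


# Example slide name taken from the README; the "data" folder is an assumption
slide_file = PurePosixPath("data/GTEX-12ZZW-2726.svs")
print(tiles_dir_for(slide_file))
```

Keeping outputs next to the slide means a pipeline only needs the slide path to locate every derived file.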

## Reference
29 changes: 18 additions & 11 deletions pyproject.toml
@@ -1,6 +1,6 @@
# PIP, using PEP621
[project]
name = "wsi_core"
name = "wsi"
authors = [
{name = "Andre Rendeiro", email = "[email protected]"},
]
@@ -11,8 +11,8 @@ keywords = [
]
classifiers = [
"Programming Language :: Python :: 3 :: Only",
"Programming Language :: Python :: 3.7",
"Programming Language :: Python :: 3.8",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
"Development Status :: 3 - Alpha",
"Typing :: Typed",
"License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)",
@@ -21,14 +21,21 @@ classifiers = [
#license = "gpt3"
requires-python = ">=3.10"
dependencies = [
"opencv-python",
"h5py",
"matplotlib",
"numpy",
"opencv-python",
"openslide-python",
"pandas",
"Pillow",
"requests",
"scikit-image",
"scikit-learn",
"scipy",
"shapely",
"torch",
"torchvision",
"tqdm",
]
dynamic = ['version']

@@ -51,9 +58,9 @@ doc = [
]

[project.urls]
homepage = "https://github.com/rendeirolab/CLAM"
documentation = "https://github.com/rendeirolab/CLAM/blob/main/README.md"
repository = "https://github.com/rendeirolab/CLAM"
homepage = "https://github.com/rendeirolab/wsi"
documentation = "https://github.com/rendeirolab/wsi/blob/main/README.md"
repository = "https://github.com/rendeirolab/wsi"

[build-system]
# requires = ["poetry>=0.12", "setuptools>=45", "wheel", "poetry-dynamic-versioning"]
@@ -62,7 +69,7 @@ requires = ["setuptools>=45", "wheel", "setuptools_scm[toml]>=6.0"]
build-backend = "setuptools.build_meta"

[tool.setuptools_scm]
write_to = "wsi_core/_version.py"
write_to = "wsi/_version.py"
write_to_template = 'version = __version__ = "{version}"'

[tool.black]
@@ -104,7 +111,7 @@ module = [
'matplotlib.*',
'networkx.*',
#
'wsi_core.*'
'wsi.*'
]
ignore_missing_imports = true

@@ -117,5 +124,5 @@ testpaths = [
]
markers = [
'slow', # 'marks tests as slow (deselect with "-m 'not slow'")',
'serial'
]
"wsi"
]
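
For context on the `write_to_template` setting in `[tool.setuptools_scm]`: a minimal sketch, assuming setuptools_scm fills the template with Python's `str.format`, of what the generated `wsi/_version.py` would contain. The version string below is a made-up placeholder, not a real release of the package.

```python
# Template copied from [tool.setuptools_scm] in pyproject.toml
template = 'version = __version__ = "{version}"'

# Hypothetical version string, for illustration only
rendered = template.format(version="0.1.dev1")
print(rendered)

# Executing the rendered line defines both names, which is what
# `from ._version import version, __version__` in wsi/__init__.py relies on
namespace = {}
exec(rendered, namespace)
```

Because the file is generated at build time, it is listed in `.gitignore` rather than committed.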
9 changes: 8 additions & 1 deletion requirements.txt
@@ -1,8 +1,15 @@
opencv-python
h5py
matplotlib
numpy
opencv-python
openslide-python
pandas
Pillow
requests
scikit-image
scikit-learn
scipy
shapely
torch
torchvision
tqdm
2 changes: 2 additions & 0 deletions wsi/__init__.py
@@ -0,0 +1,2 @@
from .wsi import WholeSlideImage
from ._version import version, __version__
44 changes: 44 additions & 0 deletions wsi/tests/test_wsi.py
@@ -0,0 +1,44 @@
from pathlib import Path
import tempfile
import joblib

import requests
import pytest
from wsi import WholeSlideImage
import numpy as np


mem = joblib.Memory("cache", verbose=0)


@pytest.fixture(scope="session")
@mem.cache
def get_test_slide():
    slide_file = Path("GTEX-O5YU-1426.svs")
    if not slide_file.exists():
        url = f"https://brd.nci.nih.gov/brd/imagedownload/{slide_file.stem}"
        slide_file = Path(tempfile.NamedTemporaryFile(suffix=".svs").name)

        with open(slide_file, "wb") as file:
            for chunk in requests.get(url, stream=True).iter_content(chunk_size=1024 * 4):
                file.write(chunk)
    else:
        for f in sorted(Path().glob(slide_file.stem + "*")):
            if f != slide_file:
                f.unlink()
    return slide_file


@pytest.mark.wsi
@pytest.mark.slow
def test_whole_slide_image_inference(get_test_slide):
    slide = WholeSlideImage(get_test_slide)
    slide.segment()
    assert len(slide.contours_tissue) == len(slide.holes_tissue)
    slide.tile()
    feats, coords = slide.inference("resnet18")

    # Assert conditions
    assert coords.shape == (654, 2), "Coords shape mismatch"
    print(feats.sum())
    assert np.allclose(feats.sum(), 14.555267, atol=1e-3), "Features sum mismatch"
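
The final assertion relies on `np.allclose`, whose documented check is `|a - b| <= atol + rtol * |b|`, with `rtol` defaulting to `1e-5`. The pure-Python sketch below mirrors that formula for scalars to show roughly how much drift in the feature sum the test tolerates. `within_tolerance` is a hypothetical helper introduced here for illustration, not part of the test suite.

```python
def within_tolerance(actual: float, expected: float,
                     atol: float = 1e-3, rtol: float = 1e-5) -> bool:
    """Scalar version of the np.allclose criterion: |a - b| <= atol + rtol * |b|."""
    return abs(actual - expected) <= atol + rtol * abs(expected)


expected_sum = 14.555267  # reference value from the test above

# A drift of about 0.0007 passes; a drift of about 0.0017 does not
print(within_tolerance(14.5560, expected_sum))
print(within_tolerance(14.5570, expected_sum))
```

With `atol=1e-3` the allowed drift is a little over 0.001, which matches the "relax test tolerance" commit in this pull request.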