
Bulk process many slides (e.g. 30k slides) #223

Open
usuyama opened this issue Sep 26, 2022 · 3 comments

Comments

@usuyama

usuyama commented Sep 26, 2022

Has anyone tried running HistoQC on PySpark/Databricks?

I'm thinking about ways to run HistoQC on a large dataset like all 30k slides from TCGA.

I guess it should work with some modifications (data loading, library versions, etc.), but I wonder whether anyone in the community has experience with this.
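One way to approach this without a cluster framework is to split the slide list into batches and run an independent HistoQC job per batch. The sketch below assumes each slide can be QC'd independently. `make_batches`, `process_all`, and `run_histoqc_batch` are hypothetical names, and the `-c`/`-o` flags in `run_histoqc_batch` are an assumption to check against `python -m histoqc --help` for the installed version.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence, Union


def make_batches(paths: Sequence[str], batch_size: int) -> List[List[str]]:
    """Split slide paths into consecutive batches of at most batch_size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [list(paths[i:i + batch_size]) for i in range(0, len(paths), batch_size)]


def run_histoqc_batch(idx: int, batch: List[str]) -> int:
    """Hypothetical runner: one HistoQC CLI call per batch, each with its own
    output directory so batches do not overwrite each other's results.
    ASSUMPTION: the -c/-o flags and the config file name must be checked
    against `python -m histoqc --help` for the installed HistoQC version."""
    cmd = [sys.executable, "-m", "histoqc", "-c", "config.ini",
           "-o", f"histoqc_out/batch_{idx:05d}", *batch]
    return subprocess.run(cmd, check=False).returncode


def process_all(
    paths: Sequence[str],
    run_batch: Callable[[int, List[str]], object],
    batch_size: int = 100,
    max_workers: int = 4,
) -> Dict[int, Union[object, Exception]]:
    """Run run_batch over all batches concurrently. A failing batch is
    recorded as its exception instead of aborting the whole job."""
    batches = make_batches(paths, batch_size)
    results: Dict[int, Union[object, Exception]] = {}
    # Threads suffice here because the heavy work happens in child processes.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_batch, i, b) for i, b in enumerate(batches)]
        for i, fut in enumerate(futures):
            try:
                results[i] = fut.result()
            except Exception as exc:
                results[i] = exc
    return results
```

For example, `process_all(slide_paths, run_histoqc_batch, batch_size=200, max_workers=8)` would run at most eight HistoQC processes at a time. On PySpark, the same per-batch function could in principle be called inside a `mapPartitions` step, although data loading and library versions on the workers would still need the modifications mentioned above.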

@choosehappy
Owner

choosehappy commented Sep 28, 2022 via email

@ap--
Contributor

ap-- commented Feb 9, 2023

> I'm thinking about ways to run HistoQC on a large dataset like all 30k slides from TCGA.

A use case like this was exactly why I started prototyping HistoQC running directly from cloud storage. The idea was to prototype a pipeline for continuous quality monitoring: since the slide scanners in this workflow automatically upload the scanned images to cloud buckets anyway, it made sense to run the QC tests in the cloud too.
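The continuous-monitoring idea can be sketched as a polling pass over the bucket listing that QCs only newly uploaded slides. This is a minimal sketch: `list_keys` and `run_qc` are hypothetical hooks (for example, a cloud SDK listing call and a HistoQC invocation), the suffix list is an assumption, and in a real deployment the set of already QC'd keys would need durable storage.

```python
from typing import Callable, Iterable, List, Set

# ASSUMPTION: which file suffixes count as slides to QC.
SLIDE_SUFFIXES = (".svs", ".tif", ".tiff")


def select_new_slides(object_keys: Iterable[str], already_qced: Set[str]) -> List[str]:
    """Slide keys present in the bucket listing but not yet QC'd, sorted."""
    return sorted(
        key for key in object_keys
        if key.lower().endswith(SLIDE_SUFFIXES) and key not in already_qced
    )


def poll_once(
    list_keys: Callable[[], Iterable[str]],
    run_qc: Callable[[str], None],
    already_qced: Set[str],
) -> List[str]:
    """One polling pass: QC every newly uploaded slide and remember it.
    list_keys and run_qc are hypothetical hooks; a slide whose QC raises
    is not marked as done, so it is retried on the next pass."""
    processed: List[str] = []
    for key in select_new_slides(list_keys(), already_qced):
        try:
            run_qc(key)
        except Exception:
            continue  # leave it for the next pass
        already_qced.add(key)
        processed.append(key)
    return processed
```

Bucket event notifications could replace polling; polling is shown here only because it needs no cloud-specific setup.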

Sadly, it never got past the PoC linked in the fork above, due to lack of time on my end.
But I believe that once more of the legacy pathology slide formats are supported via tiffslide, it becomes a viable option to default to tiffslide instead of openslide for a cloud-native implementation of HistoQC.

Cheers,
Andreas

@choosehappy
Owner

Very interesting, thanks for the information!

We're now in the process of hiring someone to do the scalability work mentioned above; I could also imagine a tiffslide-integrated version. There are increasingly other formats that we need to be able to support, like DICOM, so a generic, abstracted approach to loading WSIs could address a lot of these points.
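A generic loading layer could look like a registry that maps file suffixes to an ordered list of candidate readers, so tiffslide can be preferred (for example, for cloud URLs) with openslide as a fallback, and a DICOM reader could be registered the same way. This is a hypothetical sketch: `WSIRegistry`, `module_available`, and `default_registry` are illustrative names, and it assumes `tiffslide.TiffSlide` accepts fsspec-style URLs while `openslide.OpenSlide` takes local file paths.

```python
import importlib.util
from pathlib import PurePosixPath
from typing import Any, Callable, Dict, List, Tuple

Opener = Callable[[str], Any]  # path or URL -> slide object
Check = Callable[[], bool]     # is this backend usable in this environment?


class WSIRegistry:
    """Hypothetical registry: file suffix -> ordered (name, opener, check) candidates."""

    def __init__(self) -> None:
        self._backends: Dict[str, List[Tuple[str, Opener, Check]]] = {}

    def register(self, suffix: str, name: str, opener: Opener,
                 available: Check = lambda: True) -> None:
        # Registration order is the preference order for that suffix.
        self._backends.setdefault(suffix.lower(), []).append((name, opener, available))

    def open(self, path: str) -> Tuple[str, Any]:
        """Open path with the first available backend; return (backend name, slide)."""
        suffix = PurePosixPath(path).suffix.lower()
        for name, opener, available in self._backends.get(suffix, []):
            if available():
                return name, opener(path)
        raise ValueError(f"no available WSI backend for suffix {suffix!r}")


def module_available(module: str) -> Check:
    """Check that a module is importable, without importing it yet."""
    return lambda: importlib.util.find_spec(module) is not None


def _open_tiffslide(path: str) -> Any:
    import tiffslide  # assumed to accept local paths and fsspec URLs (e.g. s3://...)
    return tiffslide.TiffSlide(path)


def _open_openslide(path: str) -> Any:
    import openslide  # local file paths only
    return openslide.OpenSlide(path)


def default_registry() -> WSIRegistry:
    reg = WSIRegistry()
    for suffix in (".svs", ".tif", ".tiff"):  # assumption: formats both libraries handle
        reg.register(suffix, "tiffslide", _open_tiffslide, module_available("tiffslide"))
        reg.register(suffix, "openslide", _open_openslide, module_available("openslide"))
    return reg
```

With this layout, HistoQC's modules would only call `registry.open(...)`, so adding a format means registering one more opener rather than touching the QC code. Note that URL paths would only work through a backend that supports them, such as tiffslide in this sketch.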
