Bulk process many slides (e.g. 30k slides) #223
Interesting! As far as I know, not yet!
That said, we were just awarded some additional funding specifically for scaling up our histotools suite (histoqc.com, patchsorter.com, quickannotator.com, cohortfinder.com). I think we'll end up using the Ray distributed computing framework, which is now sufficiently mature for this sort of thing :)
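To illustrate the fan-out pattern such a framework would handle, here is a minimal sketch. It assumes a hypothetical per-slide function `run_qc_on_slide` that wraps one HistoQC run; it is not part of HistoQC. The sketch uses the standard library's `ThreadPoolExecutor` as a single-machine stand-in for Ray. Threads are enough when each task mainly launches an external process. The important choice is that each slide is independent and failures are recorded rather than raised, so one corrupt file cannot abort a 30k-slide batch.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class SlideResult:
    path: str
    ok: bool
    error: Optional[str] = None


def run_qc_on_slide(path: str) -> SlideResult:
    """Hypothetical stand-in for a single HistoQC run on one slide.

    A real version might launch HistoQC as a subprocess for `path`;
    here we only simulate a failure for files ending in ".corrupt".
    """
    if path.endswith(".corrupt"):
        raise ValueError(f"cannot open {path}")
    return SlideResult(path=path, ok=True)


def _safe_run(fn: Callable[[str], SlideResult], path: str) -> SlideResult:
    # Turn any exception into a failed result so one bad slide
    # does not abort the whole batch.
    try:
        return fn(path)
    except Exception as exc:
        return SlideResult(path=path, ok=False, error=str(exc))


def process_all(
    paths: Iterable[str],
    fn: Callable[[str], SlideResult] = run_qc_on_slide,
    max_workers: int = 4,
) -> List[SlideResult]:
    """Run `fn` on every slide concurrently; results keep input order."""
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the same order as `paths`.
        return list(pool.map(lambda p: _safe_run(fn, p), paths))
```

With Ray, the same structure would decorate `_safe_run` with `@ray.remote`, submit one `.remote(...)` call per slide, and collect the results with `ray.get(...)`, spreading the work across a cluster instead of one machine.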
There has been some work on reading files from cloud storage in HistoQC:
https://github.com/ap--/HistoQC/tree/feature/cloud-support-via-tiffslide#accessing-cloud-storage
If you have any thoughts or comments, I would love to hear them!
On Mon, Sep 26, 2022 at 6:02 AM, Naoto Usuyama wrote:
> Has anyone tried running HistoQC on PySpark/Databricks?
> I'm thinking about ways to run HistoQC on a large dataset, such as all 30k slides from TCGA.
> I guess it should work with some modifications (data loading, library versions, etc.), but I wonder if anyone in the community has experience with this.
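One possible approach to the PySpark/Databricks question is to split the slide list into deterministic shards. Each Spark task, or any other worker, then runs HistoQC on its own subset and can be retried independently. The `shard_for` and `make_shards` helpers below are hypothetical and not part of HistoQC or PySpark. In PySpark itself, `sc.parallelize(paths, numSlices)` followed by `mapPartitions(...)` would distribute a list of paths without a custom hash. This sketch only shows the assignment logic.

```python
import hashlib
from typing import Dict, List, Sequence


def shard_for(path: str, num_shards: int) -> int:
    """Deterministically map a slide path to a shard index in [0, num_shards)."""
    # Use a stable hash rather than Python's built-in hash(), which is
    # randomized per process, so every worker and every restart agrees.
    digest = hashlib.sha1(path.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def make_shards(paths: Sequence[str], num_shards: int) -> Dict[int, List[str]]:
    """Group slide paths by shard; every path lands in exactly one shard."""
    if num_shards < 1:
        raise ValueError("num_shards must be >= 1")
    shards: Dict[int, List[str]] = {i: [] for i in range(num_shards)}
    for p in paths:
        shards[shard_for(p, num_shards)].append(p)
    return shards
```

Because the assignment depends only on the path, a failed shard can be rerun on its own without reshuffling the other 29k-odd slides.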
A use case like this was exactly why I started prototyping HistoQC reading directly from cloud storage. The idea was to build a prototype pipeline for continuous quality monitoring: since the slide scanners in this workflow automatically upload scanned images to cloud buckets anyway, it made sense to run the QC tests in the cloud too. Sadly, due to lack of time on my end, it never got past the proof of concept linked in the fork above. Cheers,
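At its core, continuous monitoring means repeatedly asking which uploaded slides have not been checked yet. The sketch below covers only that step. It assumes the bucket listing and the set of already-processed keys come from elsewhere, for example the cloud provider's SDK and a results table. `pending_slides` and the extension list are hypothetical choices, not taken from the fork linked above.

```python
from pathlib import PurePosixPath
from typing import Iterable, List, Set

# Assumed set of WSI file extensions; adjust for the scanners actually in use.
WSI_EXTENSIONS = {".svs", ".tif", ".tiff", ".ndpi", ".dcm"}


def pending_slides(bucket_keys: Iterable[str], processed: Set[str]) -> List[str]:
    """Return object keys that look like WSIs and have not been QC'd yet."""
    pending = []
    for key in bucket_keys:
        # Compare extensions case-insensitively (scanners may write ".SVS").
        suffix = PurePosixPath(key).suffix.lower()
        if suffix in WSI_EXTENSIONS and key not in processed:
            pending.append(key)
    return sorted(pending)
```

Each poll, or each storage upload event, would pass the returned keys to the QC runner and then add them to the processed set, so reruns never repeat finished work.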
Very interesting, thanks for the information! We're now in the process of hiring someone to work on the scaling effort mentioned above; I could also imagine a version with tiffslide integrated. There are increasingly other formats we need to support, such as DICOM, so a generic, abstracted approach to loading whole-slide images (WSIs) could address many of these points.
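A generic loader could be a small interface plus a registry that picks a backend per file type. The sketch below is hypothetical: `WSIReader`, `register_reader`, and `open_slide` are illustrative names, not HistoQC code. The interface mirrors the OpenSlide-style `dimensions` / `read_region(location, level, size)` API that tiffslide also follows. Dispatching on file extension is a simplification. DICOM WSIs usually span several files, so a real DICOM backend would likely need directory-level dispatch.

```python
from pathlib import PurePosixPath
from typing import Callable, Dict, Iterable, Protocol, Tuple


class WSIReader(Protocol):
    """Minimal interface a QC pipeline might need from any slide backend."""

    @property
    def dimensions(self) -> Tuple[int, int]: ...

    def read_region(self, location: Tuple[int, int], level: int, size: Tuple[int, int]): ...


# Maps a lowercase file extension to a factory that opens the slide.
_READERS: Dict[str, Callable[[str], WSIReader]] = {}


def register_reader(extensions: Iterable[str], factory: Callable[[str], WSIReader]) -> None:
    """Associate one backend factory with one or more file extensions."""
    for ext in extensions:
        _READERS[ext.lower()] = factory


def open_slide(path: str) -> WSIReader:
    """Open `path` (local path or URL) with the backend registered for its extension."""
    ext = PurePosixPath(path).suffix.lower()
    try:
        factory = _READERS[ext]
    except KeyError:
        raise ValueError(f"no reader registered for {ext!r} ({path})") from None
    return factory(path)
```

In practice one might register `tiffslide.TiffSlide` for TIFF-based formats and a separate DICOM backend for `.dcm`, and the QC modules would depend only on `WSIReader`.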