FeatureRequest: Computation Time #308

Open
koellerMC opened this issue Jun 12, 2024 · 4 comments

@koellerMC

Dear HistoQC Team,

I have a feature request. Would it be possible to add a computation time metric for each image? This would make it possible to investigate the compute cost of the different modules, evaluate the best settings for large-scale application of HistoQC (e.g. 10k+ WSIs), and potentially design a staged approach for different quality metrics within a QC pipeline.

All the best!
MK

@jacksonjacobs1
Collaborator

Great feedback!

Out of curiosity, how quickly does HistoQC process each image in your use case? We typically see ~30 seconds per image.

On the development side, we typically measure module computation time using a Python process profiler such as py-spy.

For normal users, it's unclear how performance information would be embedded into the existing HistoQC output. I think the best, simplest option would be to log performance info at the DEBUG level.

In this proposed implementation, DEBUG logs can be forwarded to a .txt file when the user passes the --debug flag (#301) to HistoQC.
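
For illustration, a minimal sketch of how that wiring could look, assuming an argparse-style `--debug` flag and the standard `logging` module; the output filename is a placeholder, not an existing HistoQC artifact:

```python
import argparse
import logging

parser = argparse.ArgumentParser()
# Hypothetical flag wiring; the actual --debug behavior is what #301 proposes.
parser.add_argument("--debug", action="store_true")
args = parser.parse_args()

if args.debug:
    # Forward DEBUG-level records (e.g. per-module timings) to a plain-text file.
    handler = logging.FileHandler("histoqc_debug.txt")
    handler.setLevel(logging.DEBUG)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s"))
    root = logging.getLogger()
    root.addHandler(handler)
    root.setLevel(logging.DEBUG)
```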

@choosehappy Your thoughts?

@koellerMC
Author

Hi @jacksonjacobs1

So far I have not tracked it. However, once we have done some proper testing I will come back with some metrics.
We will use the DEBUG option as proposed.
Thanks for the fast reply!

BR
MK

@jacksonjacobs1
Collaborator

FYI the --debug option does not currently cause performance info to be logged.

@choosehappy
Owner

Generally speaking, having more performance metrics is likely a good thing, so that folks can make more educated decisions about which modules to include, or where particular hiccups may be. This may even reduce our support overhead if we enable folks to serve themselves.

Adding module-level timing should be trivial since the modules are called dynamically in a for loop, so simply wrapping that statement in some timing code would easily get the job done.
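
A rough sketch of what wrapping that loop could look like, using `time.perf_counter()`; the function and variable names below are placeholders rather than HistoQC's actual internals:

```python
import logging
import time

logger = logging.getLogger(__name__)

def run_modules(image_handle, modules):
    """Run each QC module on one image, recording per-module wall-clock time.

    `modules` is assumed to be an iterable of (callable, params) pairs,
    mirroring the dynamic dispatch in the existing for loop.
    """
    timings = {}
    for module_fn, params in modules:
        start = time.perf_counter()
        module_fn(image_handle, params)
        timings[module_fn.__name__] = time.perf_counter() - start
        logger.debug("%s finished in %.3f s",
                     module_fn.__name__, timings[module_fn.__name__])
    return timings
```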

I think the open question for me is similar to the one Jackson brought up -- where/how do we report these metrics in an immediately usable way? It would be a shame to store them in a format that doesn't directly let folks answer their questions and requires reformatting because we didn't think it through.

At bare minimum I could imagine a few end deliverables: (a) a pie chart of the compute time breakdown for a single WSI - and maybe this itself is an output of a "timings" module? (b) a pie chart of the compute for all WSIs?
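
For deliverable (a), a quick matplotlib sketch of a per-WSI breakdown from such timing data; the module names and numbers here are made up purely for illustration:

```python
import matplotlib.pyplot as plt

# Illustrative per-module timings (seconds) for a single WSI.
timings = {
    "ClassificationModule": 12.4,
    "BlurDetectionModule": 6.1,
    "MorphologyModule": 4.8,
    "SaveImage": 2.3,
}

plt.pie(list(timings.values()), labels=list(timings.keys()), autopct="%1.1f%%")
plt.title("Per-module compute time for one WSI")
plt.savefig("wsi_timing_breakdown.png")
```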

As we think about transitioning to something like a SQLite database, how to store this becomes much more obvious (a separate table). Perhaps in CSV land, a separate file makes sense?
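
As a sketch of the separate-file idea, per-module timings could be appended to their own CSV next to the usual results file; the path and column names below are only illustrative:

```python
import csv
import os

def append_timings(csv_path, slide_filename, timings):
    """Append one (slide, module, seconds) row per module to a standalone timings CSV.

    `timings` is assumed to be a dict of module name -> seconds, e.g. the
    output of the timing loop sketched above.
    """
    write_header = not os.path.exists(csv_path)
    with open(csv_path, "a", newline="") as f:
        writer = csv.writer(f)
        if write_header:
            writer.writerow(["filename", "module", "seconds"])
        for module_name, seconds in timings.items():
            writer.writerow([slide_filename, module_name, f"{seconds:.3f}"])
```

A SQLite version would hold the same rows in a dedicated timings table keyed by filename.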
