Inclusion of the carbon footprint of cached processes #62

Open
Llannelongue opened this issue Oct 17, 2023 · 1 comment

Comments

@Llannelongue
Collaborator

This is to discuss how to present the carbon footprint of cached processes.

Simple example: 3 processes [P1] [P2] [P3]

  • First run of the pipeline, all three are run
  • Second run of the pipeline, cached [P1] is used, [P2] and [P3] are run

When presenting the carbon footprint of run (2), we can either:

  • Take the carbon footprint of [P1] for run (1) and add the carbon footprints of [P2] and [P3] from run (2).
  • Or only add the carbon footprints of [P2] and [P3] and ignore the impact of [P1].

Option 1 gives a better estimate of the total carbon footprint of the whole pipeline, for example if we were to run it again from start to finish on new data. Option 2 gives a more accurate estimate of what run (2) itself actually emitted. When summing run (1) + run (2), option 2 should be used; otherwise the footprint of [P1] would be counted twice even though it was only run once.
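To make the difference concrete, here is a minimal sketch of the two options on the [P1] [P2] [P3] example. The `TaskRecord` class, the function names and all the numbers are made up for illustration; none of this is taken from the actual codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskRecord:
    # Hypothetical record; the field names are assumptions for this sketch.
    name: str
    co2e_g: float  # footprint in g CO2e, measured when the task actually executed
    cached: bool   # True if this run reused the result of an earlier run


def footprint_including_cached(tasks):
    """Option 1: count cached tasks with the footprint of their original execution."""
    return sum(t.co2e_g for t in tasks)


def footprint_executed_only(tasks):
    """Option 2: count only the tasks that actually executed in this run."""
    return sum(t.co2e_g for t in tasks if not t.cached)


# Illustrative numbers only.
run1 = [
    TaskRecord("P1", 10.0, cached=False),
    TaskRecord("P2", 5.0, cached=False),
    TaskRecord("P3", 2.0, cached=False),
]
run2 = [
    TaskRecord("P1", 10.0, cached=True),  # reused from run (1)
    TaskRecord("P2", 5.5, cached=False),
    TaskRecord("P3", 2.5, cached=False),
]

option1 = footprint_including_cached(run2)  # full-pipeline estimate for run (2)
option2 = footprint_executed_only(run2)     # what run (2) itself emitted

# Summing across runs: only option 2 avoids counting [P1] twice.
cumulative = footprint_executed_only(run1) + option2
double_counted = footprint_including_cached(run1) + option1

print(option1, option2, cumulative, double_counted)
```

With these numbers, the gap between `double_counted` and `cumulative` is exactly the footprint of [P1], which is the double counting described above.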

Which one is right seems to depend a lot on what users want to do with this information, so perhaps it is best to report both values so that users can decide which one to use?

@skrakau
Collaborator

skrakau commented Oct 17, 2023

Yes, I also think both would be useful in the future
