Improve aggregation and status performance (no conditions). #818

Merged
cbkerr merged 9 commits into main from improve-performance
Feb 15, 2024

@joaander commented Feb 15, 2024

Description

  • Use signac 2.2.0 cached_statepoint in aggregators.
  • Cache the list of job ids while _buffered and avoid filesystem checks during aggregation and status check loops.
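The second item can be illustrated with a minimal, standard-library-only sketch. It is not flow's implementation: `CachedJobIds`, its methods, and the flat one-directory-per-job workspace layout are all hypothetical names and assumptions chosen for illustration. The idea shown is to list the workspace once, answer membership and iteration from memory, and drop the snapshot when the buffered section ends.

```python
import os
import tempfile


class CachedJobIds:
    """Hypothetical helper (not part of flow or signac).

    Snapshots the job ids in a workspace once and answers lookups from memory.
    """

    def __init__(self, workspace):
        self._workspace = workspace
        self._ids = None  # populated lazily on first use

    def _load(self):
        # One directory listing replaces one filesystem check per job.
        if self._ids is None:
            with os.scandir(self._workspace) as entries:
                self._ids = frozenset(e.name for e in entries if e.is_dir())
        return self._ids

    def __contains__(self, job_id):
        return job_id in self._load()

    def __iter__(self):
        return iter(sorted(self._load()))

    def invalidate(self):
        # Call when leaving the buffered section so later changes are seen.
        self._ids = None


def demo():
    with tempfile.TemporaryDirectory() as workspace:
        for job_id in ("aaa", "bbb", "ccc"):
            os.mkdir(os.path.join(workspace, job_id))
        cache = CachedJobIds(workspace)
        found = [job_id for job_id in ("aaa", "zzz") if job_id in cache]
        # A directory created after the snapshot stays invisible until invalidate().
        os.mkdir(os.path.join(workspace, "ddd"))
        stale = "ddd" in cache
        cache.invalidate()
        fresh = "ddd" in cache
        return found, stale, fresh


if __name__ == "__main__":
    print(demo())
```

The trade-off in this sketch is staleness: the snapshot is only valid while nothing else adds or removes jobs, which is why it is tied to an explicit `invalidate()` call.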

This first pass is tested against workflows with no pre/post conditions. I have plans to optimize [pre/post].isfile in a later PR.

Motivation and Context

Significantly improve the performance of common flow operations, especially in workspaces with many jobs and aggregates.

Here are benchmark results from runs on the Great Lakes scratch filesystem with 100k jobs.
signac 2.2.0, flow main:

| Command | operation | groupsof | groupsof_sort | groupby | groupby_sort |
|---|---|---|---|---|---|
| import | 0.000204 | 0.000206 | 0.000209 | 0.000212 | 0.0002 |
| instantiate | 0.00537 | 46.0 | 177.0 | 173.0 | 175.0 |
| status | 3.5 | 45.4 | 173.0 | 180.0 | 179.0 |
| status -j 550457c7c260573b8ecbf... | 0.376 | | | | |
| status -o operation | 3.32 | 45.8 | 173.0 | 173.0 | 182.0 |
| status -f {"a":{"$gt":-1}} | 4.07 | | | | |
| run --num-passes=1 | 15.5 | 66.9 | 201.0 | 203.0 | 203.0 |

signac 2.2.0, flow ce37f01:

| Command | operation | groupsof | groupsof_sort | groupby | groupby_sort |
|---|---|---|---|---|---|
| import | 0.000208 | 0.000206 | 0.000206 | 0.000206 | 0.000202 |
| instantiate | 0.00584 | 0.44 | 1.17 | 0.729 | 0.82 |
| status | 2.59 | 1.57 | 2.87 | 0.868 | 0.967 |
| status -j 550457c7c260573b8ecbf... | 0.294 | | | | |
| status -o operation | 2.43 | 0.655 | 1.2 | 0.912 | 0.959 |
| status -f {"a":{"$gt":-1}} | 3.23 | | | | |
| run --num-passes=1 | 14.5 | 1.94 | 2.63 | 1.11 | 1.15 |

All benchmarks were run after `signac update-cache`.
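To read the two tables side by side, the ratio of the `flow main` timing to the `ce37f01` timing can be computed per command and aggregation type. The short sketch below only does that arithmetic on the numbers reported above; the names `BEFORE`, `AFTER`, and `speedups` are illustrative, and the rows with a single value are left out.

```python
# Timings copied from the two benchmark tables (units as reported there).
COLUMNS = ("groupsof", "groupsof_sort", "groupby", "groupby_sort")

BEFORE = {  # signac 2.2.0, flow main
    "instantiate": (46.0, 177.0, 173.0, 175.0),
    "status": (45.4, 173.0, 180.0, 179.0),
    "status -o operation": (45.8, 173.0, 173.0, 182.0),
    "run --num-passes=1": (66.9, 201.0, 203.0, 203.0),
}

AFTER = {  # signac 2.2.0, flow ce37f01
    "instantiate": (0.44, 1.17, 0.729, 0.82),
    "status": (1.57, 2.87, 0.868, 0.967),
    "status -o operation": (0.655, 1.2, 0.912, 0.959),
    "run --num-passes=1": (1.94, 2.63, 1.11, 1.15),
}


def speedups(before, after):
    """Return before/after ratios keyed by command, then by aggregation column."""
    return {
        command: {
            column: old / new
            for column, old, new in zip(COLUMNS, before[command], after[command])
        }
        for command in before
    }


if __name__ == "__main__":
    for command, row in speedups(BEFORE, AFTER).items():
        cells = ", ".join(f"{column}: {ratio:.0f}x" for column, ratio in row.items())
        print(f"{command}: {cells}")
```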

This allows faster `job in project` tests and iteration over jobs.

Also remove some expensive open_job calls and job in project checks that are
not needed while registering aggregates.
…ned.

This saves a small amount of absolute time in projects with no labels. It also gives
the *appearance* of faster status checks as the user sees only 1 progress bar.

Also, hide the "labels" section of the status output when there are no labels to show.
@joaander marked this pull request as ready for review February 15, 2024 16:37
@joaander requested review from a team as code owners February 15, 2024 16:37
@joaander requested review from kidrahahjo and tommy-waltmann and removed request for a team February 15, 2024 16:37
@cbkerr left a comment

Thanks!

@cbkerr merged commit a5648af into main Feb 15, 2024
9 checks passed
@cbkerr deleted the improve-performance branch February 15, 2024 18:21
@cbkerr added this to the 0.28.0 milestone Feb 16, 2024