Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The new Scheduler #213

Merged
merged 20 commits into from
Mar 22, 2024
Merged

The new Scheduler #213

merged 20 commits into from
Mar 22, 2024

Conversation

reuterbal
Copy link
Collaborator

Draft PR to trigger CI tests for now

Copy link

Documentation for this branch can be viewed at https://sites.ecmwf.int/docs/loki/213/index.html

@reuterbal reuterbal force-pushed the nabr-the-new-scheduler branch 2 times, most recently from 36f2812 to 009f189 Compare January 18, 2024 14:01
Copy link

codecov bot commented Jan 18, 2024

Codecov Report

Attention: Patch coverage is 94.35084% with 57 lines in your changes are missing coverage. Please review.

Project coverage is 92.78%. Comparing base (3fe6ce2) to head (1e03830).

Files Patch % Lines
loki/bulk/item.py 93.22% 32 Missing ⚠️
loki/bulk/scheduler.py 95.81% 12 Missing ⚠️
loki/bulk/configure.py 94.84% 5 Missing ⚠️
loki/program_unit.py 78.57% 3 Missing ⚠️
loki/sourcefile.py 86.66% 2 Missing ⚠️
loki/transform/build_system_transform.py 33.33% 2 Missing ⚠️
transformations/transformations/data_offload.py 93.75% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #213      +/-   ##
==========================================
+ Coverage   92.53%   92.78%   +0.25%     
==========================================
  Files          96       96              
  Lines       17580    17965     +385     
==========================================
+ Hits        16267    16669     +402     
+ Misses       1313     1296      -17     
Flag Coverage Δ
lint_rules 96.39% <ø> (ø)
loki 92.77% <94.34%> (+0.30%) ⬆️
transformations 91.94% <94.44%> (-0.05%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@reuterbal reuterbal force-pushed the nabr-the-new-scheduler branch from 009f189 to 8c9b280 Compare January 18, 2024 14:55
@reuterbal reuterbal force-pushed the nabr-the-new-scheduler branch from 0647d2e to cd7085d Compare January 31, 2024 19:31
@reuterbal reuterbal force-pushed the nabr-the-new-scheduler branch from 8e7d368 to e0601ca Compare February 12, 2024 20:23
@reuterbal
Copy link
Collaborator Author

Just a little heads-up: I force-pushed after rebasing over latest main, to make this branch compatible with GlobalVarHoist and the switch to convert for F2C processing.
This branch includes also two temporary commits to point to different ecwam/cloudsc branches - these should be removed before (squash) merging the new Scheduler

@reuterbal reuterbal force-pushed the nabr-the-new-scheduler branch 2 times, most recently from 8c505db to cdf3441 Compare February 22, 2024 12:24
@reuterbal
Copy link
Collaborator Author

This has been rebased again to integrate the changes from #222 and other PRs that were merged in the meantime. I also took the opportunity to squash the lengthy commit history and only retain the (to be removed) CLOUDSC and ecWAM branch redirection and the temporary skip of OMNI for CLOUDSC regression.

@reuterbal reuterbal marked this pull request as ready for review February 22, 2024 13:06
"""
Create and return the :any:`SGraph` constructed from the :attr:`seeds` of the Scheduler.
"""
return SGraph.from_seed(self.seeds, self.item_factory, self.config)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline, this automatic re-generation of the SGraph is very expensive at the moment, even though it does seem to correctly use the caching mechanism provided by the ItemFactory. My suspicion is the use of Item.dependencies triggering a complete re-parse of each edge/node of the base graph via the REGEX frontend, as this always seems to trigger Item.concretize_dependencies.

In this current stage this effect makes prohibitively expensive (enrichment triggers about 6 rebuilds for me at around 40s each in ec-physics), which can be mitigated by not caching the sgraph. However, I wonder if there is a way to not "concretize" things that are already concrete, and if this would alleviate the problem somewhat?

Copy link
Collaborator Author

@reuterbal reuterbal Mar 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you might be right!
The intention is to call make_complete to ensure the relevant parser classes have been parsed already - which should be a no-op after the initial discovery. For that reason, there is this bail-out here:

loki/loki/program_unit.py

Lines 290 to 291 in b101847

if self._parser_classes == parser_classes:
return

Or at least, that's what it is intended to be. We call make_complete only with the Item._depends_class, which ideally will be a subset of what has been parsed already. But that's not what we are checking for... If my half-asleep brain is not completely wrong, this line should instead read as:

            if self._parser_classes & parser_classes:

Would you mind trialling this?

"""
return self.config.get('expand', False)
if not (dependencies := self.dependencies):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I'm not 100% that I'm understanding this right, but to illustrate my point on expensive SGraph rebuilds, I believe this lines is triggered for determining every edge in the base graph, and will cause a regex-parse of every dependency item (via concretize_dependencies), regardless of whether this has been done before. I've not fully confirmed this, other than by observing that this call can take up to 2s for certain items.

If this is the case, I'm not super-sure how to fix this, but it feels like this should be fixable.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok for now, follow-up tracked via #254 .

WIP: Towards SGraph

Expose defined IR nodes in Module and Subroutine

Incremental parsing stubs via Item.definitions

Initial sources for batch processing tests

Link items to parser classes and name dependencies/definitions

Expose global variables declared in a module

Introduce an empty RegexParserClass

First draft of on-demand regex logic and graph build

Homogenize access to typedefs and imports in program units

regex: parse kind in declarations

On-demand graph building improved and more tests

Add a rudimentary SGraph implementation

Refactored item creation

Enhance the SchedulerConfig with accessors

Enhance Item creation with config and support for `disable`

Expand disable testing

Some documentation on new items

Support for ignore and block in SGraph

Fix typos and Linter warnings

Resilience against items that cannot be found in non-strict discovery

Python 3.8 compatibility

Introduce file graph for SGraph

Additional testing for nested typebound procedures

WIP: Replace Scheduler callgraph by SGraph and SFilter

Update linter test for module dependencies in scheduler

Apply new enrichment procedure in scheduler

Restore targets functionality

Skip intrinsic modules

Restore depths functionality

Adapt to new Scheduler API

Consistent use of bind_names for procedure bindings

Support for inline call dependencies

New, generic enrichment process

On-the-fly creation of SGraph

Pool Allocator compliance

Item renaming

post-rebase fixes

wildcard disable in sgraph

WIP: Support for Item renaming

Crude support for cross-library ignores

Get Dependency transformations to work

Retire GlobalVariableItem

WIP: Semantics for Item rename/create and dependency transformation fixes

GlobalVarImportTransformation tests updated for Scheduler rewrite

Fix successors for "edge" nodes

Fix targets pruning in Sgraph

Update CMake test with header dependency

Lift item config into ItemConfig base class

Pass ignore list instead of flag

Split get_procedure_binding_item from get_procedure_item

Use IR properties instead of bespoke Item properties/methods

Overload creation of definitions for FileItem

Remove redundant ProcedureBindingItem create step

Make ProcedureItem and ProcedureBindingItem creating classmethods

Convert create_from_ir to classmethod

Move item creation to ItemFactory

Move item_cache into ItemFactory

Remove cached_property for Item.scope

Remove redundant dependency injection mechanism

Documentation for the new scheduler

pylint compliance

Remove some remnants of SubroutineItem

Expand config value matching and apply correct tree pruning

Fix typos

Rename SubroutineItem to ProcedureItem in recently changed code

Enable source sanitizer for incremental parse via from_source

Add a test to verify that interface items depend on procedures

Filter `items` kwarg to include only SGraph items in file_graph traversals

Remove inactive items in DependencyTransformation

Make GlobalVarHoistTrafo compatible with new scheduler

F2C Trafo: Create a copy for C kernel

Fix C kernel generation in F2C trafo

Fix test_transpile_expression

Add ignored items to dependency graph and skip during processing
Copy link
Collaborator

@mlange05 mlange05 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, first of all this is an impressive piece of live-wire engineering, and the improvements from the SGraph, and the incremental parsing infrastructure are very powerful! Obviously, this type of major change is too large to do a detailed line-by-line review, but having played with it for a little while, I'm confident that any remaining issues can be addressed in follow-up PRs. So, I've left a couple of docstring pointers, but they are truly minor, so GTG from me! :shipit:

And just to re-iterate, this is a major breaking change, so integration strategy and corresponding point-release management should happen as discuss offline (roll minor point release first, merge this, increment major point version number in quick follow-up release.

And again, many thanks @reuterbal for a truly impressive feature! 😁

functions that can be readily used in a step of the transformation pipeline:
Most transformations, however, will only require modifying those parts of a file
that are part of the call tree that is to be transformed to avoid unexpected
side-effects. For that reason,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sentence not quite finished?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, but don't remember where I was going with this...

making sure that routines with the relevant :any:`CallStatement` are always
processed before their target :class:`Subroutine`.
:any:`Transformation` applies this transformation to all files, modules, or
routines that are appear in the dependency graph. The exact traversal
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that are appear

"""
return self.config.get('expand', False)
if not (dependencies := self.dependencies):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok for now, follow-up tracked via #254 .

@reuterbal
Copy link
Collaborator Author

Documentation updated, will now rebase to remove pointers to temporary regression test branches, and then merge to main.

@reuterbal reuterbal force-pushed the nabr-the-new-scheduler branch from 5dbd1fb to 1e03830 Compare March 22, 2024 10:07
@reuterbal reuterbal merged commit 338a042 into main Mar 22, 2024
8 of 12 checks passed
@reuterbal reuterbal deleted the nabr-the-new-scheduler branch March 22, 2024 10:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants