DaCe VRAM pooling #295

Merged: 134 commits, Aug 29, 2022
df4c245
Allow for env var to control orchestration
FlorianDeconinck Jun 14, 2022
afd41a0
(some) Translate tests
FlorianDeconinck Jun 14, 2022
03a3ba4
Failing flllz orchestration
FlorianDeconinck Jun 14, 2022
b9f9146
(Re)Orchestrate remapping
FlorianDeconinck Jun 15, 2022
1b79998
Fix orchestrate for new DaCe
FlorianDeconinck Jun 16, 2022
9250283
Removing extra guard irrelevant since load_once is gone
FlorianDeconinck Jun 16, 2022
a4939d7
Correct type hint & return
FlorianDeconinck Jun 16, 2022
c0e326c
Use lazy_stencil when orchestrating
FlorianDeconinck Jun 17, 2022
65f3164
Making sure lazy_stencil doesn't trigger before __call__
FlorianDeconinck Jun 17, 2022
97eac2d
Remove the need to cache Communicator in dace_config
FlorianDeconinck Jun 17, 2022
2680451
Fixing communicator removed from DaceConfig
FlorianDeconinck Jun 17, 2022
257cb06
Merge branch 'DaceConfig_RemoveComm' into reorchestrate_all_modules
FlorianDeconinck Jun 17, 2022
ac1aec5
Integrate LazyStencil.field_info fix
FlorianDeconinck Jun 20, 2022
4394d99
Remove unused _frozen_stencil() in stencil
FlorianDeconinck Jun 20, 2022
01b40e3
Add domain to stencil __sdfg__
FlorianDeconinck Jun 20, 2022
43eca42
Orchestrate tracers
FlorianDeconinck Jun 21, 2022
c119b93
Orchestrate: microphysics (minus driver)
FlorianDeconinck Jun 21, 2022
9d06b5f
Merge branch 'orchestrate_on_AOT_stencils' into reorchestrate_all_mod…
FlorianDeconinck Jun 22, 2022
31934be
Change orchestration build pipe to more efficient trf passes
FlorianDeconinck Jun 22, 2022
faf18c0
Minor
FlorianDeconinck Jun 22, 2022
7d1f3d2
Orchestrate: fv_dynamics
FlorianDeconinck Jun 22, 2022
4f721d1
Minor
FlorianDeconinck Jun 22, 2022
7882901
Updating dace.conf
FlorianDeconinck Jun 23, 2022
4e582f0
gitignore; DaCe & test
FlorianDeconinck Jun 23, 2022
53e8f03
Verbose
FlorianDeconinck Jun 23, 2022
15f6d8e
Boolean logic be hard
FlorianDeconinck Jun 23, 2022
25e5bac
Removing dace.conf and replacing it with direct call to conf API
FlorianDeconinck Jun 23, 2022
da6e9d2
Restrict dace.config setup to orchestration
FlorianDeconinck Jun 23, 2022
2798f8f
Orchestration: FV_Dynamics
FlorianDeconinck Jun 23, 2022
72e9ae2
Move parsing in orchestration to commong fn, time.
FlorianDeconinck Jun 27, 2022
7fd16aa
Use_cache on SDFG gen & verbose print
FlorianDeconinck Jun 27, 2022
d3d4f32
Orchestration: driver
FlorianDeconinck Jun 27, 2022
eabd66b
Driver example: c12 baroclinic orchestration on CPU
FlorianDeconinck Jun 27, 2022
7bc7978
Merge remote-tracking branch 'origin/main' into reorchestrate_all_mod…
FlorianDeconinck Jun 27, 2022
76d5771
Linting
FlorianDeconinck Jun 27, 2022
d0199a7
More linting
FlorianDeconinck Jun 27, 2022
cc28498
Revert cache us on sdfg parse
FlorianDeconinck Jun 27, 2022
429f495
Fix timestep computation. Move in Config
FlorianDeconinck Jun 28, 2022
929ce00
Modify Physics call structure to allow for DaCe parsing limit
FlorianDeconinck Jun 29, 2022
6364776
Fix rank read in build.py logging
FlorianDeconinck Jun 29, 2022
2ed46da
Fix log_on_rank_0 on MPI
FlorianDeconinck Jun 29, 2022
b8c0f8c
Bypass parsing when not the rank that should be compiling
FlorianDeconinck Jun 29, 2022
8f3d4e8
Linting
FlorianDeconinck Jun 30, 2022
252f371
Fix test_fv3core
FlorianDeconinck Jun 30, 2022
fc03f2b
Rename module orchestrate to orchestration
FlorianDeconinck Jun 30, 2022
1cd473d
Swap .layout/decomposition for a post-build write up & runtime check
FlorianDeconinck Jun 30, 2022
3758d44
Fix dace_config save in restart
FlorianDeconinck Jun 30, 2022
8bf16d9
Lint
FlorianDeconinck Jun 30, 2022
80fb2a8
Merge branch 'main' into reorchestrate_all_modules
FlorianDeconinck Jun 30, 2022
707935e
Typo
FlorianDeconinck Jun 30, 2022
838a18d
lint
FlorianDeconinck Jun 30, 2022
cda9cda
Fix dace_config serialization for Restart
FlorianDeconinck Jun 30, 2022
fc6b30a
dace_config is optional
FlorianDeconinck Jun 30, 2022
ef13fda
lint
FlorianDeconinck Jun 30, 2022
2e9b2ef
Update `dace` version
FlorianDeconinck Jun 30, 2022
b0d3ecd
Update constraints.txt
FlorianDeconinck Jun 30, 2022
52b1165
Guard against degenerate behavior for FV3_DACEMODE
FlorianDeconinck Jul 5, 2022
1f551c6
Merge remote-tracking branch 'origin/reorchestrate_all_modules' into …
FlorianDeconinck Jul 5, 2022
8a06268
PR notes - verbosing behavior
FlorianDeconinck Jul 7, 2022
f0c7591
Driver performance critical function renamed & verbosed
FlorianDeconinck Jul 8, 2022
ff10723
Merge branch 'main' into reorchestrate_all_modules
FlorianDeconinck Jul 8, 2022
49a55e1
Extend write/verification of the build_info
FlorianDeconinck Jul 8, 2022
797f11d
Missing serialization field
FlorianDeconinck Jul 8, 2022
0b6aa3f
Fix build_info file
FlorianDeconinck Jul 12, 2022
2b8d503
Small fix
FlorianDeconinck Jul 12, 2022
737003a
Small fix
FlorianDeconinck Jul 12, 2022
35e5e1c
SDFG count RAM/VRAM
FlorianDeconinck Jul 12, 2022
3a662f1
Fix cmd line
FlorianDeconinck Jul 13, 2022
854d819
Update Pace code to DaCe v0.14 RC (TBR)
FlorianDeconinck Jul 13, 2022
2e96bf9
Update DaCe to 0.14rc1
FlorianDeconinck Jul 13, 2022
5dc3eae
Fix SDFG load on distributed cache
FlorianDeconinck Jul 13, 2022
0ac1090
Microphysics: move setup computation on proper device
FlorianDeconinck Jul 13, 2022
24b36ff
update gt4py to branch
Jul 13, 2022
b3a366d
Merge branch 'reorchestrate_all_modules' into dace_auto_RAM_read
FlorianDeconinck Jul 14, 2022
b11cb1a
Add per file and in-memory options
FlorianDeconinck Jul 14, 2022
8ee1ed7
Re-insert performance collection after each timestep
FlorianDeconinck Jul 14, 2022
3f1684a
Fix microphysics setup computation on proper Host/Device
FlorianDeconinck Jul 15, 2022
608fa90
lint
FlorianDeconinck Jul 15, 2022
4c57da9
Fix to the ContextLib orchestration
FlorianDeconinck Jul 15, 2022
3ef571a
Merge branch 'main' into reorchestrate_all_modules
FlorianDeconinck Jul 15, 2022
234368e
Merge branch 'reorchestrate_all_modules' into dace_auto_RAM_read
FlorianDeconinck Jul 15, 2022
2e0f449
Detail reporting
FlorianDeconinck Jul 16, 2022
2ebe9e8
Fix command line
FlorianDeconinck Jul 20, 2022
e4281b5
Do not instantiate Physics if you are not going to run it
FlorianDeconinck Jul 22, 2022
06ab595
Add debug tools
FlorianDeconinck Jul 22, 2022
29fe391
Drivre: Fix timestep, fix end_of_step_actions for orchestration
FlorianDeconinck Jul 24, 2022
daeb8d4
DaCe orchestrated: proper blocking size, comiple for newer target SM
FlorianDeconinck Jul 24, 2022
7078866
Merge branch 'main' into stable_orchestration
FlorianDeconinck Jul 24, 2022
d7dd1b8
Lint
FlorianDeconinck Jul 24, 2022
1e12120
Tweaking report to display orchestrated
FlorianDeconinck Jul 24, 2022
b1914f7
Merge branch 'stable_orchestration' into dace_auto_RAM_read
FlorianDeconinck Jul 24, 2022
fee465f
LINT
FlorianDeconinck Jul 25, 2022
915e19e
Added static analysis to end of build
FlorianDeconinck Jul 25, 2022
04f9458
Verbose
FlorianDeconinck Jul 25, 2022
ea3e311
NaN Check: removing unused code, auto schedule type, verbose
FlorianDeconinck Jul 25, 2022
29bd9d3
Pass down transient flag
FlorianDeconinck Jul 26, 2022
4b988e7
Merge branch 'main' into stable_orchestration
FlorianDeconinck Jul 26, 2022
57c1037
Update dace to v0.14rc2
FlorianDeconinck Jul 26, 2022
0cbff4c
Make constraints.txt happy
FlorianDeconinck Jul 26, 2022
5e2a35f
Remove dace constraints -> per PIP requirements workaround
FlorianDeconinck Jul 26, 2022
310c45e
Merge branch 'stable_orchestration' into dace_transient_pooled
FlorianDeconinck Jul 26, 2022
7d75ace
Orchestration: pool persistent mem
FlorianDeconinck Jul 27, 2022
b98c64b
Merge branch 'dace_auto_RAM_read' into dace_transient_pooled
FlorianDeconinck Jul 27, 2022
4a087a3
Remove -e from constraints.txt per PIP
FlorianDeconinck Jul 27, 2022
1467a41
Dace config: query dace syncdebug
FlorianDeconinck Jul 28, 2022
a7b3393
Merge branch 'main' into stable_orchestration
FlorianDeconinck Aug 1, 2022
401906c
Move gt4py & own reference to DaCe to rc2 capable
FlorianDeconinck Aug 1, 2022
4bde297
GT4Py dace versionning relaxed constraints
FlorianDeconinck Aug 1, 2022
70f1032
Add Dace requirements to the driver
FlorianDeconinck Aug 1, 2022
4283e5a
Make `daint` pre-install dace to go around new PIP behavior of refusi…
FlorianDeconinck Aug 8, 2022
46b1d85
Move dace install up (?) on daint
FlorianDeconinck Aug 8, 2022
fc10613
Copy changes to the other (sic) install_virtualenv, for more env fun
FlorianDeconinck Aug 9, 2022
299f327
typo
FlorianDeconinck Aug 9, 2022
ffeacb0
Merge branch 'stable_orchestration' into dace_transient_pooled
FlorianDeconinck Aug 18, 2022
5ecf71a
Merge remote-tracking branch 'origin/main' into stable_orchestration
FlorianDeconinck Aug 18, 2022
64b9dec
DaCe config: remoce check_args (not needed & unsupported in new DaCe)
FlorianDeconinck Aug 19, 2022
c05127a
Merge branch 'stable_orchestration' into dace_transient_pooled
FlorianDeconinck Aug 19, 2022
772bf71
Lint
FlorianDeconinck Aug 19, 2022
cc19793
Merge branch 'main' into dace_transient_pooled
FlorianDeconinck Aug 23, 2022
ce0dfed
Fix merge errors
FlorianDeconinck Aug 23, 2022
b0e62c3
Remove orch .yaml example
FlorianDeconinck Aug 23, 2022
3b3713d
Use logger instead print
FlorianDeconinck Aug 23, 2022
ba15084
Lint
FlorianDeconinck Aug 23, 2022
bb6cadb
Merge remote-tracking branch 'origin/HEAD' into dace_transient_pooled
FlorianDeconinck Aug 23, 2022
836a064
Fix logging
FlorianDeconinck Aug 24, 2022
41f8a92
Merge remote-tracking branch 'origin/main' into dace_transient_pooled
FlorianDeconinck Aug 24, 2022
4642f85
Merge remote-tracking branch 'origin/main' into dace_transient_pooled
FlorianDeconinck Aug 24, 2022
b8398e9
Deactivate distributed compile
FlorianDeconinck Aug 24, 2022
f8c0d9d
Move flag to dace_config
FlorianDeconinck Aug 24, 2022
8287dbc
dace >= 0.14 for VRAM pooling
FlorianDeconinck Aug 26, 2022
f27e449
Remove -e from constraint.txt + lint
FlorianDeconinck Aug 26, 2022
fe29275
Make CUDA timer safe for non-CUDA context
FlorianDeconinck Aug 26, 2022
761ade2
Reuse GPU availability code & optional import
FlorianDeconinck Aug 26, 2022
5d18530
Cleanup & PR notes
FlorianDeconinck Aug 29, 2022
83 changes: 41 additions & 42 deletions constraints.txt
@@ -1,5 +1,5 @@
#
-# This file is autogenerated by pip-compile
+# This file is autogenerated by pip-compile with python 3.8
# To update, run:
#
# pip-compile --output-file=constraints.txt driver/requirements.txt dsl/requirements.txt external/gt4py/setup.cfg fv3core/requirements.txt fv3gfs-physics/requirements.txt pace-util/requirements.txt requirements_dev.txt requirements_docs.txt requirements_lint.txt
@@ -34,7 +34,7 @@ attrs==21.2.0
# pytest
babel==2.9.1
# via sphinx
-backports.entry-points-selectable==1.1.1
+backports-entry-points-selectable==1.1.1

Contributor: Do we know why this changed?
Contributor Author (FlorianDeconinck, Aug 29, 2022): -_o_-

# via virtualenv
black==22.3.0
# via
@@ -76,8 +76,6 @@ click==8.0.1
# pip-tools
cloudpickle==2.0.0
# via dask
-cmake==3.22.4
-# via dace

Comment on lines -79 to -80
Contributor: I don't see this dependency added back anywhere. Was that removed?
Contributor Author: -_o_- I am guessing DaCe changed their dependency tree

commonmark==0.9.1
# via recommonmark
coverage==5.5
@@ -88,6 +86,13 @@ cytoolz==0.11.2
# via
# gt4py
# gt4py (external/gt4py/setup.cfg)
+dace==0.14
+# via
+# -r driver/requirements.txt
+# -r dsl/requirements.txt
+# -r fv3core/requirements/requirements_dace.txt
+# -r requirements_dev.txt
+# pace-dsl
dacite==1.6.0
# via
# -r driver/requirements.txt
@@ -105,8 +110,6 @@ dill==0.3.5.1
# via dace
distlib==0.3.2
# via virtualenv
-distro==1.7.0
-# via scikit-build
docutils==0.16
# via
# recommonmark
@@ -163,15 +166,15 @@ google-api-core==2.0.0
# via
# google-cloud-core
# google-cloud-storage
-google-auth-oauthlib==0.4.5
-# via gcsfs
google-auth==2.0.1
# via
# gcsfs
# google-api-core
# google-auth-oauthlib
# google-cloud-core
# google-cloud-storage
+google-auth-oauthlib==0.4.5
+# via gcsfs
google-cloud-core==2.0.0
# via google-cloud-storage
google-cloud-storage==1.42.0
@@ -238,15 +241,15 @@ multidict==5.1.0
# via
# aiohttp
# yarl
-mypy==0.790
-# via
-# -r fv3gfs-physics/requirements.txt
-# -r pace-util/requirements.txt
mypy-extensions==0.4.3
# via
# black
# mypy
# typing-inspect
+mypy==0.790
+# via
+# -r fv3gfs-physics/requirements.txt
+# -r pace-util/requirements.txt
netcdf4==1.5.7
# via
# -r driver/requirements.txt
@@ -292,7 +295,6 @@ packaging==21.0
# gt4py
# gt4py (external/gt4py/setup.cfg)
# pytest
-# scikit-build
# sphinx
# tox
pandas==1.3.2
@@ -326,12 +328,12 @@ py==1.10.0
# pytest
# pytest-forked
# tox
-pyasn1-modules==0.2.8
-# via google-auth
pyasn1==0.4.8
# via
# pyasn1-modules
# rsa
+pyasn1-modules==0.2.8
+# via google-auth
pybind11==2.8.1
# via
# gt4py
@@ -350,6 +352,21 @@ pygments==2.10.0
# via sphinx
pyparsing==2.4.7
# via packaging
+pytest==6.2.4
+# via
+# -r driver/requirements.txt
+# -r fv3core/requirements/requirements_base.txt
+# -r requirements_dev.txt
+# pytest-cache
+# pytest-cov
+# pytest-datadir
+# pytest-dependency
+# pytest-factoryboy
+# pytest-forked
+# pytest-profiling
+# pytest-regressions
+# pytest-subtests
+# pytest-xdist
pytest-cache==1.0
# via -r fv3core/requirements/requirements_base.txt
pytest-cov==2.12.1
@@ -378,21 +395,6 @@ pytest-subtests==0.5.0
# -r requirements_dev.txt
pytest-xdist==2.3.0
# via -r fv3core/requirements/requirements_base.txt
-pytest==6.2.4
-# via
-# -r driver/requirements.txt
-# -r fv3core/requirements/requirements_base.txt
-# -r requirements_dev.txt
-# pytest-cache
-# pytest-cov
-# pytest-datadir
-# pytest-dependency
-# pytest-factoryboy
-# pytest-forked
-# pytest-profiling
-# pytest-regressions
-# pytest-subtests
-# pytest-xdist
python-dateutil==2.8.2
# via
# faker
@@ -411,8 +413,6 @@ pyyaml==5.4.1
# pytest-regressions
recommonmark==0.7.1
# via -r requirements_docs.txt
-requests-oauthlib==1.3.0
-# via google-auth-oauthlib
requests==2.26.0
# via
# dace
@@ -421,10 +421,10 @@ requests==2.26.0
# google-cloud-storage
# requests-oauthlib
# sphinx
+requests-oauthlib==1.3.0
+# via google-auth-oauthlib
rsa==4.7.2
# via google-auth
-scikit-build==0.15.0
-# via dace
scipy==1.7.1
# via
# -r fv3core/requirements/requirements_base.txt
@@ -447,6 +447,13 @@ snowballstemmer==2.1.0
# via sphinx
sortedcontainers==2.4.0
# via hypothesis
+sphinx==4.1.2
+# via
+# -r requirements_docs.txt
+# recommonmark
+# sphinx-argparse
+# sphinx-gallery
+# sphinx-rtd-theme
sphinx-argparse==0.3.1
# via -r requirements_docs.txt
sphinx-gallery==0.10.1
@@ -455,13 +462,6 @@ sphinx-rtd-theme==0.5.2
# via
# -r pace-util/requirements.txt
# -r requirements_docs.txt
-sphinx==4.1.2
-# via
-# -r requirements_docs.txt
-# recommonmark
-# sphinx-argparse
-# sphinx-gallery
-# sphinx-rtd-theme
sphinxcontrib-applehelp==1.0.2
# via sphinx
sphinxcontrib-devhelp==1.0.2
@@ -533,7 +533,6 @@ wheel==0.37.0
# -r pace-util/requirements.txt
# astunparse
# pip-tools
-# scikit-build
xarray==0.19.0
# via
# -r driver/requirements.txt
35 changes: 35 additions & 0 deletions driver/pace/driver/tools.py
@@ -0,0 +1,35 @@
import os
from typing import Optional

import click

from pace.dsl.dace.utils import count_memory_from_path


# Count the memory from a given SDFG
ACTION_SDFG_MEMORY_COUNT = "sdfg_memory_count"


@click.command()
@click.argument(
    "action",
    required=True,
    type=click.Choice([ACTION_SDFG_MEMORY_COUNT]),
)
@click.option(
    "--sdfg_path",
    type=click.STRING,
)
@click.option("--report_detail", is_flag=True, type=click.BOOL, default=False)
def command_line(action: str, sdfg_path: Optional[str], report_detail: Optional[bool]):
    """
    Run tooling.
    """
    if action == ACTION_SDFG_MEMORY_COUNT:
        if sdfg_path is None or not os.path.exists(sdfg_path):
            raise RuntimeError(f"Can't load SDFG {sdfg_path}")
        print(count_memory_from_path(sdfg_path, detail_report=report_detail))


if __name__ == "__main__":
    command_line()
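
The tool above delegates to `count_memory_from_path`, whose implementation is not shown in this diff. As a rough illustration of the kind of accounting a RAM/VRAM count might do, here is a minimal sketch over a simplified, hypothetical data model. `ArrayInfo`, `count_memory` and `format_report` are illustrative names only, not part of Pace or DaCe, and the real function may work quite differently.

```python
from dataclasses import dataclass
from math import prod
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class ArrayInfo:
    """Hypothetical, simplified stand-in for one array descriptor in an SDFG."""

    name: str
    shape: Tuple[int, ...]
    itemsize: int  # bytes per element
    on_gpu: bool  # True: counted as VRAM, False: counted as host RAM


def count_memory(arrays: Iterable[ArrayInfo]) -> Dict[str, int]:
    """Sum array sizes in bytes, split by where each array lives."""
    totals = {"ram_bytes": 0, "vram_bytes": 0}
    for arr in arrays:
        size = prod(arr.shape) * arr.itemsize
        totals["vram_bytes" if arr.on_gpu else "ram_bytes"] += size
    return totals


def format_report(totals: Dict[str, int]) -> str:
    """Render the totals in MiB for a one-line summary."""
    mib = 1024 * 1024
    return (
        f"RAM: {totals['ram_bytes'] / mib:.2f} MiB, "
        f"VRAM: {totals['vram_bytes'] / mib:.2f} MiB"
    )


if __name__ == "__main__":
    # Made-up arrays, purely for demonstration.
    example = [
        ArrayInfo("u", (128, 128, 80), 8, on_gpu=True),
        ArrayInfo("mask", (128, 128), 1, on_gpu=False),
    ]
    print(format_report(count_memory(example)))
```

Splitting the totals by storage location is what would make such a report useful when sizing a run against available GPU memory.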
2 changes: 1 addition & 1 deletion driver/requirements.txt
@@ -7,4 +7,4 @@ numpy
netCDF4
xarray
zarr
-git+https://github.com/spcl/dace[email protected]
+dace>=0.14
1 change: 1 addition & 0 deletions dsl/pace/dsl/dace/build.py
@@ -110,6 +110,7 @@ def get_sdfg_path(
     """
     import os

+    # TODO: check DaceConfig for cache.strategy == name
     # Guarding against bad usage of this function
     if config.get_orchestrate() != DaCeOrchestration.Run:
         return None
16 changes: 15 additions & 1 deletion dsl/pace/dsl/dace/dace_config.py
@@ -7,6 +7,12 @@
 from pace.util.communicator import CubedSphereCommunicator


+# TODO (floriand): Temporary deactivate the distributed compiled
+# until we deal with the Grid data inlining during orchestration
+# See github issue #301
+DEACTIVATE_DISTRIBUTED_DACE_COMPILE = True
+
+
 class DaCeOrchestration(enum.Enum):
     """
     Orchestration mode for DaCe
@@ -139,7 +145,12 @@ def __init__(
         if communicator:
             self.my_rank = communicator.rank
             self.rank_size = communicator.comm.Get_size()
-            self.target_rank = get_target_rank(self.my_rank, communicator.partitioner)
+            if DEACTIVATE_DISTRIBUTED_DACE_COMPILE:
+                self.target_rank = communicator.rank
+            else:
+                self.target_rank = get_target_rank(
+                    self.my_rank, communicator.partitioner
+                )
             self.layout = communicator.partitioner.layout
         else:
             self.my_rank = 0
@@ -170,6 +181,9 @@ def get_backend(self) -> str:
     def get_orchestrate(self) -> DaCeOrchestration:
         return self._orchestrate

+    def get_sync_debug(self) -> bool:
+        return dace.config.Config.get("compiler", "cuda", "syncdebug")
+
     def as_dict(self) -> Dict[str, Any]:
         return {
             "_orchestrate": str(self._orchestrate.name),
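
The `__init__` hunk in `dace_config.py` changes how `target_rank` is chosen: with `DEACTIVATE_DISTRIBUTED_DACE_COMPILE` set, each rank becomes its own target instead of being mapped through `get_target_rank`. A minimal sketch of that selection logic follows, assuming a hypothetical `pick_target_rank` helper and a stand-in `fake_get_target_rank` mapping; the real `get_target_rank` takes a partitioner and is not reproduced here.

```python
from typing import Callable

# Mirrors the module-level flag introduced in dace_config.py.
DEACTIVATE_DISTRIBUTED_DACE_COMPILE = True


def pick_target_rank(
    my_rank: int,
    get_target_rank: Callable[[int], int],
    deactivate_distributed: bool = DEACTIVATE_DISTRIBUTED_DACE_COMPILE,
) -> int:
    """Return the target rank for `my_rank` (hypothetical helper)."""
    if deactivate_distributed:
        # Each rank is its own target, so no rank depends on another's build.
        return my_rank
    # Otherwise defer to the mapping (stand-in for the partitioner lookup).
    return get_target_rank(my_rank)


def fake_get_target_rank(rank: int) -> int:
    """Stand-in mapping, purely illustrative: fold ranks onto the first six."""
    return rank % 6


if __name__ == "__main__":
    for rank in (0, 7):
        own = pick_target_rank(rank, fake_get_target_rank)
        mapped = pick_target_rank(
            rank, fake_get_target_rank, deactivate_distributed=False
        )
        print(f"rank {rank}: flag on -> {own}, flag off -> {mapped}")
```

Keeping the flag as a single module-level constant, as the diff does, means the distributed path can be restored by flipping one value once the issue referenced in the TODO is resolved.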