
DaCe VRAM pooling #295

Merged
merged 134 commits into main from dace_transient_pooled
Aug 29, 2022

Conversation

FlorianDeconinck
Contributor

@FlorianDeconinck FlorianDeconinck commented Aug 23, 2022

Purpose

Add VRAM pooling by moving all Persistent arrays to Transient within the DaCe pipeline, then using the new DaCe auto-pooling of transients

Code changes:

  • Orchestration pipeline change
  • Added SDFG tooling (NaN checker, memory counter)
  • Deactivated distributed caching due to a potential bug with Grid

Requirements changes:

  • New python -m driver.tools tooling entry point
  • DaCe requirement bumped to >= 0.14

Infrastructure changes:

  • CUDA_CRAY_MPS needs to be deactivated because the mallocAsync allocator fails when MPS is turned on
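The MPS constraint above might translate to a job-script tweak along these lines (a hedged sketch; the exact mechanism for disabling MPS depends on the Cray launch environment):

```shell
# Sketch: ensure Cray MPS is off before launching, since the
# mallocAsync-backed pool fails when MPS is active (assumed mechanism).
export CUDA_CRAY_MPS=0
echo "CUDA_CRAY_MPS=${CUDA_CRAY_MPS}"
```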

Checklist

  • You have followed the coding standards guidelines established at Code Review Checklist.
  • Docstrings and type hints are added to new and updated routines, as appropriate
  • All relevant documentation has been updated or added (e.g. README, CONTRIBUTING docs)

Prevent patch class from being badly renamed
Make DaCeOrchestration.Run resilient to no communicator, defaulting to `.gt_cache`
Change conftest to adjust for changes in DaceConfig reqs
Remove custom lazy compile
Restore proper restart config save
Fix OOB with passing origin to the __sdfg__
Orchestrate dyncore: delnflux
Remove unused parameter in Remap
Orchestrate: dyn_core
Fix translate parallel test comm passing to dace_config
Bad fix for multi-process yaml load
@FlorianDeconinck FlorianDeconinck marked this pull request as ready for review August 26, 2022 14:20
@FlorianDeconinck
Contributor Author

launch jenkins

@jdahm
Contributor

jdahm commented Aug 26, 2022

launch jenkins

@FlorianDeconinck FlorianDeconinck enabled auto-merge (squash) August 29, 2022 13:26
@@ -34,7 +34,7 @@ attrs==21.2.0
  # pytest
babel==2.9.1
  # via sphinx
-backports.entry-points-selectable==1.1.1
+backports-entry-points-selectable==1.1.1
Contributor

Do we know why this changed?

Contributor Author

@FlorianDeconinck FlorianDeconinck Aug 29, 2022

-_o_-

Comment on lines -79 to -80
cmake==3.22.4
# via dace
Contributor

I don't see this dependency added back anywhere. Was that removed?

Contributor Author

-_o_-
I am guessing DaCe changed their dependency tree

dsl/pace/dsl/dace/utils.py (outdated)
f"\t {detail.name}\n"
)

return report
Contributor

Just a suggestion: it would be more flexible to separate counting the memory from generating the report, i.e. split this into two functions. The first would create something like [(name, size)] for an SDFG, then the second would build the English representation.

Contributor Author

I am prepping another PR that renames that tool and adds a kernel timing analysis. I'll break it up there.

Comment on lines +158 to +164
memory_pooled = 0.0
for _sd, _aname, arr in sdfg.arrays_recursive():
if arr.lifetime == dace.AllocationLifetime.Persistent:
arr.pool = True
memory_pooled += arr.total_size * arr.dtype.bytes
arr.lifetime = dace.AllocationLifetime.Scope
memory_pooled = float(memory_pooled) / (1024 * 1024)
Contributor

Is my understanding correct that this is the moment when the arrays are switched to using pooled memory?

Contributor Author

Yes. At code generation, DaCe will automatically pool all Scope arrays that are flagged. So we swap Persistent arrays (e.g. arrays in a sub-SDFG that are not passed as parameters to the top SDFG) to Scope lifetime and flag them.
The actual pooling is done at code generation time.
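As a stand-alone illustration of that swap (mock enum and dataclass replacing the dace objects; the real pass iterates sdfg.arrays_recursive() as in the diff above):

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import List


class AllocationLifetime(Enum):
    # Stand-ins for the dace.AllocationLifetime members used in the PR.
    Persistent = auto()
    Scope = auto()


@dataclass
class Array:
    total_size: int  # number of elements
    bytes_per_element: int
    lifetime: AllocationLifetime
    pool: bool = False


def flag_for_pooling(arrays: List[Array]) -> float:
    """Swap Persistent arrays to Scope lifetime and flag them for the memory
    pool; return the pooled size in MiB. The pool allocation itself happens
    later, at DaCe code generation."""
    pooled_bytes = 0
    for arr in arrays:
        if arr.lifetime is AllocationLifetime.Persistent:
            arr.pool = True
            pooled_bytes += arr.total_size * arr.bytes_per_element
            arr.lifetime = AllocationLifetime.Scope
    return pooled_bytes / (1024 * 1024)
```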

Contributor Author

I am clarifying that in the comment

@FlorianDeconinck FlorianDeconinck merged commit 2bfefc5 into main Aug 29, 2022
@FlorianDeconinck FlorianDeconinck deleted the dace_transient_pooled branch August 29, 2022 21:13