
Initial stab at actx.trace_call #210

Closed · MTCam wants to merge 6 commits

Conversation

MTCam (Contributor) commented Dec 13, 2022

Augments inducer/pytato#364 by adding an array-context version of trace_call that handles array container types.

cc: @lukeolson
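
For orientation, here is a minimal, hypothetical usage sketch of the interface this PR proposes. Only the trace_call signature (f, *args, identifier=None, **kwargs) comes from the diff excerpts quoted in the review comments below; the array context setup and the container-friendly behavior shown here are assumptions, not code from this branch.

```python
# Hypothetical usage sketch of the actx.trace_call proposed in this PR.
# Only the (f, *args, identifier=None, **kwargs) signature is taken from the
# diff excerpts below; everything else is illustrative.
import numpy as np
import pyopencl as cl

from arraycontext import PytatoPyOpenCLArrayContext


def rhs(u, v):
    # Per the PR description, f may take (and return) array containers as
    # well as plain arrays.
    return u * v + 2.0 * u


cl_ctx = cl.create_some_context()
queue = cl.CommandQueue(cl_ctx)
actx = PytatoPyOpenCLArrayContext(queue)

u = actx.from_numpy(np.linspace(0.0, 1.0, 16))
v = actx.from_numpy(np.ones(16))

# Record the call to rhs under the given identifier instead of inlining its
# expression graph at the call site.
result = actx.trace_call(rhs, u, v, identifier="rhs")
```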

# {{{ utilities


def _ary_container_key_stringifier(keys: Tuple[Any, ...]) -> str:
Owner commented on the excerpt above:
Don't these functions already exist in impl.pytato.compile? What has changed?

@@ -520,6 +520,12 @@ def compile(self, f: Callable[..., Any]) -> Callable[..., Any]:
"""
return f

# Supporting interface for function/call tracing in actx implementations
def trace_call(self, f: Callable[..., Any],
*args, identifier=None, **kwargs):
Owner commented on the excerpt above:
Missing return type annotation. More generally, trace_call really should permit the user to do two things:

  • Use the result of the traced call.
  • Call the trace result using new data.

I'm not sure we'll be able to do both with just a single return value.
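
Purely to illustrate the two requirements above, here is one hypothetical shape for a single return value that supports both; the TracedCall name and its attributes are invented for this sketch and are not part of the PR.

```python
# Hypothetical sketch: a single return value that lets the caller both use
# the traced result and replay the trace on new data. Not part of this PR.
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class TracedCall:
    # Output of f(*args, **kwargs) from the tracing run, usable immediately.
    result: Any
    # Re-applies the recorded trace to new arguments without retracing f.
    _replay: Callable[..., Any]

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        return self._replay(*args, **kwargs)


# traced = actx.trace_call(f, x, y, identifier="f")
# out_now = traced.result      # (1) use the result of the traced call
# out_new = traced(x2, y2)     # (2) call the trace result with new data
```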

@@ -520,6 +520,12 @@ def compile(self, f: Callable[..., Any]) -> Callable[..., Any]:
"""
return f

# Supporting interface for function/call tracing in actx implementations
def trace_call(self, f: Callable[..., Any],
*args, identifier=None, **kwargs):
Owner commented on the excerpt above:
I think you also want the interface to apply tags to the call site.
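
For illustration only, the signature from the excerpt above could grow a tags keyword along these lines; the tags parameter is an assumption and not part of this branch (pytools.tag.Tag is the tag base class used elsewhere in this ecosystem).

```python
# Hypothetical signature sketch: forward user-supplied tags to the call-site
# node created for the traced call. Not part of this PR.
from typing import Any, Callable, FrozenSet, Optional

from pytools.tag import Tag


def trace_call(self, f: Callable[..., Any], *args: Any,
               identifier: Optional[str] = None,
               tags: FrozenSet[Tag] = frozenset(),
               **kwargs: Any) -> Any:
    # 'tags' would be attached to the call-site node in the traced DAG, e.g.
    #   actx.trace_call(flux, u, identifier="flux",
    #                   tags=frozenset({SomeCallSiteTag()}))
    ...
```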

kaushikcfd (Collaborator) commented Mar 8, 2023

Is this something that is still being worked on? If yes, I can help with some stuff here, I have some bandwidth for it this week.

If not, then what's the plan for addressing illinois-ceesd/mirgecom#777? (I don't think compilation times of those orders should be tolerated)

MTCam (Contributor, Author) commented Mar 11, 2023

> Is this something that is still being worked on? If yes, I can help with some stuff here, I have some bandwidth for it this week.

I pushed and updated my latest on this. My apologies for the state of it; it is very much wip.

I had a fork/branch of meshmode with a simple test of using the actx-provided trace_call.
https://github.com/MTCam/meshmode/tree/use-actx-trace-call

kaushikcfd (Collaborator) commented:

> I pushed and updated my latest on this. My apologies for the state of it; it is very much wip.

Thanks! If you want, I can take over and maybe it will be faster that way? My ETA for this is the weekend. However, if you think you want to get accustomed to these parts of the codebase, I'm happy to step away.

MTCam (Contributor, Author) commented Mar 11, 2023

> If not, then what's the plan for addressing illinois-ceesd/mirgecom#777? (I don't think compilation times of those orders should be tolerated)

The compilation times seem well under control for our current prediction, I believe. We need to run some tests at scale to make sure that this is the case, but currently the build times are hovering at just under an hour for high-order 3D with all physics and features enabled on 128 ranks. That is well into the prediction-enabling realm, and also not a giant bottleneck. The build time is around 600-1000 s in serial, and 3000+ s for many ranks.

I understand that the current situation is brought about mostly by mitigation and is not a permanent solution. One-dimensional domain decomposition currently limits our maximum number of MPI neighbors to 2, minimizing the impact of the splatted flux DAGs for partition boundaries. It does seem like, for our problem and geometry, we could stick with that strategy indefinitely.

Solving the DAG splat will, I think, help us avoid splatting the flux calculation for each unique domain boundary. That would be useful.

MTCam (Contributor, Author) commented Mar 11, 2023

> I pushed and updated my latest on this. My apologies for the state of it; it is very much wip.

> Thanks! If you want, I can take over and maybe it will be faster that way? My ETA for this is the weekend. However, if you think you want to get accustomed to these parts of the codebase, I'm happy to step away.

I would be delighted for you to take over and am very interested in following what happens here. I would get back to this after the review, but won't be anywhere near as efficient at getting it going. I believe it is close to functioning as a demo, but not close to its final form.

kaushikcfd (Collaborator) commented:

> but currently the build times are hovering at just under an hour for high-order 3D with all physics and features enabled on 128 ranks

Compilation times of 1 hour (even 20 minutes, to be honest) are very bad and something that must be fixed ASAP. Another lurking problem is that this also leads to poorly generated device code: each sub-array from a neighboring rank is assigned a small-ish buffer, and launching GPU kernels for such small problem sizes brings down the throughput.

> I would be delighted for you to take over and am very interested in following what happens here.

Yep, let's do this then.

kaushikcfd (Collaborator) commented:

See #221.

kaushikcfd (Collaborator) commented Mar 14, 2023

In #221, I added a test under examples/ that demonstrates the intended purpose:

(py311_env) [line@line examples]$ python how_to_outline.py
[Pre-concatenation] Number of nodes = 32701
[Post-concatenation] Number of nodes = 3710

I think we can close this PR.

inducer closed this Apr 5, 2023