Transient Outside A Map Scope is Mapped to GPU #5

Open
ThrudPrimrose opened this issue Oct 14, 2024 · 1 comment
@ThrudPrimrose
Owner

Transient scalars are always mapped to GPU storage, even when they sit in a Default map that is not offloaded to the GPU.

I can prevent a map from being offloaded to the GPU by marking it as a "host_map", a small feature I have added. Roughly at lines 315-225 of gpu_transform_sdfg.py, an additional variable on the Map node keeps a Default map from being scheduled on the GPU (GPU_Device):

...
elif isinstance(node, nodes.EntryNode):
    if not isinstance(node, nodes.MapEntry) or not node.map.host_map:
        node.schedule = dtypes.ScheduleType.GPU_Device
        gpu_nodes.add((state, node))
...

Note: This change is not present in the latest commit.
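To make the effect of the flag concrete, here is a minimal sketch of the same decision in plain Python. The `ScheduleType` and `Map` classes and the `offload_maps` helper are hypothetical stand-ins, not DaCe classes. The sketch only models maps, whereas the snippet above also offloads every non-map entry node.

```python
from dataclasses import dataclass
from enum import Enum, auto


class ScheduleType(Enum):
    """Stand-in for dace.dtypes.ScheduleType (only the two values needed here)."""
    Default = auto()
    GPU_Device = auto()


@dataclass
class Map:
    """Stand-in for a DaCe map carrying the extra host_map flag."""
    label: str
    host_map: bool = False
    schedule: ScheduleType = ScheduleType.Default


def offload_maps(maps):
    """Schedule every map on the GPU unless it is flagged as a host map."""
    offloaded = []
    for m in maps:
        if not m.host_map:
            m.schedule = ScheduleType.GPU_Device
            offloaded.append(m.label)
    return offloaded


# Hypothetical map labels, used only for illustration.
maps = [Map("compute"), Map("init_levmask", host_map=True)]
offloaded = offload_maps(maps)
print(offloaded)                        # ['compute']
print([m.schedule.name for m in maps])  # ['GPU_Device', 'Default']
```

The flag only controls the schedule of the map. It says nothing about where the data touched inside the map is stored, which is the gap this issue is about.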

However, I can't get it to keep the transient Scalar in host storage, even when the Scalar is in a map that is a "host_map".

I will attach an SDFG to reproduce the behaviour (remove the trailing .json from the file name; GitHub does not support the .sdfgz format):
cut_2.sdfgz.json

We have two chains with a (tasklet -> access node) pattern that initialize data containers by writing to access nodes. Whether or not I put these chains inside a trivial map, the "levmask" variable is always mapped to GPU_Global storage.

Running:
sdfg.apply_gpu_transformations(validate = True, validate_all = True, permissive = True, sequential_innermaps=True, register_transients=False, simplify=False)
on this SDFG results in invalid code, because levmask is mapped to GPU_Global storage but the tasklet is on the host.
To reproduce, download the SDFG and run this script:

from dace.sdfg.sdfg import SDFG

# Load the attached SDFG (after removing the trailing .json from the file name).
sdfg = SDFG.from_file("cut_2.sdfgz")

try:
    sdfg.apply_gpu_transformations(validate=True, validate_all=True, permissive=True,
                                   sequential_innermaps=True, register_transients=False,
                                   simplify=False)
    sdfg.validate()
finally:
    # Save the result even if the transformation or validation raises,
    # so the invalid SDFG can still be inspected.
    sdfg.save("cut_2_to_gpu_2.sdfgz")
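To see which transients ended up in GPU storage after the transformation, a small helper along these lines might be useful. It is a sketch that assumes `sdfg.arrays` maps container names to descriptors with `transient` and `storage` attributes, and that GPU storage types have names starting with `GPU` (such as GPU_Global). It only looks at the top-level SDFG, not nested ones. The `Storage` enum and `fake_sdfg` object are stand-ins with made-up contents so the sketch runs without DaCe.

```python
from enum import Enum, auto
from types import SimpleNamespace


def gpu_transients(sdfg):
    """Return the names of top-level transients whose storage name starts with 'GPU'."""
    return sorted(name for name, desc in sdfg.arrays.items()
                  if desc.transient and desc.storage.name.startswith("GPU"))


class Storage(Enum):
    """Stand-in for dace.dtypes.StorageType (assumption, not the real enum)."""
    Default = auto()
    GPU_Global = auto()


# Stand-in SDFG with hypothetical containers, only for illustration.
fake_sdfg = SimpleNamespace(arrays={
    "levmask": SimpleNamespace(transient=True, storage=Storage.GPU_Global),
    "field_in": SimpleNamespace(transient=False, storage=Storage.GPU_Global),
    "tmp_host": SimpleNamespace(transient=True, storage=Storage.Default),
})

print(gpu_transients(fake_sdfg))  # ['levmask']
```

With DaCe installed, the same function could be called as `gpu_transients(SDFG.from_file("cut_2_to_gpu_2.sdfgz"))`. Given the behaviour described here, "levmask" would be expected to appear in the result.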

If I enclose the tasklet in a map, there is no problem. But if I enclose it in a map and then decide that this map should stay on the host, "levmask" is still mapped to GPU_Global storage while the map schedule is on the CPU. To reproduce, you can use this script:

from dace.sdfg.sdfg import SDFG
from dace.transformation.icon.map_over_tasklet_access_node_tasklet import MapOverTaskletAccessNodeTaskelet
from dace.transformation.icon.force_on_host import ForceOnHost

sdfg = SDFG.from_file("cut_2.sdfgz")

# Wrap each (tasklet -> access node) chain in a trivial map.
sdfg.apply_transformations_repeated(
    MapOverTaskletAccessNodeTaskelet,
    validate=False,
    validate_all=True)
sdfg.save("cut_2_preprocessed_1.sdfgz")

# Mark every map that reads from or writes to "levmask" as a host map.
sdfg.apply_transformations_repeated(
    ForceOnHost,
    options={"access_names": ["levmask"]},
    validate=True,
    validate_all=True)
sdfg.save("cut_2_preprocessed_2.sdfgz")

try:
    sdfg.apply_gpu_transformations(validate=True, validate_all=True, permissive=True,
                                   sequential_innermaps=True, register_transients=False,
                                   simplify=False)
    sdfg.validate()
finally:
    # Save the result even if the transformation or validation raises.
    sdfg.save("cut_2_to_gpu_2.sdfgz")

This script puts a trivial map around the (tasklet -> access node) chain, and then sets any map that reads from or writes to a data container named "levmask" as a host map. The SDFG looks as follows:
cut_2_preprocessed_2.sdfgz.json

@ThrudPrimrose ThrudPrimrose self-assigned this Oct 14, 2024
@ThrudPrimrose
Owner Author

I can fix my issue by introducing host_map and host_data fields and setting them explicitly.
This avoids the problem for my use case.

Still, I think transients outside map scopes should not be mapped to the GPU.
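As a rough illustration of the host_data idea, the storage decision could look like the sketch below. This is a plain-Python model under assumptions, not the actual change: `StorageType` is a stand-in enum, `assign_transient_storage` is a hypothetical helper, and "tmp_field" is a made-up container name.

```python
from enum import Enum, auto


class StorageType(Enum):
    """Stand-in for dace.dtypes.StorageType (only the two values needed here)."""
    Default = auto()
    GPU_Global = auto()


def assign_transient_storage(transients, host_data):
    """Pick a storage type per transient, keeping names listed in host_data on the host."""
    return {
        name: StorageType.Default if name in host_data else StorageType.GPU_Global
        for name in transients
    }


# "levmask" is pinned to the host explicitly; "tmp_field" is a hypothetical transient.
storage = assign_transient_storage(["levmask", "tmp_field"], host_data={"levmask"})
print({name: st.name for name, st in storage.items()})
# {'levmask': 'Default', 'tmp_field': 'GPU_Global'}
```

An explicit list like this works as a workaround, but it relies on the user knowing which containers are written on the host. That is why a default that leaves transients outside map scopes on the host still seems preferable.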
