WiP: New OpenACC features for GPUFORT runtime #29
Open: domcharrier wants to merge 76 commits into main from develop-acc
* Reason: All variables in the loop kernel body must be known in order to generate `present_or_\(copy\|copyin\)` runtime calls for the variables that do not appear in clauses.
* If no default clause is specified, present_or_copy is performed for all unmapped variables.
* If 'default(none)' is specified and not all variables are mapped, a warning is posted (an option to convert it to an error will be added).
* If 'default(present)' is specified, TODO: Take parent data directive into account to prevent some unnecessary runtime calls (if current behaviour is a performance issue).
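The three cases above can be sketched as a small decision function. This is only an illustration, not GPUFORT's implementation: the function name `implicit_data_actions`, its arguments, and the returned action strings are hypothetical. The `default(present)` branch is an assumption based on the OpenACC meaning of that clause (unmapped variables are treated as present), since the commit message leaves that case open.

```python
import warnings


def implicit_data_actions(kernel_vars, mapped_vars, default=None):
    """Return a dict {variable: action} for variables not mapped by a clause.

    kernel_vars -- all variables found in the loop kernel body
    mapped_vars -- variables that already appear in data clauses
    default     -- None (no default clause), "none", or "present"
    """
    unmapped = [v for v in kernel_vars if v not in mapped_vars]
    if default is None:
        # No default clause: present_or_copy for every unmapped variable.
        return {v: "present_or_copy" for v in unmapped}
    if default == "none":
        # default(none): nothing is mapped implicitly; only warn.
        if unmapped:
            warnings.warn("unmapped variables with default(none): "
                          + ", ".join(unmapped))
        return {}
    if default == "present":
        # Assumption: treat unmapped variables as already present.
        return {v: "present" for v in unmapped}
    raise ValueError("unknown default clause: %r" % (default,))
```

For a kernel body that uses `a`, `b` and `n` where only `a` appears in a clause, `implicit_data_actions(["a", "b", "n"], {"a"})` yields `present_or_copy` for `b` and `n`.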
* vector-add-declare/vector-add.f90 example with declare in program seems to work correctly
* Some more care required for enabling declare in subroutines
… into develop-acc
FEATURE: gpufort acc runtime behavior can now be influenced/tuned via the following environment variables:
* GPUFORT_LOG_LEVEL (default=0): log level. The maximum log level used in the code is 3.
* GPUFORT_MAX_QUEUES (default=64): maximum number of async queues.
* GPUFORT_INITIAL_RECORD_LIST_CAPACITY (default=4096): mapping records are managed via a vector. This variable specifies the initial vector capacity. If the maximum capacity is reached, the vector capacity is doubled.
* GPUFORT_BLOCK_SIZE (default=32): all device arrays are allocated as multiples of this block size.
* GPUFORT_REUSE_THRESHOLD (default=0.9): reuse an existing device buffer if the requested buffer size is greater than GPUFORT_REUSE_THRESHOLD x size of the already existing buffer.
* GPUFORT_NUM_REFS_TO_DEALLOCATE (default=-5): number of references for which a released array will be deallocated.
OPTIMIZATION: gpufort acc runtime will now try to reuse device arrays that have been previously released but not deallocated yet. Behavior can be tuned via the env. vars. GPUFORT_REUSE_THRESHOLD, GPUFORT_NUM_REFS_TO_DEALLOCATE and, in some sense, via GPUFORT_BLOCK_SIZE.
BUGFIX/OPTIMIZATION: Lookup records from back.
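As a rough illustration of how these settings might interact, the sketch below reads the variables with the defaults listed above, rounds a requested size up to a multiple of the block size, and applies the reuse threshold. The helper names (`read_settings`, `padded_size`, `can_reuse`) are hypothetical and not part of the GPUFORT runtime. The upper bound in `can_reuse` (the request must also fit into the existing buffer) is an assumption, and the unit of the sizes is left open because the commit message does not state it.

```python
import os

# Defaults as listed in the commit message.
DEFAULTS = {
    "GPUFORT_LOG_LEVEL": 0,
    "GPUFORT_MAX_QUEUES": 64,
    "GPUFORT_INITIAL_RECORD_LIST_CAPACITY": 4096,
    "GPUFORT_BLOCK_SIZE": 32,
    "GPUFORT_REUSE_THRESHOLD": 0.9,
    "GPUFORT_NUM_REFS_TO_DEALLOCATE": -5,
}


def read_settings(environ=None):
    """Read the settings from an environment mapping, falling back to defaults."""
    environ = os.environ if environ is None else environ
    settings = {}
    for name, default in DEFAULTS.items():
        raw = environ.get(name)
        # Convert the string to the type of the default (int or float).
        settings[name] = default if raw is None else type(default)(raw)
    return settings


def padded_size(size, block_size):
    """Round size up to the next multiple of block_size."""
    return -(-size // block_size) * block_size


def can_reuse(requested, existing, threshold):
    """Reuse if the request exceeds threshold x existing size.

    Assumption: the request must also fit into the existing buffer.
    """
    return threshold * existing < requested <= existing
```

With the defaults, a request of size 100 is padded to 128, and a released buffer of size 128 would be reused for a request of size 120 but not for one of size 100.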
* Add `scope` arg to signature of _intrnl_inout_arrays_in_subtree
* Fixes all tests in <gpufort_dir>/python/test/grammar_translator/openacc
* Now additionally detects expressions such as
```
<line 1>&
!$acc <rest of line 2>
```
and removes the `&\s*\n\s*!$acc`.
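A minimal sketch of this substitution with Python's `re` module is shown below. The function name `join_acc_continuations` is hypothetical, and the pattern is taken directly from the commit message (with `$` escaped); how GPUFORT applies it internally is not shown here.

```python
import re

# "&", optional whitespace, newline, optional whitespace, "!$acc"
ACC_CONTINUATION = re.compile(r"&\s*\n\s*!\$acc")


def join_acc_continuations(text):
    """Remove continuation markers so a multi-line directive becomes one line."""
    return ACC_CONTINUATION.sub("", text)
```

Applied to `"!$acc parallel loop &\n!$acc copyin(a)\n"`, only the `&`, the line break, and the second `!$acc` sentinel are removed; the whitespace around them is kept.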
FEATURE/linemapper: Allow prepending and appending lines directly to statement data structures and not only to whole lines. The new data structure triggered changes in all dependent packages (scanner, indexer).
FEATURE/gpufort: Add option to dump the linemapper data structure.
* Fix mismatching arg lists between wrapper/impl function; put long dummy arg list of gpufort_acc_present_... in macro and reuse macro in wrapper (gpufort_acc_runtime) and implementation (gpufort_acc_runtime_base).
minor/unrelated:
* Rename internal function in gpufort.py (parse_cl_args)
* Refactor init/wrap/destroy/copy_to* routines to have blocking memcpy operations; remove stream argument.
* Introduce additional init_async/wrap_async/destroy_async/copy_to*_async routines with non-blocking memcpy operations.
domcharrier changed the title from "New OpenACC features for GPUFORT runtime" to "WiP: New OpenACC features for GPUFORT runtime" on Nov 25, 2021
This was linked to issues on Nov 25, 2021
… into develop-acc
TODO: Extend the translator test suite to improve declaration parsing.
… into develop-acc
* Fix issue with parsing expressions that have '=>' in declared variable RHS.
* RHS of declared variable can now be a logical expression too.
* Add more rigorous test for declaration.
This was linked to issues on Dec 2, 2021
GPUFORT tries to preserve comments. Unfortunately, this becomes difficult when a comment begins after a line continuation character. GPUFORT will move these comments to before the statement that contains them.
* Add test to folder python/test/utils
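The behaviour described in this commit can be pictured with the simplified sketch below. It is not GPUFORT's code: the function name `hoist_comments` is hypothetical, and the sketch assumes that `!` never occurs inside a string literal and that the statement is not a `!$acc` directive.

```python
def hoist_comments(lines):
    """Join a continued statement and move its trailing comments in front of it.

    lines -- the physical lines of one multi-line Fortran statement
    Returns the comments (in original order) followed by the joined statement.
    """
    comments, parts = [], []
    for line in lines:
        # Assumption: the first "!" starts a comment.
        code, sep, comment = line.partition("!")
        if sep:
            comments.append("!" + comment.rstrip())
        code = code.strip()
        if code.endswith("&"):
            code = code[:-1].rstrip()   # drop trailing continuation character
        if code.startswith("&"):
            code = code[1:].lstrip()    # drop leading continuation character
        if code:
            parts.append(code)
    return comments + [" ".join(parts)]
```

For the two lines `a = b + & ! first operand` and `    c   ! second operand`, the result is the two comments followed by the single statement `a = b + c`.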
…l-statement WiP: BUGFIX: Support comments in multi-line statements
FEATURES:
* Initial support for `acc declare`
* OpenACC (gpufort runtime) `default` clause handling: `present_or_copy` if neither `default(none)` nor `default(present)` is specified.
* Add interoperable GPUFORT array datatype (up to 7 dimensions; autogenerated):
  * `operator()(int i1, int i2, ...)` to support Fortran-style array indexing in C++ code. No index macros.
  * Will be used by GPUFORT to construct interoperable derived types from non-interoperable device types. This will allow `AoS` syntax such as `domains(5).cells(i).coord_x` in GPUFORT C++ code, which is analogous to the Fortran equivalent `domain(5)%cells(i)%coord_x`.
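The index arithmetic that such an `operator()` has to perform can be sketched as follows. The sketch is written in Python purely to illustrate the column-major offset computation; the class `FortranArray`, its methods, and the default lower bound of 1 are assumptions for illustration and do not reflect the autogenerated C++ datatype's actual interface.

```python
class FortranArray:
    """Hypothetical stand-in for an array type with Fortran-style indexing."""

    def __init__(self, shape, lower_bounds=None):
        if not 1 <= len(shape) <= 7:
            raise ValueError("rank must be between 1 and 7")
        self.shape = tuple(shape)
        # Assumption: lower bounds default to 1, as in Fortran.
        self.lower_bounds = (tuple(lower_bounds) if lower_bounds is not None
                             else (1,) * len(shape))
        size = 1
        for extent in self.shape:
            size *= extent
        self.data = [0] * size

    def offset(self, *indices):
        """Column-major linear offset: the first index varies fastest."""
        if len(indices) != len(self.shape):
            raise IndexError("wrong number of indices")
        offset, stride = 0, 1
        for i, lb, extent in zip(indices, self.lower_bounds, self.shape):
            if not lb <= i < lb + extent:
                raise IndexError("index out of bounds")
            offset += (i - lb) * stride
            stride *= extent
        return offset

    def __call__(self, *indices):
        # Mirrors the a(i1, i2, ...) read access described above.
        return self.data[self.offset(*indices)]

    def set(self, value, *indices):
        self.data[self.offset(*indices)] = value
```

For a 3 x 4 array, `a.offset(2, 1)` is 1 and `a.offset(1, 2)` is 3, which matches Fortran's storage order.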
BUGFIXES: