WIP: Changing data held in element to accessor data #129
Open: bremerm31 wants to merge 65 commits into UT-CHG:master from bremerm31:SoA
Adding a type trait to detect whether a class is a struct of arrays container. Such a class must have: 1. A typedef named `Accessor`. 2. A member function `at()`, which accepts an `unsigned int` and returns an `Accessor`.
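A minimal sketch of how such a detection trait could look, assuming a SFINAE-based approach. The names `is_soa_container` and `PointsSoA` are hypothetical and used only for illustration; the trait in the PR may be named and implemented differently.

```cpp
#include <type_traits>
#include <utility>
#include <vector>

// Primary template: by default, T is not treated as an SoA container.
template <typename T, typename = void>
struct is_soa_container : std::false_type {};

// Specialization: selected only if T::Accessor exists and
// T::at(unsigned int) returns exactly T::Accessor.
template <typename T>
struct is_soa_container<
    T,
    typename std::enable_if<std::is_same<
        decltype(std::declval<T&>().at(std::declval<unsigned int>())),
        typename T::Accessor>::value>::type> : std::true_type {};

// Hypothetical container that satisfies both requirements.
struct PointsSoA {
    struct Accessor {
        double& x;
        double& y;
    };
    std::vector<double> xs, ys;
    Accessor at(unsigned int i) { return Accessor{xs[i], ys[i]}; }
};

static_assert(is_soa_container<PointsSoA>::value,
              "PointsSoA has Accessor and at()");
// std::vector has at(), but no Accessor typedef, so it is rejected.
static_assert(!is_soa_container<std::vector<double>>::value,
              "std::vector is not an SoA container");
```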
Adding a function to extract members at each `index`. In the case of a vector of array of structs, this will return Accessors for each array of structs.
- Adding missing include in `use_blaze.hpp`
We needed to add support for arrays of struct of arrays. Since accessors tend not to be default constructible, `index_sequence`s are used to construct them. Various allocator issues were fixed for the vectors of vectors and structs of arrays. Updated `test_at_each.cpp` accordingly.
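The `index_sequence` idea can be sketched as follows, under the assumption that the goal is to fill a `std::array` of accessors without ever default constructing one. `Accessor`, `SoA`, `at_each`, and `at_each_impl` are hypothetical stand-ins here, not the PR's actual types or signatures.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical accessor: holds a reference, so it has no default constructor.
struct Accessor {
    double& value;
};

// Hypothetical SoA container exposing at().
struct SoA {
    std::vector<double> data;
    Accessor at(unsigned int i) { return Accessor{data[i]}; }
};

// Expands Is... into one at(index) call per array slot, so every
// Accessor is created directly inside the braced initializer.
template <std::size_t N, std::size_t... Is>
std::array<Accessor, N> at_each_impl(std::array<SoA, N>& soas,
                                     unsigned int index,
                                     std::index_sequence<Is...>) {
    return {{soas[Is].at(index)...}};
}

// Returns one Accessor per container, all referring to the same index.
template <std::size_t N>
std::array<Accessor, N> at_each(std::array<SoA, N>& soas, unsigned int index) {
    return at_each_impl(soas, index, std::make_index_sequence<N>{});
}
```

Because the returned accessors hold references, writing through them modifies the underlying containers.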
Additionally, since rows are not default constructible, I added a helper function for getting an array of rows from an array of matrices. `test_linear_algebra.cpp` updated to test functionality.
- Results match to full precision with master for manufactured solution
- Unit tests updated accordingly
Various discrepancies between Eigen and Blaze rows require workarounds here.
This is the first commit to move the entire hierarchy towards something where we have containers that contain both data in a struct of arrays layout and accessors.
For an efficient API, we need both the problem data and the element data struct of arrays to be accessible in one struct. This implies that we will set up a second class hierarchy mimicking the current element mesh hierarchy. From here, vectorized functions will be called directly on the SoAs.
- Adding missing function definitions for 1D legendre basis
By removing this flipping, we will aim to vectorize more of the interface kernel evaluations, and avoid awkward memory loads and stores.
This requires significant changes to the LLF flux API and, as a result, to the files that use it.
- `thread_local` was not playing nicely with the `double` instantiation of `LLF_flux`. Simply replaced with an overload
- Error in accessing surface normals in `test/test_boundary_interface.cpp`
`is_vectorized<T>` returns `std::true_type` if `T` has a constexpr boolean member `is_vectorized` that is set to true. Otherwise, `is_vectorized<T>` returns `std::false_type`.
- Uninitialized values were causing valgrind to come up not clean in `rkdg_swe_data_state.hpp`
- Error in Runge-Kutta update: solutions were being swapped in the state variable loop
- `thread_local` storage duration specifier was also causing issues. Removing for now. Will test performance on Skylake nodes
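A minimal sketch of a trait with the behavior described above, assuming a SFINAE-based implementation. The kernel types `SimdKernel`, `ScalarKernel`, and `LegacyKernel` are hypothetical and exist only to exercise the three cases.

```cpp
#include <type_traits>

// Primary template: covers types without an is_vectorized member,
// and types whose member is false.
template <typename T, typename = void>
struct is_vectorized : std::false_type {};

// Specialization: viable only if T::is_vectorized is a constant
// expression convertible to bool that evaluates to true.
template <typename T>
struct is_vectorized<T, typename std::enable_if<T::is_vectorized>::type>
    : std::true_type {};

// Hypothetical kernels used only for illustration.
struct SimdKernel   { static constexpr bool is_vectorized = true; };
struct ScalarKernel { static constexpr bool is_vectorized = false; };
struct LegacyKernel {};  // no member at all

static_assert(is_vectorized<SimdKernel>::value, "member set to true");
static_assert(!is_vectorized<ScalarKernel>::value, "member set to false");
static_assert(!is_vectorized<LegacyKernel>::value, "member missing");
```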
- L2 Errors match now
This reverts commit 308b398.
This reverts commit 43be065.
…ssors - Starting to update how integration is done.
Some weirdness when using block matrices, where the expression template requires explicit allocation or else it segfaults.
- Developed dg-micro-benchmarks, which have been used to find optimal data storage layouts
- This has sped up the volume kernel by a factor of 3 from the baseline
Adding container and vestiges of SoA class to mesh. Mesh types updated accordingly
- Each interface container will have a pointer to the `ElementContainers`. From there, we will be able to generate the sparse matrices for computing UgpIn and UgpEx as well as the integrations
- `ElementSoA` was updated so that `BoundaryData` is now included (various reserve functions required modification)
- `is_vectorized` in functor is now a `constexpr` function, which accepts a template argument. This allows us to dispatch function calls based on Element or interface type.
This is the beginning of the vectorization of the interface kernel. We have outlined the joint SoA/Accessor data type, and slightly modified the interface kernel. Difficulties arise because the strong typing of elements requires the entire element SoA to be subsumed into the pointer class.
- First correct implementation of vectorized interface kernel
- System implements a similar dispatch system based on vectorized function objects
- Three outstanding fixmes:
  1. ComputeUgpBdry and integration routines need to only occur once. This may also allow for partial vectorization of non-vectorized interfaces (and potentially boundaries as well).
  2. FMAs need to be incorporated into numerical flux
  3. shrinkToFit() in interface kernel is causing memory leaks
GCC doesn't correctly implement `_mm512_abs_pd` as of gcc/8.1.0, so we use `max(a, -a)` in place of the intrinsic to get around this.
- To allow for optimal memory use and vectorization, all data layouts relating to the interface kernel have been moved to a column-major layout
- This gives an additional speed-up of 1.5x for the interface kernel
- Total speed-up relative to last commit is 1.24
- Total speed-up relative to baseline (point of forking) is 2.12
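The `max(a, -a)` workaround can be illustrated with a portable sketch. `Pack` is a hypothetical stand-in for a 512-bit register of eight doubles, and `abs_via_max` is an illustrative name; the PR's actual code presumably operates on vector registers directly.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a 512-bit register holding 8 doubles.
using Pack = std::array<double, 8>;

// Lane-wise absolute value computed as max(a, -a).
// With AVX-512 intrinsics, the same idea might be written roughly as
//   _mm512_max_pd(a, _mm512_sub_pd(_mm512_setzero_pd(), a));
// (an assumption about the form, not taken from the PR).
inline Pack abs_via_max(const Pack& a) {
    Pack r{};
    for (std::size_t i = 0; i < a.size(); ++i) {
        r[i] = std::max(a[i], -a[i]);
    }
    return r;
}
```

Note that, unlike a sign-bit mask, this formulation may treat signed zeros and NaNs differently, which could matter if those values can reach the flux computation.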
Updated distributed boundaries with new SoA dispatch system. Numerically verified for OMPI runs
- using `DynMatrices` causes stack overflows for HPX
- this should replace a complexity of O(n^2) with O(n) (on average)
Done to potentially avoid overhead associated with multithreaded MPI
Initializing sparse matrices was being done very inefficiently. To remedy this, we changed the initialization of all `CompressedMatrices`. This included sections in `element_soa.hpp` and `interface_soa.hpp`. Based on the studies run, initialization appears to scale linearly in the number of elements.

| Number of elements | Time (in seconds) |
|--------------------|-------------------|
| 65k                | 1.9               |
| 262k               | 8.1               |
| 1049k              | 32.0              |

Starting to refactor code to support SoA layout
- Note that errors remain when solving linear systems.
- Will need to cherry-pick bc9d45a
When solving linear systems using Blaze, the matrix must be in column-major order; otherwise, Blaze will solve `A^T X = B`.
- Add overload to solve systems with row-major storage
- Explicitly make `delta_hat_global` column major for performance
With 6b861b0, EHDG-SWE works in serial.
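One way to see why storage order matters: a routine that assumes column-major storage but is handed a row-major buffer effectively reads the transpose. The sketch below shows this with plain index arithmetic on a 2x2 buffer. It does not use Blaze, and the helper names (`row_major`, `col_major`, `to_col_major`) are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t N = 2;
using Buffer = std::array<double, N * N>;

// Element (i, j) when the buffer is interpreted as row-major.
inline double row_major(const Buffer& a, std::size_t i, std::size_t j) {
    return a[i * N + j];
}

// Element (i, j) when the same buffer is interpreted as column-major.
inline double col_major(const Buffer& a, std::size_t i, std::size_t j) {
    return a[j * N + i];
}

// Copies a row-major buffer into column-major order, so that a
// column-major routine sees A rather than A^T.
inline Buffer to_col_major(const Buffer& a) {
    Buffer t{};
    for (std::size_t i = 0; i < N; ++i) {
        for (std::size_t j = 0; j < N; ++j) {
            t[j * N + i] = a[i * N + j];
        }
    }
    return t;
}
```

For a buffer holding `{1, 2, 3, 4}` row by row, `col_major(a, 0, 1)` reads `3` where `row_major(a, 0, 1)` reads `2`, which is exactly the transposed entry.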
Both OMPI and HPX parallelizations match the serial implementation to full numerical precision.
Code compiles; however, we have not been able to verify correctness. As a note to self, potential areas of concern remain:
- use of gp_ex
To optimize cache usage and partially enable vectorization, this PR converts the data from an array of structs layout (AoS) to a struct of arrays layout (SoA). To maintain the ease of use associated with the AoS layout, we are introducing an `Accessor` class, which will serve as a reference to the data associated with a given element.
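As a rough illustration of the idea, the sketch below stores each field in its own contiguous array while an accessor gives AoS-like, per-element access. The class `ElementSoA` here is a simplified stand-in, and the fields `q` and `flux` as well as `scale_flux` are hypothetical; the PR's real containers hold far more data.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified SoA container: one contiguous array per field.
class ElementSoA {
  public:
    // Lightweight reference to the data of a single element.
    struct Accessor {
        double& q;
        double& flux;
    };

    explicit ElementSoA(std::size_t n) : q_(n, 0.0), flux_(n, 0.0) {}

    // AoS-like access: element i looks like a small struct of fields.
    Accessor at(unsigned int i) { return Accessor{q_[i], flux_[i]}; }

    std::size_t size() const { return q_.size(); }

    // Kernels iterate over one contiguous array, which is the access
    // pattern that is friendly to caches and vectorization.
    void scale_flux(double c) {
        for (double& f : flux_) {
            f *= c;
        }
    }

  private:
    std::vector<double> q_;
    std::vector<double> flux_;
};
```

Per-element code can keep writing `soa.at(i).flux = ...`, while bulk kernels such as `scale_flux` work on the whole array at once.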