-
Notifications
You must be signed in to change notification settings - Fork 160
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speeding Up FDTD Engine By an Improved Multi-dimensional Arrays Implementation "ArrayLib" with Contiguous Memory #161
Conversation
78b1e6a
to
deffa35
Compare
I did not yet have the time to check out this massive code-changes. But looking at these comments sounds to be that this is a compromise to not change to much but by using a wrapper class? What would be the downside from "going all the way at once"? Would we still need this or a wrapper? Or could C++23 handle it all? Would it really be this bad? Afterall the engine interfaces already supply simple access anywhere outside the engine? Would it not be possible inside the engine itself to use simple linear array access (pointer math) without any need for a wrapper? |
Another question I have is, why did you also change the basic/simple Operator&Engine? Why not keep it as simple and basic as it gets even if that is known to be slower and only do this advancements for the SSE+multi-threading operator&engines? |
Would/have you considered to "just" create an all new engine? At first using only your new interface and eventually using maybe AVX and your new tiling? An argument against this would be that we already have too many operators&engines... |
I don't think it's a massive change. The old Pull Request was the real massive one. For example, compare the change in
The old Pull Request did it this way - it changes all array syntax to the new-style
Three reasons. The first is that the
Fully replacing the current multi-threading engine is problematic:
So in no sense it's a drop-in replacement, it's only an opt-in option for users who can get the benefits. One idea, perhaps my long-term plan, is to reimplement FDTD entirely as a shared library. The high performance engine exists fully independent from the existing one. This way many different engines can be added, such as a GPU engine, a MPI engine, a new Tiling engine, etc. This way, the new high-performance engine will not be limited to the structure of the existing classes. At the same time, there's minimum disruption to existing users. |
Question: Does not any access to e.g. the 4D array cause a linearIndex call with multiple multiplications and additions? (I'm aware that this was also the case before). Now that we have a nice linear array in memory would it not make sense for the engine to calculate the linear index once and re-use that index e.g. inside the engine loops? Could there be a function to retrieve the linear array directly and access it directly? |
My opinion is simple. The current implementation may not be the most optimal one, but it has already demonstrated a performance improvement based on measurement data. If anyone (a contributor or possibly myself in the future) believes one can improve performance merely reusing indices, they are free to implement this optimization, but this must be justified using as much data as I did here. Personally, from my tests, I don't see any performance benefit of reusing these pre-computed indices for now. It has an overhead but it's masked by memory access overhead anyway on most machines.
Yes, it's already possible now. Like the standard C++ programming convention, one can use the
Thus, using a precomputed index can be done via something like the following...
Three kinds of syntax are supported:
The first and second styles have similar performance, since nested |
One more question and this is just me trying to hopefully fully understand this new array approach soon: The GetVV method could be written like this right?:
would be a bit shorter and easier? as the current:
BUT: no need to change, just me asking, testing and understanding... In any case, I think I'm done with my review. How should I merge this? I think personally I would prefer c over b over a. Because these are a lot of commits and it looks to me as there is a some back and forth and I did the code review as a single change anyhow... But since I'm asking I could obviously live with any of the three... Please let me know what you would prefer... And thanks again for your work on this... |
Signed-off-by: Yifeng Li <[email protected]>
Signed-off-by: Yifeng Li <[email protected]>
Signed-off-by: Yifeng Li <[email protected]>
Signed-off-by: Yifeng Li <[email protected]>
Motivation ------------- Originally, multi-dimensional arrays in the simulation engine were allocated as multi-level C pointers. For example, the following code allocates and accesses a 3D array: // allocate 3D arrays as pointers of pointers T*** array=NULL; array = new T**[SIZE_I] for (size_t i = 0; i < SIZE_I; i++) { array[i] = new T*[SIZE_J] for (size_t j = 0; j < SIZE_J; j++) { array[i][j] = new T[SIZE_K]; for (size_t k = 0; k < SIZE_K; k++) { // access array via C-style syntax array[i][j][k] = 0; } } } The advantage is that all multi-dimensional arrays can be accessed like a C-style array. But with several downsides: 1. A single array requires multiple `malloc()` or `new` allocation. 2. The allocated memory is not contiguous, reducing memory locality and performance, especially when the final dimension is small. According to the best practice today (e.g. `std::mdspan`), these raw pointers should be converted to wrapped arrays classes with their internal indexing code, and they should be accessed using the following syntax: array(i, j, k) = 0; array[i, j, k] = 0; // C++23 But this requires massive change of all existing array indexing code throughout the codebase, which is error-prone and difficult to review. ArrayLib is a simple library written with the goal of providing a simple implementation of multi-dimensional array, but preserved the C-like access syntax using C++ template metaprogramming to minimize the code change in the simulation engine. Supported Types -------------------- 1. ArrayIJ - 2D array. 2. ArrayIJK - 3D array. 3. ArrayNIJK - 3D array (N = 3 only), IJK is the coordinate of a 3D vector in space, N (0, 1, 2) selects the X/Y/Z value from the vector. To avoid ambiguity, IJK is always used to describe coordinates in space, not X/Y/Z. Usage 1: new/delete -------------------- ArrayLib can be used in several ways, the first method closely mirrors the C-style code using raw pointers, allowing direct migration from legacy code. ArrayLib::ArrayIJK<float>* arr_ptr = new ArrayLib::ArrayIJK("name", {10, 20, 30}); // arr_ptr[0][1][2] is wrong, because we need to call the operator[] // of ArrayIJK, rather than accessing an array of pointers of type // `ArrayIJK*`. (*arr_ptr)[0][1][2] = 3; // But one can use C++ reference rather than pointers to access it // in a readable way, add a line of boilerplate, but more readable ArrayLib::ArrayIJK<float>& arr = *arr_ptr; arr[0][1][2] = 3; delete arr_ptr; The first method is advantageous in the sense that all existing pointers with inconsistent lifetimes can be converted to ArrayLib directly as drop-in replacement. The disadvantage is that the C-style syntax is only emulated we're accessing a C++ reference, not a C pointer. Yet C++ reference can't be NULL, so when arrays with inconsistent lifetimes are used in classes, C-style pointers (with confusing syntax) must be kept as member variables, not references (with natural syntax). To write readable code, whenever an array is accessed by a member function, we must dereference a pointer to create a C++ reference first. Overall, this is the smoothest migration path, so this style is widely used in engine classes. Usage 2: As Local Member Variable w/ RAII ------------------------------------------ ArrayLib arrays can be used as local variables by declaring them directly (with name and size). These arrays are automatically freed by C++'s RAII mechanics when they go out of scope. void process() { ArrayLib::ArrayIJK<float> arr("name", {10, 20, 30}); array[0][1][2] = 0; // deleted automatically by RAII } This allows one to access arrays using the natural syntax, and without manual memory management, Usage 3: As Class Member Variable w/ RAII ----------------------------------------- Similarly, ArrayLib arrays can also be used as class member variables in conformance to the C++ RAII idiom. Class member variables are recommended for arrays with runtime-determined size. class Engine { // automatically deleteled by RAII ArrayLib::ArrayIJK<float> arr1("static size", {10, 20, 30}); ArrayLib::ArrayIJK<float> arr2; Engine(size_t i, size_t j, size_t k) : arr2("initialize in constructor", {i, j, k}) { // use natural C-style syntax for acess arr1[1][2][3] = 0; arr2[4][5][6] = 1; } } Ideally, this should be the recommended usage for ArrayLib arrays in C++ classes. Unfortunately, because the existing codebase does not use the C++ RAII idiom, and arrays mostly have inconsistent lifetimes (e.g. sometimes, array sizes can't be determined in the constructor, but instead is determined in the middle of the class's lifetime from an external source), it's not possible to translate all existing array usages to this idiom. Usage 4: As Class Member Variable w/ Two-Phase Initialization -------------------------------------------------------------- As a compromise to the lifetime problem, two-phase Initialization is also supported. This can be done by creating an array object (not reference or pointers) as a class member variable without specifying its name and size. The result is a placeholder array in an invalid state, which can be later initialized. class Engine { // automatically deleteled by RAII, size unknown ArrayLib::ArrayIJK<float> arr1; Engine(size_t i, size_t j, size_t k); { // two-phase initialization before use arr1.Init("name", {i, j, k}); // use natural C-style syntax for acess arr1[1][2][3] = 0; } } Although it's potentially unsafe, but one can still enjoy automatic release of memory upon class destruction by C++'s RAII. ArrayLib Arrays Can't be Passed by Value -------------------------------------------- ArrayLib objects are simple wrappers to heap-allocated memory, with few extra features in comparison to raw C pointers. In fact, they're designed to serve as drop-in replacement of raw pointers, which is prevalent in the existing codebase (we expect to refactor the codebase progressively). Thus, they're not meant to be high-level C++ containers. ArrayLib objects must be always passed as pointers or references, pass-by-value and copy-assignment are not supported. These operators are marked as deleted, attempting to do so would create a compile-time error. They make little sense for the low-level numerical simulation kernel in its current form. ArrayLib::ArrayIJK<float> data("name", size); // pass-by-reference, recommended when possible void process_ref(ArrayLib::ArrayIJK<float>& arr); process_ref(data); // pass-by-pointer, use only when necessary void process_ptr(ArrayLib::ArrayIJK<float>* arr_ptr); process_ptr(&data); // pass-by-value, WRONG! void process_value(ArrayLib::ArrayIJK<float> arr); process_value(data) Furthermore, practically speaking, in a simulation, large arrays to represent electromagnetic fields and material properties are allocated and passed around as pointers. Attempting to pass an array by value is always a bug. Signed-off-by: Yifeng Li <[email protected]>
Signed-off-by: Yifeng Li <[email protected]>
Signed-off-by: Yifeng Li <[email protected]>
Signed-off-by: Yifeng Li <[email protected]>
Signed-off-by: Yifeng Li <[email protected]>
Signed-off-by: Yifeng Li <[email protected]>
Signed-off-by: Yifeng Li <[email protected]>
deffa35
to
078483d
Compare
I have just rebased the Pull Request myself, it should now be possible to do a regular (fast-forward) merge. |
Great work! |
TL;DR
Motivation
This is a resubmitted version of PR #117, reworked for the third time. The difference is that, this time, a small C++ class named
ArrayLib
is implemented, and the openEMS's FDTD engine is converted to use the new ArrayLib code.In ArrayLib, by using C++ template metaprogramming, the existing array indexing syntax
arr[n][i][j][k]
can be kept without changing them toarr(n, i, j, k)
. This way, we achieve two goals:Please consider merging this change as soon as it's appropriately to do so, so that the next phase of development can start. The best is yet to come.
Result
Correctness
All built-in Python examples were executed and the simulation results are exactly the same. Furthermore, field dumps of a third-party
pyems
simulation calledgcpw.py
has been executed for hours, the field dumps were compared with the unmodifed openEMS code, and all floating-point numbers are bit-identical.Note that openEMS automatically terminates simulation if energy is lower than the end criteria, but the exact timestep of termination depends on simulation speed, thus the same simulation is not reproducible by default - This is not a bug in the PR. To replicate the simulation result, it's necessary to (1) Set a impossible end criteria such as 1e-9, so a simulation is not stopped early. (2) manually specify the
NrTS
in a simulation, so that the same simulation would always stop at the same timestep.Performance
Performance is improved by 5%-30% in medium and large-size simulations. The most extreme improvement (30%) is seen in long and narrow structure. This is expected because the old array code performs the worst if the last stride is non-contiguous, which is exactly what the new code is meant to fix. For small simulation there can be a performance degradation of 4%-5% or so, but given their small sizes there slowdowns are negligible.
Since the results are rather similar to the benchmarks shown in the old PR, only the benchmark results on one machine is shown here:
ArrayLib - Simple Multi-Dimensional Array Library with C-like Syntax
Motivation
Originally, multi-dimensional arrays in the simulation engine were allocated as multi-level C pointers. For example, the following code allocates and accesses a 3D array:
The advantage is that all multi-dimensional arrays can be accessed like a C-style array. But with several downsides:
malloc()
ornew
allocation.According to the best practice today (e.g.
std::mdspan
), these raw pointers should be converted to wrapped arrays classes with their internal indexing code, and they should be accessed using the following syntax:But this requires massive change of all existing array indexing code throughout the codebase, which is error-prone and difficult to review.
ArrayLib is a simple library written with the goal of providing a simple implementation of multi-dimensional array, but preserved the C-like access syntax using C++ template metaprogramming to minimize the code change in the simulation engine.
Supported Types
To avoid ambiguity, IJK is always used to describe coordinates in space, not X/Y/Z.
Usage 1: new/delete
ArrayLib can be used in several ways, the first method closely mirrors the C-style code using raw pointers, allowing direct migration from legacy code.
The first method is advantageous in the sense that all existing pointers with inconsistent lifetimes can be converted to ArrayLib directly as drop-in replacement. The disadvantage is that the C-style syntax is only emulated we're accessing a C++ reference, not a C pointer. Yet C++ reference can't be NULL, so when arrays with inconsistent lifetimes are used in classes, C-style pointers (with confusing syntax) must be kept as member variables, not references (with natural syntax).
To write readable code, whenever an array is accessed by a member function, we must dereference a pointer to create a C++ reference first. Overall, this is the smoothest migration path, so this style is widely used in engine classes.
Usage 2: As Local Member Variable w/ RAII
ArrayLib arrays can be used as local variables by declaring them directly (with name and size). These arrays are automatically freed by C++'s RAII mechanics when they go out of scope.
This allows one to access arrays using the natural syntax, and without manual memory management.
Usage 3: As Class Member Variable w/ RAII
Similarly, ArrayLib arrays can also be used as class member variables in conformance to the C++ RAII idiom. Class member variables are recommended for arrays with runtime-determined size.
Ideally, this should be the recommended usage for ArrayLib arrays in C++ classes.
Unfortunately, because the existing codebase does not use the C++ RAII idiom, and arrays mostly have inconsistent lifetimes (e.g. sometimes, array sizes can't be determined in the constructor, but instead is determined in the middle of the class's lifetime from an external source), it's not possible to translate all existing array usages to this idiom.
Usage 4: As Class Member Variable w/ Two-Phase Initialization
As a compromise to the lifetime problem, two-phase Initialization is also supported. This can be done by creating an array object (not reference or pointers) as a class member variable without specifying its name and size. The result is a placeholder array in an invalid state, which can be later initialized.
Although it's potentially unsafe, but one can still enjoy automatic release of memory upon class destruction by C++'s RAII.
ArrayLib Arrays Can't be Passed by Value
ArrayLib objects are simple wrappers to heap-allocated memory, with few extra features in comparison to raw C pointers. In fact, they're designed to serve as drop-in replacement of raw pointers, which is prevalent in the existing codebase (we expect to refactor the codebase progressively). Thus, they're not meant to be high-level C++ containers. ArrayLib objects must be always passed as pointers or references, pass-by-value and copy-assignment are not supported. These operators are marked as deleted, attempting to do so would create a compile-time error. They make little sense for the low-level numerical simulation kernel in its current form.
Furthermore, practically speaking, in a simulation, large arrays to represent electromagnetic fields and material properties are allocated and passed around as pointers. Attempting to pass an array by value is always a bug.