-
Notifications
You must be signed in to change notification settings - Fork 328
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix typo in CI rules for OpenMP on Ubuntu runners. #552
Conversation
Hu? |
I can't reproduce the test failures when I configure with |
Thanx for your tireless work on perfecting the builds. I was positively surprised that all tests were working with OpenMP but it was too good to be true I guess. The OpenMP support has been added later and some parts might not be thread safe. I looked at the failing tests and they are all related to computation of view factors. This is done using a system command (if the view factors do not exist). This system call seems to be failing almost immediately and the code reports that viewfactors are not available. I wonder what is different in the system call with and without threads. |
The OpenMP support doesn't seem to be completely borked. It seems to be working fine on Windows, macOS, and Ubuntu (when using the GCC compilers). There only seem to be issues on Ubuntu when using the LLVM Clang compilers. You might be right: It might be that the different compilers produce slightly different code. And that slight difference might expose some concurrency issues between threads. It's also strange that the issue doesn't reproduce locally. |
Could this be another case of slow disc IO on the runners? Is there any way to pause program execution in Fortran for a few hundred milliseconds? (Even if it is just for a test to see if that makes a difference.) |
IIUC, the command that is attempted to be run but fails is Output of valgrind
IIUC, that is a use after free? It runs fine when I attach gdb. So, maybe a threading issue indeed? |
Repeated with Output of valgrind
|
IIUC, the issue arises from this part of the code: Lines 742 to 752 in 3f8bf18
In line 749, I pushed an additional commit that should avoid that issue. |
Hmm... The last commit made it so that valgrind no longer complains for me locally. But the tests are still failing in CI. Maybe, for some other reason? Anyway, having clean output from valgrind is probably a step in a good direction... |
Certainly that is good! All the radiation cases return this: ERROR:: systemc: Command exit status was 139 So it is passed from the failing "ViewFactors" system command. The systemc is defined in LoadMod.F90 and depending on value of HAVE_EXECUTECOMMANDLINE it uses either intrinsic Fortran or C command. VievFactors fails immediately as there does not seem to be any std out related to it. |
I compiled with thread sanitizer (by configuring with When I run That's a lot of potential data races. I haven't checked if any of those might be bogus. |
aecdf49
to
213d6f2
Compare
Turns out Clang doesn't define But that still fails during some tests. If I understand the documentation correctly, all necessary flags should be set implicitly by using The strange thing is that these tests pass without issues when I build with OpenMP and Clang locally in Ubuntu 24.04. Anyhow, maybe it isn't worth putting any more time into setting up a configuration that builds with Clang on Ubuntu using libgomp. That is probably a very niche configuration. It might be better to just build with Clang without OpenMP on Ubuntu. (Compiler errors or warnings tend to be easier to understand with Clang compared to GCC imho.) |
That configuration seems to pass all tests. Marking as ready for review. A couple of positive side-effects coming out of this ordeal is maybe that a use-after-free in MATC is probably fixed, and that we have slightly better debugging information in the "Re-run tests" step in the CI. |
Add `D` for CMake configuration flag. As far as I can tell, that was a typo since the OpenMP configurations were added. (My bad. Sorry.)
CMake needs more help if we'd like to use libgomp when building with LLVM Clang.
`evalclause` can be called recursively for "for" statements. In this case, variables that are defined in the outer call might be updated in the inner call. Deleting the variable in the inner call invalidates the variable in the outer call. Avoid that by checking if a variable exists and only creating it if it doesn't. Also synchronize `matc/src/eval.c` to `post/matc/src/eval.c` (only keeping different licenses as they are). Also synchronize the `case forsym:` in `ElmerGUI/matc/src/eval.c`.
Great! The OpenMP & clang tests that use system call could be also run performing the "ViewFactors" command separately before. Then the system call is circumvented. But I would agree that clang and threads has not been a very common combination so far. |
There was a typo in the flags to enable OpenMP in the CI on Ubuntu. That meant that these runners never actually used OpenMP. (That's my bad. Sorry for that.)
Fixing that typo revealed that some additional flags were missing for using libgomp as the OpenMP implementation when building with LLVM Clang.
This PR fixes that typo and adds those flags.