Meeting 2025 03 14
This prefix thing kinda sucks (https://github.com/openpmix/prrte/pull/2154); it's getting complicated and Jeff fears it will be difficult to maintain over time. Is there a long-term path to get us out of this business?
Idea:
-
This particular problem comes down to installdirs.
- We have installdirs for those who relocate installations (e.g., NVIDIA's Open MPI packaging).
- At run time, we need to find plugins and show_help files.
- Sidenote: Do we need to find anything else?
- Can we solve this? I.e., can we find what we need at runtime via some other mechanism?
- [JS] Can we look at /proc/self/maps to find the directory of a known library (e.g., libmpi.so)? Not sure how robust that is.
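For reference, the /proc/self/maps idea could be sketched roughly as below. This is Linux-only and illustrative: the function names are hypothetical (not existing OPAL code), it matches on the start of the file name so that versioned names such as libmpi.so.40 also hit, and it does not handle corner cases such as paths marked "(deleted)".

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse one line of /proc/self/maps. If the mapped file's name starts with
 * `libname`, copy the directory that contains it into `dir` and return 0.
 * Returns -1 for non-matching lines and for mappings without a path.
 */
static int dir_from_maps_line(const char *line, const char *libname,
                              char *dir, size_t len)
{
    /* The pathname is the only field of a maps line that contains '/'. */
    const char *path = strchr(line, '/');
    if (path == NULL) {
        return -1;                      /* anonymous mapping, [heap], ... */
    }
    const char *base = strrchr(path, '/') + 1;
    if (strncmp(base, libname, strlen(libname)) != 0) {
        return -1;
    }
    size_t n = (size_t)(base - 1 - path);   /* directory without trailing '/' */
    if (n == 0) {
        n = 1;                          /* file sits directly under "/" */
    }
    if (n >= len) {
        return -1;
    }
    memcpy(dir, path, n);
    dir[n] = '\0';
    return 0;
}

/* Scan this process's mappings for `libname`; 0 on success, -1 otherwise. */
static int find_library_dir(const char *libname, char *dir, size_t len)
{
    FILE *fp = fopen("/proc/self/maps", "r");
    if (fp == NULL) {
        return -1;                      /* not Linux, or /proc not mounted */
    }
    char line[4096 + 128];
    int rc = -1;
    while (rc != 0 && fgets(line, sizeof(line), fp) != NULL) {
        rc = dir_from_maps_line(line, libname, dir, len);
    }
    fclose(fp);
    return rc;
}
```

A caller would do something like find_library_dir("libmpi.so", buf, sizeof(buf)) and treat a -1 return as "fall back to the configured value".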
-
We still have problems of multiple levels in the stack (OMPI, PRTE, PMIX) re-using MCA things.
- We've sorta solved that by replicating everything and using different prefixes in env variable names and the like.
- But it still kinda sucks -- lots of corner cases come up. And code duplication.
- We're not going to solve that problem today.
Let's look at the installdirs issue.
-
Ralph's PR already merged to PRTE: we now have 4 prefixes (CLI params and env vars)
- There's a PR to merge this into OMPI v5.0.x, too (https://github.com/open-mpi/ompi/pull/13141)
- Let's let all this go in
-
George will investigate: in OMPI's installdirs init:
- If user sets env variable(s), use that(them)
- If user didn't set env variable(s):
-
Make LD call to find filesystem path of library containing opal_init (or whatever symbol makes sense)
-
Take dirname of that
-
Compare to installdirs libdir
-
If it's the same -- ok, we're done
-
If it's not the same:
- Look at old libdir: is it defined in terms of prefix? If so, see if comparing the path we just found against the old libdir can distill a prefix.
- E.g., if we found /bar/lib/libopal_pal.so, and the original installdirs libdir (from configure) was {prefix}/lib, then the new value for the installdirs prefix can be /bar.
- Otherwise, assume prefix is one dir up from that
- Set installdirs prefix to that value
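The decision procedure above could be sketched as follows. This is not George's prototype; it is a minimal illustration under stated assumptions: the function names and the env variable name passed in are hypothetical, paths are assumed to be absolute and normalized, and the run-time library path is assumed to have been obtained already (for example from a dynamic-linker query such as dladdr()).

```c
#include <stdlib.h>
#include <string.h>

/* Copy the first n bytes of src into out as a C string; -1 if it won't fit. */
static int copy_n(char *out, size_t outlen, const char *src, size_t n)
{
    if (n >= outlen) {
        return -1;
    }
    memcpy(out, src, n);
    out[n] = '\0';
    return 0;
}

/*
 * Hypothetical helper, not an Open MPI API. Given the library path found at
 * run time and the prefix/libdir recorded by configure, write the prefix to
 * use into `out`. Returns 0 on success, -1 on error.
 */
static int derive_prefix(const char *found_lib_path, const char *conf_prefix,
                         const char *conf_libdir, char *out, size_t outlen)
{
    /* Take the dirname of the library we found. */
    const char *slash = strrchr(found_lib_path, '/');
    if (found_lib_path[0] != '/' || slash == found_lib_path) {
        return -1;              /* need an absolute path with a directory */
    }
    size_t dirlen = (size_t)(slash - found_lib_path);

    /* Same as the configured libdir: not relocated, keep the old prefix. */
    if (strlen(conf_libdir) == dirlen
        && strncmp(found_lib_path, conf_libdir, dirlen) == 0) {
        return copy_n(out, outlen, conf_prefix, strlen(conf_prefix));
    }

    /* Was libdir defined as {prefix}<suffix>? If the found directory ends
     * with the same suffix, whatever precedes it is the new prefix. */
    size_t plen = strlen(conf_prefix);
    if (plen > 0 && strncmp(conf_libdir, conf_prefix, plen) == 0
        && conf_libdir[plen] == '/') {
        const char *suffix = conf_libdir + plen;
        size_t slen = strlen(suffix);
        if (dirlen > slen
            && strncmp(found_lib_path + dirlen - slen, suffix, slen) == 0) {
            return copy_n(out, outlen, found_lib_path, dirlen - slen);
        }
    }

    /* Otherwise assume the prefix is one directory up from the found dir. */
    size_t up = dirlen;
    while (up > 0 && found_lib_path[up - 1] != '/') {
        --up;
    }
    return copy_n(out, outlen, found_lib_path, up > 1 ? up - 1 : 1);
}

/* A user-set env variable wins; otherwise fall back to derive_prefix(). */
static int choose_prefix(const char *env_name, const char *found_lib_path,
                         const char *conf_prefix, const char *conf_libdir,
                         char *out, size_t outlen)
{
    const char *user = getenv(env_name);
    if (user != NULL && user[0] != '\0') {
        return copy_n(out, outlen, user, strlen(user));
    }
    return derive_prefix(found_lib_path, conf_prefix, conf_libdir, out, outlen);
}
```

With the example from the notes, derive_prefix("/bar/lib/libopal_pal.so", "/foo", "/foo/lib", ...) yields /bar; the "/foo" configure-time prefix is made up for illustration.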
-
This is good enough for OMPI v5.0.x / NVIDIA
-
Make sure to document this process in the RST docs somewhere
-
Can we get this to work with a small-ish patch? Assume yes. George will prototype.
-
Include everything that George did for v5.0.x
- Perhaps get fancier trying to distill prefix from libdir (TBD)
- Document in the RST whatever fanciness we do
-
installdirs currently has a bunch of dirs that nothing in the C code uses
- Let's remove all the dirs that we are not using -- only keep the ones that we actually need.
- Perhaps we only need libdir and help files dir...? (TBD)
- If we remove things, we need to update documentation to remove all corresponding env variables / MCA params.
-
After removing the dirs that aren't necessary, we should stat() all remaining dirs in installdirs and complain if any of them doesn't exist
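A minimal sketch of that check, assuming the remaining installdirs entries are available as an array of strings. The function name and the message text are hypothetical; real code would presumably report through the show_help machinery rather than stderr.

```c
#include <stdio.h>
#include <sys/stat.h>

/*
 * stat() every entry and complain about each one that is missing or is not
 * a directory. Returns the number of problem entries (0 means all good).
 */
static int count_missing_dirs(const char *const dirs[], size_t n)
{
    int missing = 0;
    for (size_t i = 0; i < n; ++i) {
        struct stat st;
        if (stat(dirs[i], &st) != 0 || !S_ISDIR(st.st_mode)) {
            fprintf(stderr, "installdirs: not an existing directory: %s\n",
                    dirs[i]);
            ++missing;
        }
    }
    return missing;
}
```

Whether a missing directory should be a warning or a hard error at init time is left open here, as it is in the notes.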
-
Can we slurp the text help files into C code somehow?
- This would be one less thing we have to find at run time
- ...and potentially one more entry we can remove from installdirs
- Maybe run some (python?) script during make that converts the text files into C code that is then compiled.
- Random note: clang v16 doesn't like multi-line C strings. Will need to be a little clever about how to encode the strings.
- Will also need to update opal_show_help() to get text source from C variables instead of reading text files.
- JS: C23 has #embed to include arbitrary files into the binary, but that would require GCC 15 / Clang 19
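One possible way to be "a little clever" about the encoding is to emit one quoted literal per input line and rely on the compiler concatenating adjacent string literals, so no literal ever spans lines. The sketch below shows only that encoding step. It is written in C to match the rest of the code base, although the notes float a Python script for the job; the function names are hypothetical, and it assumes the help text is printable ASCII plus tabs and newlines.

```c
#include <stddef.h>
#include <string.h>

/* Append `s` at *pos, keeping `out` NUL-terminated; -1 if it does not fit. */
static int append(char *out, size_t outlen, size_t *pos, const char *s)
{
    size_t n = strlen(s);
    if (*pos + n >= outlen) {
        return -1;
    }
    memcpy(out + *pos, s, n + 1);
    *pos += n;
    return 0;
}

/*
 * Encode `text` as adjacent C string literals, one per input line, escaping
 * backslashes, double quotes, tabs and newlines. The result is meant to be
 * pasted after something like "static const char help_text[] =" and followed
 * by ";". Returns 0 on success, -1 if `out` is too small.
 */
static int encode_as_c_literals(const char *text, char *out, size_t outlen)
{
    size_t pos = 0;
    int open = 0;                       /* currently inside a literal? */
    if (outlen == 0) {
        return -1;
    }
    out[0] = '\0';
    for (const char *p = text; *p != '\0'; ++p) {
        char one[2] = { *p, '\0' };
        const char *piece = one;
        if (!open) {
            if (append(out, outlen, &pos, "\"") != 0) {
                return -1;
            }
            open = 1;
        }
        switch (*p) {
        case '\\': piece = "\\\\"; break;
        case '"':  piece = "\\\""; break;
        case '\t': piece = "\\t";  break;
        case '\n': piece = "\\n\"\n"; open = 0; break;  /* close this line */
        default:   break;
        }
        if (append(out, outlen, &pos, piece) != 0) {
            return -1;
        }
    }
    /* Close a final line that had no trailing newline. */
    if (open && append(out, outlen, &pos, "\"\n") != 0) {
        return -1;
    }
    return 0;
}
```

The run-time side (having opal_show_help() look the text up in the generated variables instead of opening a file) is not shown.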
-
Open question: if the new prefix-setting mechanism works reliably, can we sunset the prefix-setting CLI/env var mechanisms?
-
Here are the dirs we need:
- bindir (when launching on a remote node, especially via SSH, or launching into dissimilar environments such as containers)
- libdir (when launching on a remote node, especially via SSH, or launching into dissimilar environments such as containers)
- DSO dir (to find DSOs)
- sysconf dir (to find config files)
- text help file dir (to find the show_help text files)