feat: refactor core/thread logic for mpibackend #3

Merged

Conversation

asoplata (Collaborator)

This takes George's old GUI-specific `_available_cores()` method, moves it, and greatly expands it to include updates to the logic about cores and hardware-threading which was previously inside `MPIBackend.__init__()`. This was necessary due to the number of common but different outcomes based on platform, architecture, hardware-threading support, and user choice. These changes do not involve very many lines of code, but a good amount of thought and testing has gone into them. Importantly, these `MPIBackend` API changes are backwards-compatible, and no changes to current usage code are needed. I suggest you read the long comments in `parallel_backends.py::_determine_cores_hwthreading()` outlining how each variation is handled.

Previously, if the user did not provide the number of MPI processes they wanted to use, `MPIBackend` assumed that the number of detected "logical" cores would suffice. As George previously showed, this does not work for HPC environments like OSCAR, where the only true number of cores that we are allowed to use is found by `psutil.Process().cpu_affinity()`, the "affinity" core number. There is a third type of core count besides "logical" and "affinity" which is important: "physical". However, there was an additional problem here that was still unaddressed: hardware-threading. Different platforms and situations report different numbers of logical, affinity, and physical CPU cores. One of the factors that affects this is whether hardware-threading is present on the machine, such as Intel Hyperthreading. In the case of an example Linux laptop with an Intel chip and Hyperthreading, the logical and physical core counts report different values: logical includes Hyperthreads (e.g. `psutil.cpu_count(logical=True)` reports 8 cores), but physical does not (e.g. `psutil.cpu_count(logical=False)` reports 4 cores). If we tell MPI to use 8 cores ("logical"), then we ALSO need to tell it to enable the hardware-threading option. However, if the user does not want to enable hardware-threading, then we need to make this an option, tell MPI to use 4 cores ("physical"), and tell MPI not to use the hardware-threading option. The "affinity" core number makes things even more complicated: in the Linux laptop example it is equal to the logical core number, but on OSCAR it is very different from the logical core number, and on macOS it is not available at all.

In `_determine_cores_hwthreading()`, if you read the lengthy comments, I have thought through each common scenario, and I believe resolved what to do for each, with respect to the number of cores to use and whether or not to use hardware-threading. These scenarios include: the user choosing to use hardware-threading (default) or not, across macOS variations with and without hardware-threading, Linux local-computer variations with and without hardware-threading, and Linux HPC (e.g. OSCAR) variations, which appear to never support hardware-threading. In the Windows case, due to both jonescompneurolab#589 and the currently-untested MPI integration on Windows, I always report the machine as not having hardware-threading.
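
As a rough illustration only (this is not the actual implementation, which handles more cases; see its comments), the kind of per-platform branching described above looks something like this:

```python
import platform
import psutil


def _sketch_cores_hwthreading(enable_hwthreading=True):
    """Hypothetical sketch of the per-platform logic described above."""
    logical = psutil.cpu_count(logical=True)
    physical = psutil.cpu_count(logical=False)
    if platform.system() == "Windows":
        # Always report no hardware-threading on Windows (see
        # jonescompneurolab#589 and the currently-untested MPI integration).
        return physical, False
    if platform.system() == "Darwin":
        # macOS has no cpu_affinity(); rely on cpu_count() alone.
        use_hw = enable_hwthreading and logical > physical
        return (logical if use_hw else physical), use_hw
    # Linux: the affinity count is the true upper bound (on HPC systems
    # like OSCAR it can be much smaller than the logical count).
    affinity = len(psutil.Process().cpu_affinity())
    use_hw = enable_hwthreading and logical > physical and affinity == logical
    return (affinity if use_hw else min(affinity, physical)), use_hw
```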

Additionally, previously, if the user did provide a number for MPI processes, `MPIBackend` used some "heuristics" to decide whether to use MPI oversubscription and/or hardware-threading, but the user could not override these heuristics. Now, when a user instantiates an `MPIBackend` with `__init__()` and uses the defaults, hardware-threading is detected more robustly and enabled by default, and oversubscription is enabled based on its own heuristics; this is the case when the new arguments `hwthreading` and `oversubscribe` are set to their default value of `None`. However, if the user knows what they're doing, they can also pass either `True` or `False` to either of these options to force them on or off. Furthermore, in the case of `hwthreading`, if the user indicates they do not want to use it, then `_determine_cores_hwthreading()` correctly returns the number of NON-hardware-threaded cores for MPI's use, instead of the core number including hardware-threads.
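
As a usage sketch (the `hwthreading` and `oversubscribe` keyword names come from the description above; see `parallel_backends.py` for the exact signature):

```python
from hnn_core import MPIBackend

# Defaults: hwthreading=None and oversubscribe=None let MPIBackend apply
# its own detection and heuristics.
with MPIBackend(n_procs=4):
    pass  # e.g. simulate_dipole(net, tstop=170.)

# Power users can force either behavior on or off explicitly:
with MPIBackend(n_procs=4, hwthreading=False, oversubscribe=False):
    pass  # only non-hardware-threaded cores, no oversubscription
```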

I have also modified and expanded the appropriate testing to compensate for these changes.

Note that this does NOT change the default number of jobs to use for the GUI if MPI is detected. Such a change breaks the current `test_gui.py` testing: see jonescompneurolab#960.

@@ -347,7 +347,10 @@ def __init__(self, theme_color="#802989",
         self.params = self.load_parameters(network_configuration)

         # Number of available cores
-        self.n_cores = self._available_cores()
+        [self.n_cores, _] = _determine_cores_hwthreading(
+            enable_hwthreading=False,

Just curious, what are the pros/cons of having this false by default?

asoplata (Collaborator Author)

Note that this call is specific to the GUI only. I figured that, just to be extra safe, anytime we are using the GUI in a new-user context, we should only use the runnable physical cores, and turn off hardware-threading in the very unlikely case that hardware-threading causes issues. One possible example could be if someone is locally running the GUI using an AMD processor with SMT enabled, which is hardware I don't have access to and can't test.

To be honest, though, it probably doesn't matter, and we could probably leave this on by default and be fine.


return dpls


def _determine_cores_hwthreading(
enable_hwthreading: Union[None, bool] = True,

I'm guessing you're moving towards typing throughout the code.

Are you thinking of doing it through small changes like this, just progressively updating the code base?

asoplata (Collaborator Author)

Yes, exactly; I've made a general issue for this here: jonescompneurolab#965. Note that the type-checker `mypy` specifically recommends progressive addition of typing, rather than trying to do it all at once.

import platform
import psutil
if platform.system() == "Darwin":
    # In Macos' case here, the situation is relatively simple, and we

It honestly seems like the ability to reliably detect the number of cores could be a standalone Python package.

asoplata (Collaborator Author)

Yeah, I do wish there was a way in the standard library to detect this stuff (cores vs hardware-threading etc.) in a way that wasn't OS-specific. As true parallelism becomes more important in programming, this is going to become more important. Fortunately, as described in my comments, even though OS support of the detection functions is heterogeneous, today we have all the tools we need to infer our core number and threading presence on all platforms that we primarily use.
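
For example, one minimal way to handle the macOS gap (assuming `psutil` is installed) is to check whether the per-process affinity API exists before relying on it:

```python
import psutil

proc = psutil.Process()
if hasattr(proc, "cpu_affinity"):
    # Linux (and Windows): the scheduler-imposed limit on usable cores.
    n_usable = len(proc.cpu_affinity())
else:
    # macOS: no affinity API, so fall back to the logical core count.
    n_usable = psutil.cpu_count(logical=True)
```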

# This lets us pass the same arg to this function and MPIBackend()
# in case we want to use the default approaches.
enable_hwthreading = True
import platform

Any reason to prefer a nested import here?

asoplata (Collaborator Author)

Unfortunately, yes. The old version of `MPIBackend.__init__`, which this code now sits inside of, also used local imports. This is still necessary, just like it was before, because this module is imported for every simulation (hence why literally every current simulation outputs a `Joblib will run...` message), but the module `psutil` is NOT necessarily installed.
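
For context, the lazy/optional-import pattern being described looks roughly like this (the helper name is illustrative, not the actual code):

```python
def _example_core_count():
    # psutil is an optional dependency, so import it only when this code
    # path actually runs, not at module import time (which happens for
    # every simulation).
    try:
        import psutil
    except ImportError as err:
        raise ImportError(
            "psutil is required for this feature; install it with "
            "`pip install psutil`."
        ) from err
    return psutil.cpu_count(logical=False)
```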

asoplata (Collaborator Author)

Eventually, it would be nice if Joblib wasn't called for singleton simulations and was only used if any parallelization was attempted by the user (or we always default to parallel, but make that more explicit). Additionally, this file could also be split into two, one each for Joblib's part vs MPI's part. It would make what is used by each clearer.

@ntolley commented Jan 13, 2025

@asoplata this looks good to merge into the branch! I have a few more comments but I'll hold off so we can discuss on the HNN-core PR

@asoplata (Collaborator Author)

Hey @gtdang, do you mind merging this PR into your branch, so that we can continue the discussion back in jonescompneurolab#871? Thanks

@gtdang merged commit ee10d38 into brown-ccv:gui-mpi-available-cores on Jan 16, 2025
10 checks passed
@asoplata deleted the gui-mpi-available-cores-2 branch on January 29, 2025