Build example 3 if CUDA enabled #222

jwallwork23 · 2025-01-07T11:44:50Z

Closes #208.
Closes #227.

This PR does several things:

Build example 3 if both CUDA and MPI are enabled.
Run example 3 in run_test_suite.sh if it has been built.
Address the example 3 Python issue raised in #227 ("Device error in example 3 - GPU").
Fix missing imports in pt2ts.py script in example 3.
Rename the scripts in the MultiGPU example to start with multigpu, for consistency with the other examples.
Fix the numbering of the ctest commands in the examples CMakeLists (they were all ones).
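
As a rough illustration of the "run example 3 if it has been built" item, the check can be sketched as below. This is only a sketch and not the actual contents of run_test_suite.sh: the helper name `examples_to_run`, the `examples/` layout inside the build directory, and the example directory names are all assumptions made for illustration.

```shell
# Sketch only: print the examples that exist in the build tree.
# The helper name, the "examples/" layout, and the directory names
# passed in are hypothetical, not taken from run_test_suite.sh.
examples_to_run() {
  build_dir="$1"
  shift
  for example in "$@"; do
    # An example is treated as built if its directory exists.
    if [ -d "${build_dir}/examples/${example}" ]; then
      echo "${example}"
    fi
  done
}

# Usage: list whichever of these (assumed) examples were built.
examples_to_run "${BUILD_DIR:-build}" 1_SimpleNet 2_ResNet18 3_MultiGPU
```

With a check like this, an example that was skipped at configure time (for instance because CUDA was not found) is simply left out of the run rather than causing a failure.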

I've attached the build script that I used to get this working locally:
build.sh.txt

jatkinson1000

Yes, agree with this.
However, one comment that I have seen made elsewhere: it would be good to avoid "double negatives". The guard for running integration tests is based on the unit test flag, which requires some mental gymnastics (unless it's just me) and would be a problem if we wanted to add a third set of subtests in future.

I wonder if it would be preferable to have RUN_UNIT and RUN_INTEGRATION variables that default to true and are checked before running the tests, setting them to false where necessary in setup?
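
A minimal sketch of what that could look like is below. RUN_UNIT and RUN_INTEGRATION are the names suggested above; the function names and the `--unit-only` / `--integration-only` options are hypothetical and may not match what run_test_suite.sh actually accepts.

```shell
# Sketch only: positive flags that default to true and are switched
# off during option parsing. The option names are hypothetical.
parse_test_flags() {
  RUN_UNIT=true
  RUN_INTEGRATION=true
  for arg in "$@"; do
    case "${arg}" in
      --unit-only) RUN_INTEGRATION=false ;;
      --integration-only) RUN_UNIT=false ;;
      *) echo "Unknown option: ${arg}" >&2; return 1 ;;
    esac
  done
}

# Each set of tests is guarded by its own flag, so no guard depends
# on the negation of another flag.
selected_tests() {
  selected=""
  if [ "${RUN_UNIT}" = true ]; then selected="${selected} unit"; fi
  if [ "${RUN_INTEGRATION}" = true ]; then selected="${selected} integration"; fi
  echo "${selected# }"
}

# Usage example
parse_test_flags --unit-only
echo "Running: $(selected_tests)"   # prints: Running: unit
```

Adding a third set of subtests would then only need one more variable and one more guard, without touching the existing ones.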

jwallwork23 · 2025-01-27T12:27:23Z

Opened #259.

jwallwork23 · 2025-01-28T12:46:17Z

Ah, just saw your commit 5935bb6. Apologies, looks like I updated the README on the wrong branch (see #258).

In example 3, if the net is saved from a particular CUDA device, it will be re-loaded back to that same CUDA device. This causes issues if we are placing the tensors on a different device, as is the case in the multi-GPU inference. To fix this, explicitly move the loaded net to the target device. We should look a little deeper into how this works on the Fortran/C++ side and verify where things are saved from and loaded to.

jatkinson1000 · 2025-01-29T12:00:33Z

@jwallwork23 rebased so this can go into main, with #258 being re-targeted.

Happy to approve once you have a quick glance over to check you are happy with the rebase.
I did check it in #262 as you saw.

jwallwork23 · 2025-01-29T12:20:16Z

LGTM, thanks @jatkinson1000

jatkinson1000

Reviewed, tested, amended, and rebased.
Approving now @jwallwork23 and happy for you to merge in.

jwallwork23 added the testing label (Related to FTorch testing) Jan 7, 2025

jwallwork23 self-assigned this Jan 7, 2025

jwallwork23 force-pushed the 208_multi-gpu-build branch 2 times, most recently from 6cc94c8 to 9d4aa86 January 13, 2025 09:53

jwallwork23 marked this pull request as ready for review January 13, 2025 15:52

jwallwork23 requested a review from jatkinson1000 January 13, 2025 15:52

jatkinson1000 reviewed Jan 27, 2025

This was referenced Jan 27, 2025

Separate out MPI and multi-GPU examples #258

Closed

Avoid double negatives in run_test_suite.sh command line options #259

Closed

jwallwork23 mentioned this pull request Jan 27, 2025

Avoid double negatives in the run tests shell script to make things clearer #260

Merged

jatkinson1000 force-pushed the 208_multi-gpu-build branch from 2aa458f to 5b7f17d January 28, 2025 12:22

jwallwork23 and others added 15 commits January 29, 2025 10:20

Build example 3 if CUDA and MPI enabled

a763b0e

Put model on CUDA device in simplenet

eadc273

Run example 3 if it's been built

e16c17d

Add missing imports for pt2ts

1392a0a

More helpful output for simplenet_infer_python

0129030

Fix numbering in CMakeLists for examples

2392649

Renaming in MultiGPU example; set up unit testing

6a532b5

Raise error if no CUDA in example 3

e8b5780

Lint

37d5da8

Fix model filename passed to fortran

bec0734

Do require mpi4py in Python script

ef0f938

Drop ENABLE_MPI CMake argument

e0d4e45

Bugfix: Update example 3 README with correct names of files simplenet -> multigpu.

8bcb9d0

Bugfix: Include missing CheckLanguage module in the CMakeLists of example 3.

ef07fca

jatkinson1000 force-pushed the 208_multi-gpu-build branch from 5b7f17d to ef07fca January 29, 2025 11:56

jatkinson1000 approved these changes Jan 29, 2025

jatkinson1000 mentioned this pull request Jan 29, 2025

[JOSS Review] Install + Example Docs Comments #214

Closed

jwallwork23 changed the title ~~Build example 3 if CUDA and MPI enabled~~ Build example 3 if CUDA enabled Jan 29, 2025

jwallwork23 merged commit c21040d into main Jan 29, 2025
6 checks passed

jwallwork23 deleted the 208_multi-gpu-build branch January 29, 2025 13:17

jwallwork23 mentioned this pull request Jan 30, 2025

Use get_device_index method in test #267

Merged
