fixed_podman_build #150

Open
wants to merge 1 commit into
base: main
Conversation

Greyyy-HJC

When I previously tried the original podman build, it failed because the GPU driver was not recognized inside the container.

In this change, I added a new folder containing a new Dockerfile and instructions for building with a container.

Thank you for the good project. I hope you can review the change and accept it.

Cheers,
Jinchen

@lukas-mazur
Collaborator

Hi @Greyyy-HJC, thank you for the contribution! @clarkedavida, you use the container frequently, right? Can you double-check whether these changes work for you?

@clarkedavida
Collaborator

I'm sorry it took me so long to look at this. I only noticed yesterday that this was forwarded to me.

I followed your instructions and ran into this error:

docker run --name simqcd_container --hooks-dir=/usr/share/containers/oci/hooks.d/ --runtime=nvidia -it greyyyhjc/simqcd_cuda_11.2
unknown flag: --hooks-dir
See 'docker run --help'.

Does the command need to be updated?
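
For context on the error above: `--hooks-dir` is a Podman option, and the `unknown flag` message shows that `docker run` does not accept it. The sketch below is only an illustration of how the GPU-related flags might differ between the two engines. The helper name `gpu_flags` is hypothetical and not part of SIMULATeQCD, and the Docker branch assumes the NVIDIA Container Toolkit is installed so that `--gpus all` is available. It prints the command instead of running it.

```shell
# Sketch only: print GPU-related flags for a given container engine.
# The function name gpu_flags is hypothetical, not part of SIMULATeQCD.
gpu_flags() {
    case "$1" in
        podman)
            # Podman accepts --hooks-dir; this is the OCI hooks directory
            # used in the command quoted above.
            echo "--hooks-dir=/usr/share/containers/oci/hooks.d/"
            ;;
        docker)
            # Docker has no --hooks-dir. Assumption: the NVIDIA Container
            # Toolkit is installed, so --gpus all can request the GPUs.
            echo "--gpus all"
            ;;
        *)
            echo "unsupported engine: $1" >&2
            return 1
            ;;
    esac
}

# Print (rather than execute) the resulting command for Docker.
echo "docker run --name simqcd_container $(gpu_flags docker) -it greyyyhjc/simqcd_cuda_11.2"
```

Whether this matches the invocation the PR author intended is not confirmed anywhere in this thread.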

@Greyyy-HJC
Author

Greyyy-HJC commented Jun 6, 2024 via email

@clarkedavida
Collaborator

clarkedavida commented Jun 6, 2024

Thanks for your hints Jinchen, I am making good progress now. What is the difference between the From NVIDIA and ready2use builds? Is there a reason we need both?

Also, after following the ready2use instructions, I compiled memManTest and hit the following error while running:

# [2024-06-06 12:48:31] FATAL: A GPU error occured: _rawPointer: Failed to allocate (additional) 1.024e-06 GB of memory on host: no CUDA-capable device is detected ( cudaErrorNoDevice )
terminate called after throwing an instance of 'std::runtime_error'
  what():  A GPU error occured: _rawPointer: Failed to allocate (additional) 1.024e-06 GB of memory on host: no CUDA-capable device is detected ( cudaErrorNoDevice )

I do have an NVIDIA Quadro P500 on this system. Before compiling, I cleared out the build folder and configured with architecture 61, which should be correct for this GPU. I should also mention that if I compile SIMULATeQCD manually on this system, everything works. Any ideas?
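
A `cudaErrorNoDevice` inside a container, on a host where a native build works, often means the GPU was not passed through to the container. As a rough diagnostic sketch (not a confirmed fix for this case), one could check from inside the container whether the NVIDIA device nodes are visible. The helper name `gpu_nodes_visible` is hypothetical, and the optional directory argument exists only so the check can be tried against a directory other than `/dev`.

```shell
# Sketch only: report whether NVIDIA device nodes are visible.
# The function name gpu_nodes_visible is hypothetical.
gpu_nodes_visible() {
    dev_dir="${1:-/dev}"
    # The NVIDIA driver exposes device nodes such as /dev/nvidia0 and
    # /dev/nvidiactl; if none exist here, CUDA cannot find a device.
    for node in "$dev_dir"/nvidia*; do
        if [ -e "$node" ]; then
            echo "visible"
            return 0
        fi
    done
    echo "missing"
    return 1
}
```

Running `gpu_nodes_visible` inside the container prints `missing` when no such nodes exist, which would point at the container launch flags and not at the compiled architecture. If `nvidia-smi` is available in the image, `nvidia-smi -L` gives a similar check by listing the GPUs it can see.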

@Greyyy-HJC
Author

Greyyy-HJC commented Jun 6, 2024 via email

@clarkedavida
Collaborator

OK, any hints about the error I hit?

@Greyyy-HJC
Author

> OK, any hints about the error I hit?

Oh, I missed your error before, sorry. I just checked on my architecture 86 machine, and I can make memManTest successfully there. Could you try another machine with a different architecture? I am not sure whether memManTest has any requirements on the hardware architecture.

Below is the output that I got.

Best,
Jinchen

root@7e4148d81e9b:/buildsimqcd# make memManTest
Scanning dependencies of target memManTest
Building CUDA object CMakeFiles/memManTest.dir/src/testing/main_memManTest.cpp.o
Building CXX object CMakeFiles/memManTest.dir/src/base/gutils.cpp.o
Building CXX object CMakeFiles/memManTest.dir/src/base/memoryManagement.cpp.o
Building CUDA object CMakeFiles/memManTest.dir/src/base/indexer/initGPUIndexer.cpp.o
Building CXX object CMakeFiles/memManTest.dir/src/base/indexer/initCPUIndexer.cpp.o
Building CXX object CMakeFiles/memManTest.dir/src/base/communication/communicationBase_mpi.cpp.o
Building CXX object CMakeFiles/memManTest.dir/src/base/IO/parameterManagement.cpp.o
Building CXX object CMakeFiles/memManTest.dir/src/base/IO/fileWriter.cpp.o
Building CUDA object CMakeFiles/memManTest.dir/src/base/math/random.cpp.o
Building CUDA object CMakeFiles/memManTest.dir/src/gauge/gaugefield_device.cpp.o
Building CUDA object CMakeFiles/memManTest.dir/src/gauge/gaugefield.cpp.o
Building CUDA object CMakeFiles/memManTest.dir/src/gauge/gaugeAction.cpp.o
Building CUDA object CMakeFiles/memManTest.dir/src/base/latticeContainer.cpp.o
Linking CUDA device code CMakeFiles/memManTest.dir/cmake_device_link.o
Linking CXX executable testing/memManTest
Built target memManTest

@Greyyy-HJC
Author

Greyyy-HJC commented Jun 7, 2024 via email

@clarkedavida
Collaborator

Will I need sudo privileges to use the container? If so, my laptop is the only machine I have with a usable GPU.

@Greyyy-HJC
Author

Greyyy-HJC commented Jun 10, 2024 via email

3 participants