cuda12_2-2.8.0 container unable to detect gpu #216

Open
1 of 6 tasks
levski opened this issue Dec 8, 2024 · 7 comments

Comments

@levski

levski commented Dec 8, 2024

Area of Concern

  • Server
  • Behaviour of one or more Modules [provide name(s), e.g. ObjectDetectionYolo]
  • Installer
  • Runtime [e.g. Python3.7, .NET]
  • Module packages [e.g. PyTorch]
  • Something else

Describe the bug
I'm using the codeproject/ai-server:cuda12_2-2.8.0 container.
CUDA is only used by the YOLOv5 model, not by the facial recognition model.
I just started with the project, so I'm not sure whether I've misconfigured something.

Expected behavior
CodeProject.AI Server uses CUDA for all modules in the container, not just object detection.


Your System (please complete the following information):

  • CodeProject.AI Server version: container cuda12_2-2.8.0
  • OS: debian12
  • System RAM: 32 GB
  • GPU (if available): GTX 1080 Ti
  • GPU RAM (if available): 11 GB

Additional context
docker logs, docker-compose.yml, and nvidia-smi output below:

/opt/codeproject$ docker logs CodeProjectAI

Creating downloaded models path '/app/downloads/models'
Infor ** System:           Docker (7aca8b997a5a)
Infor ** Operating System: Linux (Ubuntu 22.04)
Infor ** CPUs:             AMD Ryzen 5 2600X Six-Core Processor (AMD)
Infor **                   1 CPU x 6 cores. 12 logical processors (x64)
Infor ** System RAM:       31 GiB
Infor ** Platform:         Linux
Infor ** BuildConfig:      Release
Infor ** Execution Env:    Docker
Infor ** Runtime Env:      Production
Infor ** Runtimes installed:
Infor **   .NET runtime:     8.0.8
Infor **   .NET SDK:         Not found
Infor **   Default Python:   3.10.12
Infor **   Go:               Not found
Infor **   NodeJS:           Not found
Infor **   Rust:             Not found
Infor ** App DataDir:      /etc/codeproject/ai
Infor Video adapter info:
Infor *** STARTING CODEPROJECT.AI SERVER
Infor RUNTIMES_PATH             = /app/runtimes
Infor PREINSTALLED_MODULES_PATH = /app/preinstalled-modules
Infor DEMO_MODULES_PATH         = /app/src/demos/modules
Infor EXTERNAL_MODULES_PATH     = /CodeProject.AI-Modules
Infor MODULES_PATH              = /app/modules
Infor PYTHON_PATH               = /bin/linux/%PYTHON_NAME%/venv/bin/python3
Infor Data Dir                  = /etc/codeproject/ai
Infor ** Server version:   2.8
Server is listening on port 32168
Server is also listening on legacy port 5000
Trace ModuleRunner Start
Infor Rerunning modules setup because the Docker container ID has changed.
Infor Installing module 'All Modules'
Debug Installer script at '/app/setup.sh'
Infor Setting verbosity to quiet
Infor Hi Docker! We will disable shared python installs for downloaded modules
Warni Overriding address(es) 'http://+:32168, http://+:5000'. Binding to endpoints defined via IConfiguration and/or UseKestrel() instead.
Infor (No schemas means: we can't detect if you're in light or dark mode)
Infor           Setting up CodeProject.AI Development Environment           
Infor ======================================================================
Infor                    CodeProject.AI Installer                           
Infor ======================================================================
Infor 861.09 GiB of 959.06 GiB available on Docker (linux ubuntu x86_64 - linux)
Infor Installing xz-utils...
Error E: Could not get lock /var/lib/apt/lists/lock. It is held by process 106 (apt-get)
Error E: Unable to lock directory /var/lib/apt/lists/
Infor Reading package lists...
Infor Building dependency tree...
Infor Reading state information...
Infor 0 upgraded, 0 newly installed, 0 to remove and 39 not upgraded.
Infor General CodeProject.AI setup                                          
Infor Setting permissions on runtimes folder...done
Infor Setting permissions on downloads folder...done
Infor Setting permissions on modules download folder...done
Infor Creating models download folder...done
Infor Setting permissions on models download folder...done
Infor Setting permissions on persisted data folder...done
Infor GPU support                                                           
Infor CUDA (NVIDIA) Present: No
Infor ROCm (AMD) Present:    (attempt to install rocminfo... ) No
Infor MPS (Apple) Present:   No
Infor Processing Included CodeProject.AI Server Modules                     
Infor Reading module settings.......done
Infor Processing module                                                     
Infor This module cannot be installed on this system
Infor Processing External CodeProject.AI Server Modules                     
Infor No external modules found
Infor Module setup Complete
Infor                 Setup complete                                        
Infor Total setup time 00:00:05
Infor Installer exited with code 0
Trace Starting Background AI Modules
Trace Running module using: python3.8
Debug 
Debug Attempting to start FaceProcessing with python3.8 "/app/preinstalled-modules/FaceProcessing/intelligencelayer/face.py"
Trace Starting python3.8 "/app.../FaceProcessing/intelligencelayer/face.py"
Infor 
Infor ** Module 'Face Processing' 1.12.0 (ID: FaceProcessing)
Infor ** Valid:            True
Infor ** Module Path:      <root>/preinstalled-modules/FaceProcessing
Infor ** Module Location:  PreInstalled
Infor ** AutoStart:        True
Infor ** Queue:            faceprocessing_queue
Infor ** Runtime:          python3.8
Infor ** Runtime Location: System
Infor ** FilePath:         intelligencelayer/face.py
Infor ** Start pause:      3 sec
Infor ** Parallelism:      0
Infor ** LogVerbosity:
Infor ** Platforms:        all,!jetson
Infor ** GPU Libraries:    installed if available
Infor ** GPU:              use if supported
Infor ** Accelerator:
Infor ** Half Precision:   enable
Infor ** Environment Variables
Infor ** APPDIR             = <root>/preinstalled-modules/FaceProcessing/intelligencelayer
Infor ** DATA_DIR           = /etc/codeproject/ai
Infor ** MODE               = MEDIUM
Infor ** MODELS_DIR         = <root>/preinstalled-modules/FaceProcessing/assets
Infor ** PROFILE            = desktop_gpu
Infor ** USE_CUDA           = True
Infor ** YOLOv5_AUTOINSTALL = false
Infor ** YOLOv5_VERBOSE     = false
Infor 
Infor Started Face Processing module
Trace Running module using: python3.8
Debug 
Debug Attempting to start ObjectDetectionYOLOv5-6.2 with python3.8 "/app/preinstalled-modules/ObjectDetectionYOLOv5-6.2/detect_adapter.py"
Trace Starting python3.8 "/app...jectDetectionYOLOv5-6.2/detect_adapter.py"
Infor 
Infor ** Module 'Object Detection (YOLOv5 6.2)' 1.10.0 (ID: ObjectDetectionYOLOv5-6.2)
Infor ** Valid:            True
Infor ** Module Path:      <root>/preinstalled-modules/ObjectDetectionYOLOv5-6.2
Infor ** Module Location:  PreInstalled
Infor ** AutoStart:        True
Infor ** Queue:            objectdetection_queue
Infor ** Runtime:          python3.8
Infor ** Runtime Location: System
Infor ** FilePath:         detect_adapter.py
Infor ** Start pause:      1 sec
Infor ** Parallelism:      0
Infor ** LogVerbosity:
Infor ** Platforms:        all,!raspberrypi,!jetson
Infor ** GPU Libraries:    installed if available
Infor ** GPU:              use if supported
Infor ** Accelerator:
Infor ** Half Precision:   enable
Infor ** Environment Variables
Infor ** APPDIR             = <root>/preinstalled-modules/ObjectDetectionYOLOv5-6.2
Infor ** CUSTOM_MODELS_DIR  = <root>/preinstalled-modules/ObjectDetectionYOLOv5-6.2/custom-models
Infor ** MODELS_DIR         = <root>/preinstalled-modules/ObjectDetectionYOLOv5-6.2/assets
Infor ** MODEL_SIZE         = Medium
Infor ** USE_CUDA           = True
Infor ** YOLOv5_AUTOINSTALL = false
Infor ** YOLOv5_VERBOSE     = false
Infor 
Infor Started Object Detection (YOLOv5 6.2) module
Trace face.py: Vision AI services setup: Retrieving environment variables...
Debug face.py: APPDIR:       /app/preinstalled-modules/FaceProcessing/intelligencelayer
Debug face.py: PROFILE:      desktop_cpu
Debug face.py: USE_CUDA:     False
Debug face.py: DATA_DIR:     /etc/codeproject/ai
Debug face.py: MODELS_DIR:   /app/preinstalled-modules/FaceProcessing/assets
Debug face.py: MODE:         MEDIUM
Trace face.py: Running init for Face Processing
Debug detect_adapter.py: APPDIR:      /app/preinstalled-modules/ObjectDetectionYOLOv5-6.2
Debug detect_adapter.py: MODEL_SIZE:  medium
Debug detect_adapter.py: MODELS_DIR:  /app/preinstalled-modules/ObjectDetectionYOLOv5-6.2/assets
Trace detect_adapter.py: Running init for Object Detection (YOLOv5 6.2)
Trace ModuleRunner Stop
Infor Sending shutdown request to python3.8/FaceProcessing
Trace Client request 'Quit' in queue 'faceprocessing_queue' (#reqid eb3d0e74-764d-4ae2-a840-a9e8b458ea52)
Debug ObjectDetectionYOLOv5Net doesn't appear in the Process list, so can't stop it.
Infor Sending shutdown request to python3.8/ObjectDetectionYOLOv5-6.2
Trace Client request 'Quit' in queue 'objectdetection_queue' (#reqid d3aec7d9-c4fb-4d5e-8a2c-e2f3699d60f5)
Infor ** System:           Docker (7aca8b997a5a)
Infor ** Operating System: Linux (Ubuntu 22.04)
Infor ** CPUs:             AMD Ryzen 5 2600X Six-Core Processor (AMD)
Infor **                   1 CPU x 6 cores. 12 logical processors (x64)
Infor ** System RAM:       31 GiB
Infor ** Platform:         Linux
Infor ** BuildConfig:      Release
Infor ** Execution Env:    Docker
Infor ** Runtime Env:      Production
Infor ** Runtimes installed:
Infor **   .NET runtime:     8.0.8
Infor **   .NET SDK:         Not found
Infor **   Default Python:   3.10.12
Infor **   Go:               Not found
Infor **   NodeJS:           Not found
Infor **   Rust:             Not found
Infor ** App DataDir:      /etc/codeproject/ai
Infor Video adapter info:
Infor *** STARTING CODEPROJECT.AI SERVER
Infor RUNTIMES_PATH             = /app/runtimes
Infor PREINSTALLED_MODULES_PATH = /app/preinstalled-modules
Infor DEMO_MODULES_PATH         = /app/src/demos/modules
Infor EXTERNAL_MODULES_PATH     = /CodeProject.AI-Modules
Infor MODULES_PATH              = /app/modules
Infor PYTHON_PATH               = /bin/linux/%PYTHON_NAME%/venv/bin/python3
Infor Data Dir                  = /etc/codeproject/ai
Infor ** Server version:   2.8
Server is listening on port 32168
Server is also listening on legacy port 5000
Trace ModuleRunner Start
Trace Starting Background AI Modules
Warni Overriding address(es) 'http://+:32168, http://+:5000'. Binding to endpoints defined via IConfiguration and/or UseKestrel() instead.
Trace Running module using: python3.8
Debug 
Debug Attempting to start FaceProcessing with python3.8 "/app/preinstalled-modules/FaceProcessing/intelligencelayer/face.py"
Trace Starting python3.8 "/app.../FaceProcessing/intelligencelayer/face.py"
Infor 
Infor ** Module 'Face Processing' 1.12.0 (ID: FaceProcessing)
Infor ** Valid:            True
Infor ** Module Path:      <root>/preinstalled-modules/FaceProcessing
Infor ** Module Location:  PreInstalled
Infor ** AutoStart:        True
Infor ** Queue:            faceprocessing_queue
Infor ** Runtime:          python3.8
Infor ** Runtime Location: System
Infor ** FilePath:         intelligencelayer/face.py
Infor ** Start pause:      3 sec
Infor ** Parallelism:      0
Infor ** LogVerbosity:
Infor ** Platforms:        all,!jetson
Infor ** GPU Libraries:    installed if available
Infor ** GPU:              use if supported
Infor ** Accelerator:
Infor ** Half Precision:   enable
Infor ** Environment Variables
Infor ** APPDIR             = <root>/preinstalled-modules/FaceProcessing/intelligencelayer
Infor ** DATA_DIR           = /etc/codeproject/ai
Infor ** MODE               = MEDIUM
Infor ** MODELS_DIR         = <root>/preinstalled-modules/FaceProcessing/assets
Infor ** PROFILE            = desktop_gpu
Infor ** USE_CUDA           = True
Infor ** YOLOv5_AUTOINSTALL = false
Infor ** YOLOv5_VERBOSE     = false
Infor 
Infor Started Face Processing module
Trace Running module using: python3.8
Debug 
Debug Attempting to start ObjectDetectionYOLOv5-6.2 with python3.8 "/app/preinstalled-modules/ObjectDetectionYOLOv5-6.2/detect_adapter.py"
Trace Starting python3.8 "/app...jectDetectionYOLOv5-6.2/detect_adapter.py"
Infor 
Infor ** Module 'Object Detection (YOLOv5 6.2)' 1.10.0 (ID: ObjectDetectionYOLOv5-6.2)
Infor ** Valid:            True
Infor ** Module Path:      <root>/preinstalled-modules/ObjectDetectionYOLOv5-6.2
Infor ** Module Location:  PreInstalled
Infor ** AutoStart:        True
Infor ** Queue:            objectdetection_queue
Infor ** Runtime:          python3.8
Infor ** Runtime Location: System
Infor ** FilePath:         detect_adapter.py
Infor ** Start pause:      1 sec
Infor ** Parallelism:      0
Infor ** LogVerbosity:
Infor ** Platforms:        all,!raspberrypi,!jetson
Infor ** GPU Libraries:    installed if available
Infor ** GPU:              use if supported
Infor ** Accelerator:
Infor ** Half Precision:   enable
Infor ** Environment Variables
Infor ** APPDIR             = <root>/preinstalled-modules/ObjectDetectionYOLOv5-6.2
Infor ** CUSTOM_MODELS_DIR  = <root>/preinstalled-modules/ObjectDetectionYOLOv5-6.2/custom-models
Infor ** MODELS_DIR         = <root>/preinstalled-modules/ObjectDetectionYOLOv5-6.2/assets
Infor ** MODEL_SIZE         = Medium
Infor ** USE_CUDA           = True
Infor ** YOLOv5_AUTOINSTALL = false
Infor ** YOLOv5_VERBOSE     = false
Infor 
Infor Started Object Detection (YOLOv5 6.2) module
Trace face.py: Vision AI services setup: Retrieving environment variables...
Debug face.py: APPDIR:       /app/preinstalled-modules/FaceProcessing/intelligencelayer
Debug face.py: PROFILE:      desktop_cpu
Debug face.py: USE_CUDA:     False
Debug face.py: DATA_DIR:     /etc/codeproject/ai
Debug face.py: MODELS_DIR:   /app/preinstalled-modules/FaceProcessing/assets
Debug face.py: MODE:         MEDIUM
Trace face.py: Running init for Face Processing
Debug detect_adapter.py: APPDIR:      /app/preinstalled-modules/ObjectDetectionYOLOv5-6.2
Debug detect_adapter.py: MODEL_SIZE:  medium
Debug detect_adapter.py: MODELS_DIR:  /app/preinstalled-modules/ObjectDetectionYOLOv5-6.2/assets
Trace detect_adapter.py: Running init for Object Detection (YOLOv5 6.2)
/opt/codeproject$ cat docker-compose.yml 
version: "3.8"

services:
  codeprojectai:
    image: codeproject/ai-server:cuda12_2-2.8.0
    container_name: CodeProjectAI
    restart: unless-stopped
    ports:
      - "32168:32168"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [gpu]
    volumes:
      - /etc/codeproject/ai:/etc/codeproject/ai
      - /opt/codeproject/ai:/app/modules
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=all
/opt/codeproject$ nvidia-smi 
Mon Dec  9 00:59:54 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.05              Driver Version: 560.35.05      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1080 Ti     On  |   00000000:0E:00.0 Off |                  N/A |
|  0%   60C    P2             59W /  280W |    1433MiB /  11264MiB |      2%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A   4050334      C   ffmpeg                                        165MiB |
|    0   N/A  N/A   4050339      C   ffmpeg                                        165MiB |
|    0   N/A  N/A   4050343      C   ffmpeg                                        165MiB |
|    0   N/A  N/A   4050356      C   ffmpeg                                        165MiB |
|    0   N/A  N/A   4050364      C   ffmpeg                                        384MiB |
|    0   N/A  N/A   4050373      C   ffmpeg                                        384MiB |
+-----------------------------------------------------------------------------------------+
junjun@i-c-u:/opt/codeproject$ 
@ChrisMaunder
Contributor

Infor GPU support
Infor CUDA (NVIDIA) Present: No

This says that the system can't recognise your CUDA install. Are you running CUDA 12.2+ locally, and did you start the container using --gpus all?
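One quick way to check the --gpus all path is a throwaway CUDA container (a sketch; assumes the NVIDIA Container Toolkit is installed on the host, and the image tag is just an example):

```shell
# Run nvidia-smi inside a disposable CUDA container. If Docker is passing
# the GPU through correctly, this prints the same table as nvidia-smi on
# the host; if not, it fails with a runtime/device error.
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```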

@levski
Author

levski commented Dec 9, 2024

I used docker compose:
/opt/codeproject$ cat docker-compose.yml

version: "3.8"

services:
  codeprojectai:
    image: codeproject/ai-server:cuda12_2-2.8.0
    container_name: CodeProjectAI
    restart: unless-stopped
    ports:
      - "32168:32168"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [gpu]
    volumes:
      - /etc/codeproject/ai:/etc/codeproject/ai
      - /opt/codeproject/ai:/app/modules
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=all
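For reference, the Compose `deploy` section can also pin the GPU request explicitly; per the Compose specification, `count: all` is the declarative equivalent of `--gpus all` (a sketch only, not verified against this setup):

```yaml
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: all
          capabilities: [gpu]
```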

Should I have used docker run with --gpus all instead?

I have this installed on the host, and Frigate, which uses the same flags in its docker compose, sees the same:
| NVIDIA-SMI 560.35.05 Driver Version: 560.35.05 CUDA Version: 12.6

@ChrisMaunder
Contributor

I just realised you are running an older Docker container. Try using the latest:

version: '3.9'

services:
  CodeProjectAI:
    image: codeproject/ai-server:cuda12_2
    container_name: "codeproject-ai-server-cuda"
    restart: unless-stopped
    ports:
      - "32168:32168/tcp"
      - "32168:32168/udp"
    environment:
       ...

If that doesn't work it might be worth trying:

docker run --name CodeProject.AI -d -p 32168:32168 --gpus all ^
 --mount type=bind,source=C:\ProgramData\CodeProject\AI\docker\data,target=/etc/codeproject/ai ^
 --mount type=bind,source=C:\ProgramData\CodeProject\AI\docker\modules,target=/app/modules ^
   codeproject/ai-server:cuda12_2

@levski
Author

levski commented Dec 12, 2024 via email

@randellhodges

Did you install the NVIDIA Container Toolkit on the host? I just went through this exercise with Debian 12. Have you tried one of the test containers that runs nvidia-smi, just to see that Docker is passing the GPU through correctly?
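For anyone following along, the Debian 12 toolkit setup is roughly this (a sketch; assumes NVIDIA's apt repository has already been added per their install guide):

```shell
# Install the NVIDIA Container Toolkit, register it as a Docker runtime,
# and restart the daemon so new containers can request GPUs.
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```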

@levski
Author

levski commented Dec 15, 2024 via email

@randellhodges

I am using codeproject/ai-server:cuda12_2-2.9.5 if that might make a difference.

My setup. I have a Debian 12 VM running in proxmox. I use an Nvidia P4 and pass a VGPU to the Debian 12 VM. In that VM, I installed the vgpu drivers, the license, nvidia container toolkit, docker and that's about it.

My docker compose looks almost identical except that I'm using 2.9.5. I am able to docker exec -it into the container with bash, poke around, and run nvidia-smi inside the container; I can see my card and everything works.

It sounds like you have other Docker containers on the same host using the card, so the host seems fine. I've never tried Docker with a consumer card, though. Are there any issues sharing a consumer card with multiple containers at the same time?

I'd give 2.9.5 a try.

No branches or pull requests

3 participants