Hi,
As discussed with Fahad on Discord - the kairon worker is using the CPU and not the GPU inside the Docker container.
I ran some tests to make sure it wasn't a problem on my side:

    version: "3"
    services:
      test:
        image: tensorflow/tensorflow:latest-gpu
        command: python -c "import tensorflow as tf;tf.test.gpu_device_name()"
        deploy:
          resources:
            reservations:
              devices:
                - capabilities: [gpu]

and

    services:
      test:
        image: nvidia/cuda:10.2-base
        command: nvidia-smi
        deploy:
          resources:
            reservations:
              devices:
                - driver: nvidia
                  count: 1
                  capabilities: [gpu]

Both worked fine and detected my GPU.
I added the same config to kairon-worker:

    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]

but this did not seem to make a difference.
I then checked the Dockerfile for the worker and noticed that no packages or drivers are installed in the image.
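In case it helps whoever picks this up, here is a rough diagnostic that could be run inside the worker container to see which layer is missing. It is only a sketch: `gpu_report` is a name I made up, and it assumes the worker's workload goes through TensorFlow (as in my first test above), which I have not confirmed from the Dockerfile. It checks whether `nvidia-smi` is reachable and whether TensorFlow, if installed, was built with CUDA and can list a GPU.

```python
import os
import shutil
import subprocess


def gpu_report():
    """Collect basic GPU visibility facts from inside the current container."""
    report = {
        # Environment variable read by the NVIDIA container runtime; may be unset.
        "nvidia_visible_devices": os.environ.get("NVIDIA_VISIBLE_DEVICES"),
        # Path to nvidia-smi if the runtime mounted it into the container.
        "nvidia_smi_path": shutil.which("nvidia-smi"),
        "nvidia_smi_ok": False,
        "tensorflow_installed": False,
        "tensorflow_built_with_cuda": None,
        "tensorflow_gpus": [],
    }

    if report["nvidia_smi_path"]:
        try:
            # `nvidia-smi -L` lists the GPUs the driver exposes.
            result = subprocess.run(
                [report["nvidia_smi_path"], "-L"],
                capture_output=True,
                text=True,
                timeout=30,
            )
            report["nvidia_smi_ok"] = result.returncode == 0
        except (OSError, subprocess.SubprocessError):
            pass

    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not in this image; report what we have so far.
        return report

    report["tensorflow_installed"] = True
    # False here would mean a CPU-only TensorFlow build is installed.
    report["tensorflow_built_with_cuda"] = tf.test.is_built_with_cuda()
    report["tensorflow_gpus"] = [
        device.name for device in tf.config.list_physical_devices("GPU")
    ]
    return report


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(f"{key}: {value}")
```

If `nvidia_smi_ok` is true but `tensorflow_built_with_cuda` is false or `tensorflow_gpus` is empty, the container is getting the device and the gap is most likely in what the image installs rather than in the compose file.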
I found some steps here that someone could implement: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa-start-nccl-base.html
This might be helpful @sfahad1414 https://levelup.gitconnected.com/how-to-install-an-nvidia-gpu-driver-on-an-aws-ec2-instance-20185c1c578c