resolve ONNX on GPU issue #951

Open: saeid-a wants to merge 1 commit into base: master


@saeid-a saeid-a commented Dec 9, 2023

Greetings,
Image used: redislabs/redisai:1.2.7-gpu-bionic

During my testing of RedisAI, I found that ONNX models could not use the GPU: every attempt failed, and RedisAI fell back to the CPU instead.
When adding a model with con.modelset('test', 'ONNX', 'GPU', m1), RedisAI produced the error below.

redisai-redisai-1  | 2023-12-09 14:01:51.062001467 [E:onnxruntime:RedisAI, provider_bridge_ort.cc:964 Ensure] Failed to load library libonnxruntime_providers_shared.so with error: libonnxruntime_providers_shared.so: cannot open shared object file: No such file or directory

Checking the size of the ONNX Runtime artifact inside the container gives this:

root@32f5fdc976e8:/# du -h /var/opt/redislabs/artifacts/redisai-gpu-onnxruntime.linux-bionic-x64.1.2.7.tgz
5.6M	/var/opt/redislabs/artifacts/redisai-gpu-onnxruntime.linux-bionic-x64.1.2.7.tgz

After analyzing the code and doing some debugging, I found that the get_deps.sh script downloads the onnxruntime binaries from https://s3.amazonaws.com/redismodules/onnxruntime. Specifically, it fetches https://s3.amazonaws.com/redismodules/onnxruntime/onnxruntime-linux-x64-gpu-1.11.1.tgz, which is only about 5 MB. That is far smaller than the official ONNX Runtime GPU release at https://github.com/microsoft/onnxruntime/releases/download/v1.11.1/onnxruntime-linux-x64-gpu-1.11.1.tgz, which exceeds 100 MB.
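One way to check an archive for this problem without unpacking it is to list its members and look for the library named in the error message. The following is a minimal Python sketch and not part of this PR: the function name archive_has_library is hypothetical, and the versioned-suffix match is an assumption about how shared libraries might be named inside the archive.

```python
import io
import posixpath
import tarfile

# Library name taken from the error message in this description.
REQUIRED_LIB = "libonnxruntime_providers_shared.so"


def archive_has_library(archive, lib_name=REQUIRED_LIB):
    """Return True if a gzip-compressed tar archive contains lib_name.

    `archive` may be a filesystem path or a binary file-like object.
    A member matches if its basename equals lib_name or starts with
    lib_name followed by a dot (assumed versioned form, e.g. "x.so.1").
    """
    source = {"fileobj": archive} if hasattr(archive, "read") else {"name": archive}
    with tarfile.open(mode="r:gz", **source) as tar:
        for member in tar.getmembers():
            base = posixpath.basename(member.name)
            if base == lib_name or base.startswith(lib_name + "."):
                return True
    return False
```

Running archive_has_library on both the S3 archive and the official GitHub release archive would show whether the shared provider library is present in each.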

To make the RedisAI container run on GPU, I replaced the onnxruntime files by mounting the onnx directory as a volume. After this change, the container was able to run on GPU.

Based on my investigation, the root cause appears to be that the wrong onnxruntime file was uploaded to the RedisAI AWS host. To fix this, I changed ORT_URL_BASE in the $OS == linux branch to download from onnxruntime's GitHub releases page. With this change, the container runs ONNX models on GPU properly.
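The URL layout of the official release can be sketched as follows. This is an illustration in Python rather than the actual get_deps.sh change: the constant and helper names are hypothetical, and the pattern is inferred only from the single v1.11.1 URL quoted in this description, so other versions may not follow it.

```python
# Base of the official release downloads, as quoted in this description.
GITHUB_RELEASE_BASE = "https://github.com/microsoft/onnxruntime/releases/download"


def ort_gpu_release_url(version):
    """Build the Linux x64 GPU archive URL for a given ONNX Runtime version.

    Assumes every release follows the same naming as v1.11.1.
    """
    return (
        f"{GITHUB_RELEASE_BASE}/v{version}/"
        f"onnxruntime-linux-x64-gpu-{version}.tgz"
    )


print(ort_gpu_release_url("1.11.1"))
```

For version 1.11.1 this reproduces the official download link given above.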

Here is a PR with the required changes.


CLAassistant commented Dec 9, 2023

CLA assistant check
All committers have signed the CLA.

@saeid-a changed the title from "download official onnx releases" to "resolve ONNX on GPU issue" on Dec 9, 2023