ObjectDetectionCoral stops working when restarting the docker container. #7

Open
1 of 6 tasks
mikegleasonjr opened this issue Apr 11, 2024 · 12 comments
Labels: bug (Something isn't working)

@mikegleasonjr

Area of Concern

  • Server
  • Behaviour of ObjectDetectionCoral
  • Installer
  • Runtime [e.g. Python3.7, .NET]
  • Module packages [e.g. PyTorch]
  • Something else

Describe the bug

  1. I am running a Docker container from the codeproject/ai-server image with 2 volumes (so that I can restart it without having to reinstall/reconfigure everything):
     • /etc/codeproject/ai
     • /app/modules
  2. I installed the Object Detection (Coral) module, which worked fine with my TPU. I called the API and verified that everything was working.
  3. I stopped the container.
  4. I started the container again.
  5. The module now fails with this error: objectdetection_coral_adapter.py: An exception occurred initialising the module: libedgetpu.so.1: cannot open shared object file: No such file or dir
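The steps above can be sketched as a short script. This is only a sketch, not the reporter's exact commands: the image tag and the two mount points come from the report, while the container name `cpai`, the host directories, and the `32168` port mapping are assumptions. How the Coral device is passed into the container is left out because the report does not say. The `DOCKER` variable is a hypothetical convenience so the commands can be printed rather than executed.

```shell
#!/usr/bin/env bash
# Sketch of the reproduction steps. Values marked "assumed" are not from the report.
DOCKER="${DOCKER:-docker}"                        # set DOCKER=echo to print commands only
NAME="${NAME:-cpai}"                              # assumed container name
DATA_DIR="${DATA_DIR:-$HOME/cpai/data}"           # assumed host directory
MODULES_DIR="${MODULES_DIR:-$HOME/cpai/modules}"  # assumed host directory

create_container() {
  # Only these two directories are mounted as volumes; /usr/lib is not.
  "$DOCKER" run -d --name "$NAME" -p 32168:32168 \
    -v "$DATA_DIR:/etc/codeproject/ai" \
    -v "$MODULES_DIR:/app/modules" \
    codeproject/ai-server:2.6.2
}

restart_container() {
  "$DOCKER" stop "$NAME"
  "$DOCKER" start "$NAME"
}
```

Note that a plain `docker stop` followed by `docker start` normally keeps the container's writable layer, so files outside the volumes would be expected to vanish only when the container is removed and recreated. Which of the two happened here is not stated in the report.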

Expected behavior
The module restarts properly in the docker container.

I know what the problem actually is: in ObjectDetectionCoral/install.sh, there is a line that copies the shared library into /usr/lib/x86_64-linux-gnu/ upon module installation:

cp "${moduleDirPath}/edgetpu_runtime/${edgetpu_folder}/k8/libedgetpu.so.1.0" "/usr/lib/x86_64-linux-gnu/libedgetpu.so.1.0"

Obviously, when restarting the container, this change is lost since I do not mount /usr/lib/x86_64-linux-gnu/ as a volume to persist it across "reboots".

Do modules have a startup hook where copying the shared object and running ldconfig could be done instead?
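A minimal sketch of what such a startup step might look like, assuming the module could run a shell function before it loads the library. The function name `restore_edgetpu_lib` is hypothetical and is not part of CodeProject.AI; it only repeats the `cp` from install.sh when the destination is missing, then refreshes the loader cache if it is able to.

```shell
#!/usr/bin/env bash
# Hypothetical startup hook: put the Edge TPU runtime library back if it is missing.
# restore_edgetpu_lib is an illustrative name, not part of CodeProject.AI.

restore_edgetpu_lib() {
  local src="$1"    # copy kept on the persisted modules volume
  local dest="$2"   # location the dynamic loader searches

  if [ -e "$dest" ]; then
    return 0                        # already in place, nothing to do
  fi
  if [ ! -e "$src" ]; then
    echo "source library not found: $src" >&2
    return 1
  fi

  cp "$src" "$dest" || return 1

  # Refresh the loader cache so libedgetpu.so.1 can be resolved. This needs
  # root, so it is only attempted when running as root with ldconfig present.
  if [ "$(id -u)" -eq 0 ] && command -v ldconfig >/dev/null 2>&1; then
    ldconfig
  fi
  return 0
}
```

Inside the container it would be called with the same two paths as the `cp` line in install.sh, i.e. the library under `${moduleDirPath}/edgetpu_runtime/${edgetpu_folder}/k8/` as the source and `/usr/lib/x86_64-linux-gnu/libedgetpu.so.1.0` as the destination.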

Screenshots
N/A

Your System (please complete the following information):

  • CodeProject.AI Server version: codeproject/ai-server:2.6.2
  • Object Detection (Coral) version: 2.2.0

Additional context
See Expected behavior

@ChrisMaunder ChrisMaunder added the bug Something isn't working label Apr 18, 2024
@matthewDDennis

How are you starting the Docker container?

  • If you start it with a command-line docker run, add --restart=always. This might fix things, but might not.

@mikegleasonjr
Author

mikegleasonjr commented Apr 19, 2024

Hi @matthewDDennis, thanks for the help. It is my feeling that the bug report wasn't fully understood though. The bug is actually known and described in the "Expected behavior" section (at least I tried to explain it). This isn't about Docker not restarting the container.

Please feel free to ask questions if it is still misunderstood 🙏🏻

@mikegleasonjr
Author

Maybe the ObjectDetectionCoral module should redo some steps it does upon installation in the initialise phase of the module: https://www.codeproject.com/ai/docs/devguide/module_examples/add_python_module.html#writing-the-module

The installer copies files into the container's filesystem at /usr/lib/x86_64-linux-gnu/. Those files are obviously lost when restarting the container.

The documentation mentions no volumes to mount other than the settings/module folders: https://www.codeproject.com/ai/docs/install/running_in_docker.html#advanced-docker-launch-settings-saved-outside-of-the-container

@galperinm

@mikegleasonjr How did you end up working around this? Did you have to mount /usr/lib/x86_64-linux-gnu/ as a persistent volume?

@mikegleasonjr
Author

mikegleasonjr commented Sep 6, 2024

Mounting /usr/lib/x86_64-linux-gnu/ as a persistent volume would mean jumping through hoops to first get the files out of the image before the module installation, since the directory would initially be empty on the host (and therefore empty in the running container).

Judging by the comments I received, I felt the bug report wasn't even read, so I felt helpless trying to make people understand what was happening, and going to such lengths to file a bug report felt like a waste of effort. I stopped using the module (and codeprojectai).

So yes, I gave one example where installation files were lost, but I don't know whether it turns into a game of whack-a-mole where something else would still be missing after a container restart. I think it would: I believe some apt installs are run during installation, and those would be lost on container restart as well.

Also, the official doc says to mount the volumes for the Docker container in a certain way, so what has to be done here to restore the context would contradict it.

@galperinm

Thanks for the update @mikegleasonjr. It's a shame this isn't under active development anymore, given that such a major bug is still unacknowledged half a year later. I guess I'll need to move off codeproject.ai as well. Surprising; I can't imagine you and I are the only two Coral + codeprojectai Docker users out there.

If you feel like sharing, I'd love to know what you moved to. My intention was to use this with Frigate and Doubletake for facial recognition supporting Coral. It looks like the alternatives are dead too: Deepstack is no longer being developed (its Docker image was last updated over 2 years ago), and Compreface has likewise gone almost a year since its last update.

@ChrisMaunder ChrisMaunder transferred this issue from codeproject/CodeProject.AI-Server Nov 22, 2024
@matthewDDennis

matthewDDennis commented Nov 22, 2024

Installing the latest version of the Server, 2.9.0, should resolve the issue.
Since 2.8.0, the Server checks whether the container ID has changed and, if so, reruns the install scripts to ensure all the required files and libs are initialized.

In the case of the Docker image, you need to pull the appropriate codeproject/ai-server image that matches your CPU/GPU configuration. The latest image is 2.8.0, but 2.9.0 should be available soon.
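The check described above could work roughly like the following sketch. It is not the Server's actual implementation: the marker file, the function names, and the use of the hostname as the container ID are assumptions made for illustration (Docker sets the hostname to the short container ID unless a hostname is given explicitly).

```shell
#!/usr/bin/env bash
# Sketch of "rerun the installers when the container changes".
# The marker file and function names are hypothetical, not the Server's own.

current_container_id() {
  # Assumption: the hostname is the short container ID (Docker's default).
  hostname
}

container_changed() {
  local marker="$1"    # file on a persisted volume holding the last seen ID
  local current="$2"   # ID of the container we are running in now
  local previous=""

  if [ -f "$marker" ]; then
    previous="$(cat "$marker")"
  fi
  if [ "$previous" = "$current" ]; then
    return 1           # same container as last time
  fi
  printf '%s\n' "$current" > "$marker"
  return 0             # new container: the caller should rerun the install scripts
}
```

A caller would test `container_changed <marker-file> "$(current_container_id)"` at startup and rerun the setup script only when it succeeds. The marker file has to live on one of the persisted volumes, otherwise every start would look like a new container.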

@cpfarhood

I'm still seeing this problem as of 2.9.5.

@ChrisMaunder
Collaborator

@mikegleasonjr You're correct in pointing out where the issue is. The fix Matthew made was to re-run the install.sh script when the server detects that the container it's running under isn't the original container. This should then re-run the

cp "${moduleDirPath}/edgetpu_runtime/${edgetpu_folder}/k8/libedgetpu.so.1.0" "/usr/lib/x86_64-linux-gnu/libedgetpu.so.1.0"

command and ensure the library is in the correct place.

Just to clarify some things: you say you stopped and restarted the container. Does that mean you just stopped and restarted, or does it mean you deleted and recreated the container? I'm assuming you mean just "stop and start".

Could you try opening a terminal into the container and checking whether /usr/lib/x86_64-linux-gnu/libedgetpu.so.1.0 exists? If not, could you please go to the /app/modules/ObjectDetectionCoral folder and run

bash ../../setup.sh

and then checking if /usr/lib/x86_64-linux-gnu/libedgetpu.so.1.0 exists after the re-install?
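The manual check requested here can be wrapped in a small helper, sketched below. `check_and_reinstall` is a hypothetical name; the only commands taken from the thread are the existence check on the library and `bash ../../setup.sh` run from the module folder.

```shell
#!/usr/bin/env bash
# Sketch of the manual check: is the library present, and if not, does
# rerunning setup.sh from the module folder bring it back?
# check_and_reinstall is a hypothetical helper name.

check_and_reinstall() {
  local lib="$1"          # e.g. /usr/lib/x86_64-linux-gnu/libedgetpu.so.1.0
  local module_dir="$2"   # e.g. /app/modules/ObjectDetectionCoral

  if [ -e "$lib" ]; then
    echo "present: $lib"
    return 0
  fi

  echo "missing: $lib, rerunning setup.sh"
  # Run in a subshell so the caller's working directory is left unchanged.
  ( cd "$module_dir" && bash ../../setup.sh ) || return 1

  if [ -e "$lib" ]; then
    echo "restored: $lib"
    return 0
  fi
  echo "still missing: $lib" >&2
  return 1
}
```

Run inside the container, the output line (`present`, `restored`, or `still missing`) answers both questions at once: whether the library survived the restart and whether the reinstall puts it back.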

@cpfarhood

New container with module installed:
root@aiserver-coral-2:/usr/lib/x86_64-linux-gnu# ls tpu
libedgetpu.so.1 libedgetpu.so.1.0

After killing the container:
root@aiserver-coral-2:/usr/lib/x86_64-linux-gnu# ls tpu
ls: cannot access 'tpu': No such file or directory

After running setup.sh:
root@aiserver-coral-2:/usr/lib/x86_64-linux-gnu# ls tpu
libedgetpu.so.1 libedgetpu.so.1.0

So how can I force this to happen when needed?

@cpfarhood

> Installing the latest version of the Server, 2.9.0, should resolve the issue. Since 2.8.0 the Server will check to see if the container Id has changed and rerun the install scripts to ensure all the required files and libs are initialized.
>
> In the case of the Docker image, you need to pull the appropriate codeproject/ai-server image that matches your CPU/GPU configuration. The latest image is 2.8.0, but a 2.9.0 should be available soon.

I'm using Kubernetes, not Docker. Is it possible that the container ID is not changing? I'm using a StatefulSet, so the name is always the same on purpose.

@ChrisMaunder
Collaborator

That could be it...
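If the identifier being compared stays constant, as it would for a hostname-based check in a StatefulSet where the pod name is stable, one possible alternative is to detect a fresh filesystem directly. The sketch below is a suggestion only and nothing in the thread confirms the Server works this way. It keeps a sentinel file on the container's own, non-persisted filesystem, so the sentinel disappears whenever the container is recreated, whatever its name. The function names and the sentinel location are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: detect a fresh container filesystem without relying on its ID.
# The sentinel lives on the container's own (non-persisted) filesystem, so it
# disappears whenever the container is recreated. Names are hypothetical.

needs_reinstall() {
  local sentinel="$1"    # must NOT be on a mounted volume
  [ ! -e "$sentinel" ]
}

mark_installed() {
  local sentinel="$1"
  # Create the parent directory if needed, then an empty sentinel file.
  mkdir -p "$(dirname "$sentinel")" && : > "$sentinel"
}
```

At startup, a wrapper would call `needs_reinstall` with a path outside `/etc/codeproject/ai` and `/app/modules`, rerun the setup script when it succeeds, and then call `mark_installed` with the same path.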
