
add resmgr server with fastapi/uvicorn #1294

Open
bertsky opened this issue Nov 12, 2024 · 4 comments

bertsky commented Nov 12, 2024

Elaborating a bit on option 2: of course, the (generated) docker-compose.yml for each module could also provide an additional server entry point – a simple REST API wrapper for the resmgr CLI. Its (generated) volume and environment variable config would have to match the respective Processing Worker (or Processor Server) to be used. But the local resmgr would not need to "know" anything beyond what it can see in its thin container – a local ocrd-all-tool.json and ocrd-all-module-dir.json precomputed for the processors of that module at build time, plus the filesystem in that container and its mounted volumes.

In addition, to get the same central resmgr user experience (for all processor executables at the same time), one would still need

  • either a single server (with resmgr-like endpoints or even providing some /discovery/processor/resources) which delegates to the individual resmgr servers,
  • or an intelligent resmgr client doing the same.

Regardless, crucially, this central component needs to know about all the deployed resmgr services – essentially holding a mapping from processor executables to module resmgr server host:port pairs. This could be generated along with the docker-compose.yml (in a new format like ocrd-all-module-dir.json), or the docker-compose.yml itself could even be parsed directly.

Originally posted by @bertsky in OCR-D/ocrd_all#69 (comment)
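
For illustration, a minimal sketch of what that central delegating component could look like, assuming a generated JSON mapping file and simple resmgr-like endpoint paths on the per-module servers (the file name, its structure, and the endpoint paths are invented here; only the executable → host:port lookup plus delegation is taken from the above):

```python
# Hypothetical delegator: look up the module resmgr server for an executable,
# then forward the request. Mapping file name/format and endpoint paths are
# assumptions for illustration only.
import json
import requests

class ResmgrDelegator:
    def __init__(self, mapping_path='ocrd-all-resmgr-servers.json'):
        # e.g. {"ocrd-tesserocr-recognize": "tesserocr:8901", ...},
        # generated alongside the docker-compose.yml
        with open(mapping_path, encoding='utf-8') as f:
            self.servers = json.load(f)

    def list_installed(self, executable):
        host_port = self.servers[executable]
        resp = requests.get(f'http://{host_port}/list-installed',
                            params={'executable': executable}, timeout=30)
        resp.raise_for_status()
        return resp.json()

    def download(self, executable, name):
        host_port = self.servers[executable]
        resp = requests.post(f'http://{host_port}/download',
                             params={'executable': executable, 'name': name},
                             timeout=600)
        resp.raise_for_status()
        return resp.json()
```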

bertsky commented Nov 12, 2024

Implementation could borrow heavily from ocrd.mets_server and ocrd.cli.workspace.workspace_serve_cli, with endpoints 1:1 providing ocrd.cli.resmgr commands.
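
For example, a minimal sketch of such a server, assuming the endpoints simply shell out to the ocrd resmgr CLI already installed in the thin container (endpoint paths, port, and the exact CLI option spellings are assumptions to be checked against ocrd.cli.resmgr, not the actual implementation):

```python
# Sketch of a per-module resmgr server: FastAPI endpoints wrapping the
# ocrd resmgr CLI via subprocess, served with uvicorn.
import subprocess
from fastapi import FastAPI, HTTPException
import uvicorn

app = FastAPI(title='ocrd resmgr server')

def run_resmgr(*args: str) -> str:
    # call the resmgr CLI that is available inside this module's container
    result = subprocess.run(['ocrd', 'resmgr', *args], capture_output=True, text=True)
    if result.returncode != 0:
        raise HTTPException(status_code=500, detail=result.stderr)
    return result.stdout

@app.get('/list-installed')
def list_installed(executable: str):
    return {'output': run_resmgr('list-installed', '-e', executable)}

@app.get('/list-available')
def list_available(executable: str):
    return {'output': run_resmgr('list-available', '-e', executable)}

@app.post('/download')
def download(executable: str, name: str):
    return {'output': run_resmgr('download', executable, name)}

if __name__ == '__main__':
    uvicorn.run(app, host='0.0.0.0', port=8901)
```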

Could you please take this on @joschrew?

joschrew commented Nov 12, 2024

I don't think I need this solution for the slim containers. When resolving resources, /usr/local/share/ocrd-resources/ is considered in most cases. For tesserocr-recognize I have to use TESSDATA_PREFIX to provide the path. So when starting the Processing Workers, I volume-mount my host-local modules directory to /usr/local/share/ocrd-resources/. This way the processor should always find the resources. Downloading the resources is a bit complex on the host (for example, tesserocr-recognize refuses to download to /usr/local/share/ocrd-resources), but this only has to be done once.

My problem with all the Resource Manager stuff is that it is very complex. Doing something like what we have done for the METS Server seems to be too much, because there already is a (nearly) working solution. I would rather change the Resource Manager to be able to download all desired resources to a configurable directory: check if the desired resource is already there, and if not, download it. Imo the problem with the current solution is that it wants to be flexible and smart. It would be easier if it just downloaded everything to /usr/local/share/ocrd-resources. Additionally, TESSDATA_PREFIX should always be set to /usr/local/share/ocrd-resources/ocrd-tesserocr-recognize. In that case, for example, it would be possible to simply mount a directory into the Processing Server and the workers at /usr/local/share/ocrd-resources, and then in the Processing Server only the Resource Manager has to be called to download to the shared folder.
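
As a rough illustration of that check-then-download idea (function and parameter names are made up, not existing ocrd code):

```python
# Hypothetical helper: download a resource into one configurable directory,
# but only if it is not already there.
from pathlib import Path
import requests

def ensure_resource(url: str, filename: str,
                    basedir: str = '/usr/local/share/ocrd-resources') -> Path:
    target = Path(basedir) / filename
    if target.exists():
        # resource already present, nothing to do
        return target
    target.parent.mkdir(parents=True, exist_ok=True)
    with requests.get(url, stream=True, timeout=600) as resp:
        resp.raise_for_status()
        with open(target, 'wb') as f:
            for chunk in resp.iter_content(chunk_size=1 << 20):
                f.write(chunk)
    return target
```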

bertsky commented Nov 12, 2024

@joschrew please read my comprehensive analysis on the resmgr problem for context. We are not anywhere near any working solution at the moment.

This is not about whether or how a processor can resolve (or locate) requested resources (i.e. models) at runtime.

It is about the functionality of the ocrd resmgr CLI made available within the thin container regime, i.e. (for all processors)

  • listing installed models,
  • listing additional models available for download, and
  • downloading models.

I just gave the METS Server example because it contains a simple FastAPI + Uvicorn structure that you can borrow from. (Of course, the same can be found in the Processing Server, but there it is spread across multiple modules.)

joschrew commented

I basically can just repeat myself; I have already tried to understand what was written in the linked issue you mentioned.

> This is not about whether or how a processor can resolve (or locate) requested resources (i.e. models) at runtime.

The goal of ocrd resmgr is, in the end, to make the resources available. So from my point of view, the central point of this is exactly how a processor can reach/resolve its resources; that is what the resmgr is ultimately responsible for.

And my opinion is to throw away the resmgr, or at least change how it is currently used. Regarding the linked issue, I would go with Future solution 1 rather than 2 (though I am not sure I understand all of it).

What I have in mind is this: ocrd resmgr is called with a path to a directory. ocrd resmgr then downloads resources to this directory. It can list what is already in this directory and what resources are available online. Then the processors get a function like the --dump-module-dir function, for example called --show-resources-dir. With this they show where they expect their resources to be (this should be made configurable).
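
As a rough sketch of that idea (the flag does not exist; its name and the resolution order are only taken from this comment for illustration):

```python
# Hypothetical --show-resources-dir flag, mirroring --dump-module-dir:
# print where the processor expects its resources, then exit.
import os
import click

@click.command()
@click.option('--show-resources-dir', is_flag=True,
              help='Print the directory where this processor looks for its resources and exit.')
def cli(show_resources_dir):
    if show_resources_dir:
        # assumption: an environment override first, then the shared default location
        default = '/usr/local/share/ocrd-resources/ocrd-tesserocr-recognize'
        click.echo(os.environ.get('TESSDATA_PREFIX', default))
        return
    # ... regular processing would start here

if __name__ == '__main__':
    cli()
```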

With both of these (ocrd resmgr being able to download to an arbitrary directory, and the processor being able to show where it expects its resources), an error like a processor aborting with "hey user, I cannot find my resources" can be resolved. And that is ultimately what this is all about, at least as I see it.
