
add resmgr server with fastapi/uvicorn #1294

Open
bertsky opened this issue Nov 12, 2024 · 4 comments

bertsky commented Nov 12, 2024

Elaborating a bit on option 2: of course, the (generated) docker-compose.yml for each module could also provide an additional server entry point – a simple REST API wrapper for the resmgr CLI. Its (generated) volume and environment variable config would have to match the respective Processing Worker (or Processor Server) to be used. But the local resmgr would not need to "know" anything beyond what it can see in its thin container – a local ocrd-all-tool.json and ocrd-all-module-dir.json precomputed for the processors of that module at build time, plus the filesystem in that container and its mounted volumes.

In addition, to get the same central resmgr user experience (for all processor executables at the same time), one would still need

  • either a single server (with resmgr-like endpoints or even providing some /discovery/processor/resources) which delegates to the individual resmgr servers,
  • or an intelligent resmgr client doing the same.

Regardless, crucially, this central component needs to know about all the deployed resmgr services – essentially holding a mapping from processor executables to module resmgr server host:port pairs. This could be generated along with the docker-compose.yml (in a new format like ocrd-all-module-dir.json), or the docker-compose.yml itself could even be parsed directly.

Originally posted by @bertsky in OCR-D/ocrd_all#69 (comment)
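
For illustration, a minimal sketch of what that central delegating component could look like, assuming a generated JSON mapping file and simple resmgr-like endpoint paths on the per-module servers (the file name, its structure, and the endpoint paths are invented here; only the executable → host:port lookup plus delegation is taken from the above):

```python
# Hypothetical delegator: look up the module resmgr server for an executable,
# then forward the request. Mapping file name/format and endpoint paths are
# assumptions for illustration only.
import json
import requests

class ResmgrDelegator:
    def __init__(self, mapping_path='ocrd-all-resmgr-servers.json'):
        # e.g. {"ocrd-tesserocr-recognize": "tesserocr:8901", ...},
        # generated alongside the docker-compose.yml
        with open(mapping_path, encoding='utf-8') as f:
            self.servers = json.load(f)

    def list_installed(self, executable):
        host_port = self.servers[executable]
        resp = requests.get(f'http://{host_port}/list-installed',
                            params={'executable': executable}, timeout=30)
        resp.raise_for_status()
        return resp.json()

    def download(self, executable, name):
        host_port = self.servers[executable]
        resp = requests.post(f'http://{host_port}/download',
                             params={'executable': executable, 'name': name},
                             timeout=600)
        resp.raise_for_status()
        return resp.json()
```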

bertsky commented Nov 12, 2024

Implementation could borrow heavily from ocrd.mets_server and ocrd.cli.workspace.workspace_serve_cli, with endpoints 1:1 providing ocrd.cli.resmgr commands.
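
For example, a minimal sketch of such a server, assuming the endpoints simply shell out to the ocrd resmgr CLI already installed in the thin container (endpoint paths, port, and the exact CLI option spellings are assumptions to be checked against ocrd.cli.resmgr, not the actual implementation):

```python
# Sketch of a per-module resmgr server: FastAPI endpoints wrapping the
# ocrd resmgr CLI via subprocess, served with uvicorn.
import subprocess
from fastapi import FastAPI, HTTPException
import uvicorn

app = FastAPI(title='ocrd resmgr server')

def run_resmgr(*args: str) -> str:
    # call the resmgr CLI that is available inside this module's container
    result = subprocess.run(['ocrd', 'resmgr', *args], capture_output=True, text=True)
    if result.returncode != 0:
        raise HTTPException(status_code=500, detail=result.stderr)
    return result.stdout

@app.get('/list-installed')
def list_installed(executable: str):
    return {'output': run_resmgr('list-installed', '-e', executable)}

@app.get('/list-available')
def list_available(executable: str):
    return {'output': run_resmgr('list-available', '-e', executable)}

@app.post('/download')
def download(executable: str, name: str):
    return {'output': run_resmgr('download', executable, name)}

if __name__ == '__main__':
    uvicorn.run(app, host='0.0.0.0', port=8901)
```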

Could you please take this on @joschrew?

joschrew commented Nov 12, 2024

I don't think I need this solution for the slim containers. When resolving resources, /usr/local/share/ocrd-resources/ is considered in most cases. For tesserocr-recognize I have to use TESSDATA_PREFIX to provide the path. So when starting the Processing Workers, I volume-mount my host-local modules directory to /usr/local/share/ocrd-resources/. This way the processor should always find the resources. Downloading the resources is a bit complex on the host (for example, tesserocr-recognize refuses to download to /usr/local/share/ocrd-resources), but this only has to be done once.

My problem with all the Resource Manager stuff is that it is very complex. Doing something like what we have done for the METS Server seems to be too much, because there already is a (nearly) working solution. I would rather change the Resource Manager to be able to download all desired resources to a configurable directory: check if the desired resource is already there, and if not, download it. Imo the problem with the current solution is that it wants to be flexible and smart. It would be easier if it just downloaded everything to /usr/local/share/ocrd-resources. Additionally, TESSDATA_PREFIX should always be set to /usr/local/share/ocrd-resources/ocrd-tesserocr-recognize. In that case, for example, it would be possible to simply mount a directory into the Processing Server and the workers at /usr/local/share/ocrd-resources, and then in the Processing Server only the Resource Manager has to be called to download to the shared folder.
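
As a rough illustration of that check-then-download idea (function and parameter names are made up, not existing ocrd code):

```python
# Hypothetical helper: download a resource into one configurable directory,
# but only if it is not already there.
from pathlib import Path
import requests

def ensure_resource(url: str, filename: str,
                    basedir: str = '/usr/local/share/ocrd-resources') -> Path:
    target = Path(basedir) / filename
    if target.exists():
        # resource already present, nothing to do
        return target
    target.parent.mkdir(parents=True, exist_ok=True)
    with requests.get(url, stream=True, timeout=600) as resp:
        resp.raise_for_status()
        with open(target, 'wb') as f:
            for chunk in resp.iter_content(chunk_size=1 << 20):
                f.write(chunk)
    return target
```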

bertsky commented Nov 12, 2024

@joschrew please read my comprehensive analysis on the resmgr problem for context. We are not anywhere near any working solution at the moment.

This is not about whether or how a processor can resolve (or locate) requested resources (i.e. models) at runtime.

It is about the functionality of the ocrd resmgr CLI made available within the thin container regime, i.e. (for all processors)

  • listing installed models,
  • listing additional models available for download, and
  • downloading models.

I just gave the METS Server example because it contains a simple FastAPI + Uvicorn structure that you can borrow from. (Of course, the same can be found in the Processing Server, but there it is spread across multiple modules.)

joschrew commented

I basically can just repeat myself; I have already tried to understand what was written in the linked issue you mentioned.

> This is not about whether or how a processor can resolve (or locate) requested resources (i.e. models) at runtime.

The goal of ocrd resmgr is, in the end, to make the resources available. So from my point of view, the central point of this is exactly how a processor can reach/resolve its resources; that is what the resmgr is ultimately responsible for.

And my opinion is to throw away the resmgr, or at least change how it is currently used. Regarding the linked issue, I would go with Future solution 1 rather than 2 (though I am not sure I understand all of it).

What I have in mind is this: ocrd resmgr is called with a path to a directory. ocrd resmgr then downloads resources to this directory. It can list what is already in this directory and what resources are available online. Then the processors get a function like the --dump-module-dir function, for example called --show-resources-dir. With this they show where they expect their resources to be (this should be made configurable).
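
As a rough sketch of that idea (the flag does not exist; its name and the resolution order are only taken from this comment for illustration):

```python
# Hypothetical --show-resources-dir flag, mirroring --dump-module-dir:
# print where the processor expects its resources, then exit.
import os
import click

@click.command()
@click.option('--show-resources-dir', is_flag=True,
              help='Print the directory where this processor looks for its resources and exit.')
def cli(show_resources_dir):
    if show_resources_dir:
        # assumption: an environment override first, then the shared default location
        default = '/usr/local/share/ocrd-resources/ocrd-tesserocr-recognize'
        click.echo(os.environ.get('TESSDATA_PREFIX', default))
        return
    # ... regular processing would start here

if __name__ == '__main__':
    cli()
```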

With both of these (ocrd resmgr being able to download to an arbitrary directory, and the processor being able to show where it expects its resources), an error like a processor aborting with "hey user, I cannot find my resources" can be resolved. And that is ultimately what this is all about, at least as I see it.
