Olena Debian/Ubuntu packaging #86

Open
mikegerber opened this issue Aug 14, 2023 · 7 comments

@mikegerber
Contributor

I've updated the packaging of the Olena builds and now have a GitHub Action that builds the packages for the latest Debian and Ubuntu versions.

My question to the OCR-D project is whether this is of interest. If so, I could make the packages installable via apt and potentially move the build to OCR-D/olena.

The main benefit I see is that this decouples compilation of olena from the ocrd_olena install and so saves ~ 1h of install time for ocrd_olena.

(OCR-D/olena has "Issues" disabled but this also fits here as the build is part of ocrd_olena's Makefile)

@mikegerber
Contributor Author

@kba @bertsky @stweil What's your opinion on this?

@stweil
Collaborator

stweil commented Aug 14, 2023

I think that providing an OCR-D "app store" with installation-ready packages can be really valuable for the OCR-D ecosystem. It should provide Python packages similar to PyPI, and it should of course also provide APT packages for recent Debian and Ubuntu distributions. Olena and Tesseract packages could be built automatically like in your proposal. That would be required only once for new releases and save much build resources. Currently all local and CI builds compile the code which wastes time and also electrical energy.

@mikegerber
Contributor Author

> I think that providing an OCR-D "app store" with installation-ready packages can be really valuable for the OCR-D ecosystem. It should provide Python packages similar to PyPI, and it should of course also provide APT packages for recent Debian and Ubuntu distributions. Olena and Tesseract packages could be built automatically like in your proposal. That would be required only once for new releases and save much build resources. Currently all local and CI builds compile the code which wastes time and also electrical energy.

These are two somewhat different aspects/scopes:

  1. Packaging the Olena dependency

This is the one I'm trying to solve here: currently, installing ocrd_olena requires building Olena from source, which takes considerable time. For example, the builds in my GitHub Action workflow take ~1 hour per flavour. I'm trying to avoid this unnecessary build by providing a package to install. This would be just like installing Tesseract from alex-p's tesseract-ocr PPA, which ocrd_tesserocr already does.
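To illustrate the decoupling, here is a minimal sketch of how an install step might decide whether the source build can be skipped. It assumes that an installed Olena provides a `scribo-cli` executable on the search path; the function name `olena_install_strategy` and the two return labels are hypothetical and not part of ocrd_olena's Makefile.

```python
import shutil
from typing import Optional


def olena_install_strategy(binary: str = "scribo-cli",
                           search_path: Optional[str] = None) -> str:
    """Decide whether the Olena source build can be skipped.

    `binary` is an assumption: the name of an executable that a
    pre-built Olena package would install. `search_path` defaults to
    the PATH environment variable when left as None.
    """
    # shutil.which returns the full path of the executable, or None.
    found = shutil.which(binary, path=search_path)
    return "use-prebuilt" if found else "build-from-source"


if __name__ == "__main__":
    print(olena_install_strategy())
```

With a check like this, the ~1 hour build would only run on hosts where no package has been installed.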

In fact, I've been using my older package for this for years. I'm just not sure why interest in the pre-built package was low in the OCR-D community. Maybe I should have set up a PPA to make it more attractive/more professional. Or maybe my motivation was higher because I needed to install it much more often due to various container builds I am doing?

I think I am going to provide a PPA/APT repository to make this more palatable, after OCR-D/olena#8 has been merged (fixes the tests, which are part of the package build).

@stweil Side note: I know you have been using ARM(64?) hosts, would you need an ARM64 build as well?

  2. Packaging the OCR-D ecosystem as a whole

I honestly think this needs more thought. The reason is that OCR-D processors need more isolation than normal packages do. There will always be conflicting dependencies, and it will simply be impossible to resolve this in a reasonable way (i.e. without tricks). I would strongly prefer the "slim container" approach and maybe in addition something like AppImage.

AppImage does have the appeal that it has - potentially, I am not super familiar with it yet - the advantages of containers w.r.t. dependency isolation without the downside of being clunky to use.

@mikegerber
Contributor Author

I could also try out what is possible with conda. From my limited understanding currently:

  • It should be possible to package ocrd_olena and its binary dependency olena
  • I am not sure what can be done about conflicting dependencies (e.g. different tensorflow versions)

If the latter cannot be solved I don't think it's worth the time.
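To make the conflicting-dependency problem concrete, here is a small sketch that finds packages pinned to different versions across processors. The processor names and version pins are hypothetical example data, not taken from any real OCR-D processor, and `find_conflicts` is an illustrative helper rather than a conda feature.

```python
from collections import defaultdict
from typing import Dict, Set


def find_conflicts(pins: Dict[str, Dict[str, str]]) -> Dict[str, Set[str]]:
    """Return the packages that are pinned to more than one version.

    `pins` maps a processor name to its {package: pinned version} dict.
    """
    versions: Dict[str, Set[str]] = defaultdict(set)
    for deps in pins.values():
        for package, version in deps.items():
            versions[package].add(version)
    # Keep only packages for which the processors disagree.
    return {pkg: vers for pkg, vers in versions.items() if len(vers) > 1}


# Hypothetical pins for two processors sharing one environment.
pins = {
    "processor_a": {"tensorflow": "1.15", "numpy": "1.23"},
    "processor_b": {"tensorflow": "2.12", "numpy": "1.23"},
}

if __name__ == "__main__":
    print(find_conflicts(pins))
```

Any package this reports cannot be satisfied in a single shared environment, which is why separate environments or containers come up as the alternative.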

@stweil
Collaborator

stweil commented Aug 15, 2023

> @stweil Side note: I know you have been using ARM(64?) hosts, would you need an ARM64 build as well?

Currently I only use ARM64 on macOS, which does not use APT. I used 32 and 64 bit ARM in the past on Raspberry Pi, Odroid and other small computers with Linux, but they are too slow for real work, so I don't see a need for ARM64 builds at the moment (until someone wants to use one of the newer ARM64 servers for OCR-D).

@mikegerber
Contributor Author

> > @stweil Side note: I know you have been using ARM(64?) hosts, would you need an ARM64 build as well?
>
> Currently I only use ARM64 on MacOS which does not use APT. I used 32 and 64 bit ARM in the past on Raspberry PI, Odroid and other small computers with Linux, but they are too slow for real work, so I don't see a need for ARM64 builds at the moment (until someone wants to use one of the newer ARM64 servers for OCR-D).

Ah OK. I'll check how much effort this would need, when I work on this further. Maybe it's just another build to add.

@mikegerber
Contributor Author

Just because I didn't explicitly say so: The builds currently live at my fork at https://github.com/mikegerber/olena. When @bertsky is back from his well-deserved vacation I'm going to at least PR the fixes/updates in debian/, and if wanted the additional build workflow.

The *.deb files are currently only available in the build artefacts: https://github.com/mikegerber/olena/actions/runs/5834575220 - I think they expire, so that's not going to be the place to get them from in the future.

I have saved the one I use here: https://qurator-data.de/~mike.gerber/olena_2.1.0+ocrd-git+2-ubuntu22.04/
