-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update base image to python:3.9 #660
base: main
Are you sure you want to change the base?
Conversation
fix failing test (random ordering of dates) fix and update tesserocr installation
Not against it, just curious: Was there a particular reason for this change?
Looking at the Dockerfile and history, wasn’t that the case before, too? (Or am I misunderstanding and you researched whether we might be able to switch to prebuilt binaries and the conclusion was that we still cannot?) |
My thinking here was that the official images will make it easier to upgrade down the line. Also because not all python versions are packaged on all distros, so we might have had to resort to PPAs and such.
Sorry I didn't fully explain it, but it's the latter. I was hoping that we can skip this exception of building tesserocr ourselves, but it looks like we can't and that's fine, I guess. |
&& rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* \ | ||
&& localedef -i en_US -c -f UTF-8 -A /usr/share/locale/locale.alias en_US.UTF-8 | ||
RUN echo "deb http://http.us.debian.org/debian stable non-free" >/etc/apt/sources.list.d/nonfree.list \ | ||
&& apt-get -qq -y update \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As with the respective PR in the Aleph repo, here’s a list of packages that should already be installed in the base image:
ca-certificates
git
pkg-config
libxslt1-dev
libpq-dev
zlib1g-dev
libicu-dev
libxml2-dev
imagemagick
imagemagick-common
(requires onlyimagemagick-6-common
which is installed by default)libtiff5-dev
(requires onlylibtiff-dev
which is installed by default)libjpeg-dev
libfreetype6-dev
libwebp-dev
libopenjp2-7-dev
libpng-dev
fonts-dejavu-core
@@ -12,7 +12,7 @@ def test_audio(self): | |||
self.assertEqual(entity.first("generator"), "com.apple.VoiceMemos (iOS 11.4)") | |||
self.assertEqual( | |||
entity.first("authoredAt"), | |||
datetime.datetime(2018, 6, 20, 12, 9, 42).isoformat(), | |||
datetime.datetime(2018, 6, 20, 12, 9, 28).isoformat(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit surprising, what’s the backstory here?
spacy==3.6.1 | ||
numpy<2.0.0 # pinned because otherwise spacy requires an incompatible numpy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For future reference because I was curious what this is about: explosion/spaCy#13528
Seems like the issue is fixed in the latest release, so as soon as we upgrade spacy to 3.8.2 or later, this isn’t needed anymore. (Was able to confirm that the build succeeds with that version.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would try to update spacy here and now, thanks!
Makes sense, thanks for clarifying! One thing we might want to consider is to pin the Debian version, e.g. |
ubuntu:20.04
topython:3.9
, which is Debian based.tesserocr
from source because otherwise we are not able to parse some important image file formats.