
Calamari2 #118 (Open)

wants to merge 56 commits into base: master
Conversation

@bertsky (Contributor) commented Sep 23, 2024

(also based on #117)

@mikegerber (Collaborator) commented:

Are the licensing issues with Calamari 2.x resolved?

@bertsky (Contributor, Author) commented Sep 27, 2024

CI failure: ValueError: bad marshal data (unknown type code) – this is wild! It happens on Python 3.8, where the models still used to work, and I cannot reproduce it locally. We can go for the SavedModel format (model version 5→6), which has been merged into Calamari master (but is not on PyPI yet), but I feel I am missing something here...
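
For context on why this error is Python-version-sensitive (and why the SavedModel switch helps): legacy Keras HDF5 checkpoints can embed function code serialized via marshal, and marshal's byte format is tied to the interpreter version that wrote it. A minimal illustration of that mechanism, as an assumption about where the error comes from, not the actual Calamari code path:

```python
# Illustration only: marshal'ed code objects are not portable across
# Python versions, which is the classic cause of
# "ValueError: bad marshal data (unknown type code)".
import marshal
import types

def normalize(line):
    return line.strip()

# A legacy HDF5 checkpoint may store function code roughly like this:
blob = marshal.dumps(normalize.__code__)

# Deserializing under the *same* interpreter works fine...
func = types.FunctionType(marshal.loads(blob), globals())
print(func("  foo  "))

# ...but loading a blob written by a different Python version can fail
# with "bad marshal data", since the bytecode format is not stable.
```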

> Are the licensing issues with Calamari 2.x resolved?

Like I said elsewhere: we have to go for GPL3 all the way; all we need is your consent here (and @andbue's and @chreul's for Calamari itself).

@bertsky marked this pull request as ready for review September 27, 2024 08:52
@bertsky (Contributor, Author) commented Sep 27, 2024

Commit 260f184: use C2 deep3_fraktur19 model for testing

Note: I will still try to convert your qurator-gt4histocr-1.0 model, but at the moment this is blocked by Calamari-OCR/calamari#362.

@mikegerber (Collaborator) commented Sep 27, 2024

> Like I said elsewhere: we have to go for GPL3 all the way; all we need is your consent here (and @andbue's and @chreul's for Calamari itself).

Yeah OK. Just so we are on the same page: until then we can't merge this IMHO.

IIUC, it also may not be possible for Calamari to go GPL3 so easily; see the first comments of Calamari-OCR/calamari#3.

@bertsky (Contributor, Author) commented Sep 27, 2024

> IIUC, it also may not be possible for Calamari to go GPL3 so easily; see the first comments of Calamari-OCR/calamari#3.

Like I said there, I disagree with that assessment. To the extent that Calamari borrows from Ocropy, it shares the same license, but with Calamari 2 all the machinery was implemented by tfaip (i.e. rewritten). The only shared components are things like the line dewarper (a.k.a. center normalizer), which have to be included merely to provide backwards data compatibility and could be argued to be part of the models (data), not the runtime (code).

> Yeah OK. Just so we are on the same page: until then we can't merge this IMHO.

I'll try to get them to do it for C2.

So can I change the license here within this PR?

@bertsky (Contributor, Author) commented Sep 27, 2024

Oh, btw, this PR still does not achieve what we want in terms of GPU efficiency – we are still as peaky as in #116, even with core v3's page parallelism. I'm afraid we have to switch from a multithreading to a multiprocessing model in core (see the sketch below)...

[benchmark chart: ocrd-calamari-cuda-b12-B12-v3-p6-c2-metscaching-bg1-pp-listresult-shmem]

(this is with 6 parallel pages – no benefit at all over single-page processing!)
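
For what that switch would roughly look like, a hypothetical sketch (process_page is a stand-in, not the actual core or ocrd_calamari API): page workers as separate processes avoid GIL contention between CPU-bound pre-/postprocessing and the predictor, so the GPU can actually be kept busy.

```python
# Hypothetical sketch: page-level parallelism via processes instead of threads.
# process_page() stands in for the real per-page segmentation + prediction.
from concurrent.futures import ProcessPoolExecutor

def process_page(page_id):
    # placeholder for loading the page image, cutting lines,
    # and running Calamari prediction on them
    return page_id, "OK"

if __name__ == "__main__":
    page_ids = [f"PHYS_{i:04d}" for i in range(6)]
    with ProcessPoolExecutor(max_workers=6) as pool:
        for page_id, status in pool.map(process_page, page_ids):
            print(page_id, status)
```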

@bertsky (Contributor, Author) commented Sep 27, 2024

> CI failure: ValueError: bad marshal data (unknown type code) – this is wild! It happens on Python 3.8, where the models still used to work, and I cannot reproduce it locally. We can go for the SavedModel format (model version 5→6), which has been merged into Calamari master (but is not on PyPI yet), but I feel I am missing something here...

Since we have to move in that direction anyway, I converted everything in calamari_models and calamari_models_experimental and created new releases with all the tarballs as assets, and referenced them here. Now we just need a new release 2.3 of Calamari master on PyPI to make that work with our CI...

@bertsky (Contributor, Author) commented Oct 19, 2024

For some reason, CircleCI does not trigger anymore (and I have no permission to trigger it manually). I did fix the failure with the last commit, though – see the CI on my branch.

BTW, there's a branch looming which I will eventually merge here: https://github.com/bertsky/ocrd_calamari/tree/calamari2-subprocess – that depends on bertsky/core#23 and finally solves our performance bottleneck. I'll open the PR and document everything as soon as I can add the dependency on the core version.

@bertsky (Contributor, Author) commented Nov 12, 2024

So for comparison:

  • same as before (predict_pipeline), but with multiprocessing instead of multithreading:
    [benchmark chart: ocrd-calamari-cuda-b3-BB-v3-processpool-p3-c2-metscaching-predictpipeline]

  • the same with predict_on_batch:
    [benchmark chart: ocrd-calamari-cuda-b3-BB-v3-processpool-p3-c2-metscaching-predictonbatch]

  • the same with a continuous (infinite) generator that is set up only once and fed via a queue; batches need to be padded with empty images as long as any consumers remain (see the sketch below):
    [benchmark chart: ocrd-calamari-cuda-b12-BB-v3-processpool-p3-c2-metscaching-predictonbatch-flowing-idle-lock]
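
To make the third variant concrete, here is a rough sketch of the idea (names and shapes are illustrative, not the actual implementation): one long-lived generator feeds predict_on_batch from a queue, and incomplete batches are padded with empty images so the predictor never stalls while consumers are still attached.

```python
# Illustrative sketch: a continuous, queue-fed batch generator whose
# batches are topped up with blank line images when the queue runs dry.
import queue
import threading
import numpy as np

BATCH_SIZE = 12
EMPTY_LINE = np.zeros((48, 4), dtype=np.uint8)  # stand-in "empty image"

def batch_generator(line_queue: queue.Queue, stop: threading.Event):
    """Yield fixed-size batches until stop is set and the queue drains."""
    while not (stop.is_set() and line_queue.empty()):
        batch = []
        while len(batch) < BATCH_SIZE:
            try:
                batch.append(line_queue.get(timeout=0.1))
            except queue.Empty:
                if stop.is_set():
                    break
        if not batch:
            continue
        # pad so the predictor always sees a full batch
        batch += [EMPTY_LINE] * (BATCH_SIZE - len(batch))
        yield batch
```

On the consumer side, predictions corresponding to the padding entries would simply be dropped.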
