Calamari2 #118
- show tfaip version, too
- avoid module-level tfaip/tensorflow imports
- use Calamari 2 API for parameterization, loading and processing
- initialise and run predictor in single bg thread only
- use length bucketing for batching (but w/ correct batch size calculation)
- map back to lines by passing line ID into sample metadata
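To illustrate the last two points, here is a minimal sketch of length bucketing combined with ID-based mapping back to lines. It is not the code of this PR and does not use the Calamari or tfaip API: `bucket_by_length`, `fake_predict`, `recognize` and the `(line ID, width)` sample layout are all hypothetical names chosen for the example, and the batching rule shown (sort by width, then cut into chunks of at most `batch_size`) is only one plausible reading of "correct batch size calculation".

```python
from typing import Dict, List, Tuple

# Hypothetical sample layout: (line ID, line width in pixels).
Sample = Tuple[str, int]


def bucket_by_length(samples: List[Sample], batch_size: int) -> List[List[Sample]]:
    """Sort samples by width and cut them into batches of at most batch_size.

    Lines of similar width end up in the same batch, so less padding is needed.
    The last batch may be smaller than batch_size.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    ordered = sorted(samples, key=lambda sample: sample[1])
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]


def fake_predict(batch: List[Sample]) -> List[Tuple[str, str]]:
    """Stand-in for the recognizer (assumption): returns (line ID, text) pairs."""
    return [(line_id, "text of width %d" % width) for line_id, width in batch]


def recognize(samples: List[Sample], batch_size: int) -> List[Tuple[str, str]]:
    """Predict in length-bucketed batches, then restore the input line order."""
    results: Dict[str, str] = {}
    for batch in bucket_by_length(samples, batch_size):
        # Results arrive in bucket order, so the ID carried along with each
        # sample is the only reliable way to find the originating line.
        for line_id, text in fake_predict(batch):
            results[line_id] = text
    return [(line_id, results[line_id]) for line_id, _ in samples]
```

Because every result carries its line ID, the order in which batches are processed no longer matters for writing results back to the page.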
Are the licensing issues with Calamari 2.x resolved?
CI failure:
Like I said elsewhere: we have to go for GPL3 all the way, all we need is your consent here (and @andbue's and @chreul's for Calamari itself).
note: I will still try to convert your
Yeah OK. Just so we are on the same page: until then we can't merge this IMHO. IIUC, it also may not be possible for Calamari to go GPL3 so easily; see the first comments of Calamari-OCR/calamari#3.
Like I said there, I disagree with that assessment. To the extent that Calamari borrows from Ocropy it shared the same license, but with Calamari 2 all the machinery was implemented by tfaip (i.e. rewritten). The only shared components are things like the line dewarper (aka. center normalizer) which have to be included merely to provide backwards data compatibility, and could be argued to be part of the models (data), not the runtime (code).
I'll try to get them to do it for C2. So can I change the license here within this PR?
Oh, btw, this PR still does not achieve what we want in terms of GPU efficiency – we are still as peaky as in #116, even with core v3's page parallelism. I'm afraid we have to switch from a multithreading to a multiprocessing model in core... (This is with 6 parallel pages – no benefit at all over single-page processing!)
Since we have to move in that direction anyway, I converted everything in calamari_models and calamari_models_experimental and created new releases with all the tarballs as assets, and referenced them here. Now we just need a new release 2.3 of Calamari master on PyPI to make that work with our CI...
For some reason, CircleCI does not trigger anymore (and I have no permission to trigger it). I did fix it with the last commit, though – see CI on my branch. BTW, there's a branch looming which I will eventually merge here: https://github.com/bertsky/ocrd_calamari/tree/calamari2-subprocess – that depends on bertsky/core#23 and finally solves our performance bottleneck. I'll open the PR and document everything as soon as I can add the dependency on the core version.
…hread per line to send task and receive result
(also based on #117)