Suffix option (-o) drops the input file extension from the output file name when segmenting AND recognizing a directory #680

Open
PierreYvesJallud opened this issue Jan 22, 2025 · 4 comments

@PierreYvesJallud

Hi all,

My environment:

  • kraken --version: kraken, version 5.3.0
  • python --version: Python 3.11.2
  • /etc/os-release :
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

I'm not sure it's a bug, but when I run kraken as follows, the input file extension is dropped from the output file name:
find pathToImgs/*.jpg | parallel kraken -I {} -o .ocr.txt segment -i mySegmentModel.mlmodel -bl ocr -m myReconizingModel.mlmodel

If the input files are named like myImgFile.jpg, the recognition output files are named like myImgFile.ocr.txt (without .jpg).
I would have expected them to be named like myImgFile.jpg.ocr.txt.

When I process a single input file, the result is as expected:
kraken -i myImgFile.jpg -o .ocr.txt segment mySegmentModel.mlmodel -bl ocr -m myReconizingModel.mlmodel

The result file is myImgFile.jpg.ocr.txt. The jpg extension is retained.
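
The two observed results differ only in whether the input extension is dropped before the suffix is appended. The following is a minimal Python sketch of those two naming rules, meant only to make the difference concrete. The function names are hypothetical, and the code is an illustration of the observed file names, not an excerpt from kraken's source.

```python
import os.path


def name_without_extension(input_path: str, suffix: str) -> str:
    """Drop the input extension, then append the suffix (the name seen in the directory run)."""
    stem, _ext = os.path.splitext(input_path)
    return stem + suffix


def name_with_extension(input_path: str, suffix: str) -> str:
    """Append the suffix to the full input name (the name seen in the single-file run)."""
    return input_path + suffix


print(name_without_extension("myImgFile.jpg", ".ocr.txt"))  # myImgFile.ocr.txt
print(name_with_extension("myImgFile.jpg", ".ocr.txt"))     # myImgFile.jpg.ocr.txt
```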

Is there an explanation? Did I make a mistake with the options 🤔?...

Greetings

@mittagessen
Owner

mittagessen commented Jan 22, 2025 via email

@PierreYvesJallud
Author

Well 🙄... this combination of parameters was suggested in the eScriptorium forum. Apart from the suffix problem, it works perfectly 😎

That's not a real obstacle for my work. I already have to modify the ALTO files (name and fileName) to integrate the result in eScriptorium and that's pretty simple with a little script. So... you be the judge =)
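
A post-processing rename of this kind could look like the sketch below. It assumes the outputs sit next to the images, that the images end in .jpg, and that the suffix is .ocr.txt as in the commands above; the helper name `restore_extension` is hypothetical and is not part of kraken.

```python
from pathlib import Path


def restore_extension(directory: str, suffix: str = ".ocr.txt", ext: str = ".jpg") -> list[Path]:
    """Rename <stem><suffix> to <stem><ext><suffix> when the image <stem><ext> exists."""
    renamed = []
    for out_file in sorted(Path(directory).glob(f"*{suffix}")):
        stem = out_file.name[: -len(suffix)]
        if stem.endswith(ext):
            continue  # output name already carries the image extension
        if not (out_file.parent / f"{stem}{ext}").exists():
            continue  # no matching image, leave the file untouched
        target = out_file.with_name(f"{stem}{ext}{suffix}")
        if target.exists():
            continue  # never overwrite an existing output
        out_file.rename(target)
        renamed.append(target)
    return renamed


if __name__ == "__main__":
    # "pathToImgs" is the placeholder directory used in the commands above.
    for path in restore_extension("pathToImgs"):
        print(path)
```

Files without a matching image, and files that already contain the image extension, are skipped, so running the helper twice should be harmless.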

@mittagessen
Owner

mittagessen commented Jan 23, 2025 via email

@PierreYvesJallud
Author

And yet... it turns ✨🪐✨ 🤓!
I just checked the script and I haven't made a mistake when copying the line.
If you want more information about my environment, just ask.
