Fails at parallel execution of tasks, executing createDocuments #143
Comments
I ran
|
OK, how does that relate to this problem? What you said seems to me to be about my other post, the one about the PDF not being released after OCR. This post is about the errors above.
Now I see: because one of those errors is about not being able to access the file, you thought this was the same issue. But I stated that I get all three errors, so please answer accordingly. I have a workaround for not being able to access the produced PDF files, and it worked until these errors started to show up with parallel execution. I have a workaround for these errors too, but it's ugly, and I would like to know exactly what is wrong with my Tesseract instance so I can make a proper solution.
Yes, you can move them, delete them, and so on. The only thing you can't do is open them manually, but if you move them after creation they are fine and you can do whatever you want with them.
OK, I misread your issues. You may want to post at the Tesseract forum, as the exceptions occurred inside the native code. Hope someone with insight will respond.
Will do, and thanks.
Would you experience these issues if executing in a single thread?
No, only in the multithreaded solution.
I had similar issues. It seems like it's not thread-safe. Creating one Tesseract instance per operation seems to work for me as a workaround. Pooling them should also work.
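The "one instance per operation" workaround could be sketched roughly as follows. This is only an illustration under assumptions, not the project's recommended fix: `OcrEngine`, `PerTaskOcr`, `runAll`, and `stripExtension` are hypothetical names, and `OcrEngine` is a stand-in for whatever wraps the real Tess4J call (for example a freshly constructed Tesseract object calling createDocuments). The only point shown is that each submitted task asks the factory for its own engine, so no engine is ever shared between threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Supplier;

public class PerTaskOcr {

    /** Hypothetical stand-in for the OCR engine (e.g. a wrapper around a Tess4J instance). */
    public interface OcrEngine {
        void createDocument(String inputFile, String outputBase) throws Exception;
    }

    /**
     * Runs OCR on every input file in a fixed thread pool.
     * Each task obtains a fresh engine from the factory, so engines are never shared.
     * Returns the output base names in the same order as the inputs.
     */
    public static List<String> runAll(List<String> inputFiles,
                                      Supplier<OcrEngine> engineFactory,
                                      int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String input : inputFiles) {
                futures.add(pool.submit(() -> {
                    // One engine per operation: created inside the task, used once.
                    OcrEngine engine = engineFactory.get();
                    String outputBase = stripExtension(input);
                    engine.createDocument(input, outputBase);
                    return outputBase;
                }));
            }
            List<String> done = new ArrayList<>();
            for (Future<String> future : futures) {
                // get() rethrows any task failure wrapped in an ExecutionException.
                done.add(future.get());
            }
            return done;
        } finally {
            pool.shutdown();
        }
    }

    /** Drops the last extension, e.g. "lol1.tiff" becomes "lol1". */
    static String stripExtension(String name) {
        int dot = name.lastIndexOf('.');
        return dot > 0 ? name.substring(0, dot) : name;
    }
}
```

With a real engine, the factory passed to `runAll` would construct and configure a new instance on every call. Pooling would replace the factory with a borrow/return step, at the cost of more bookkeeping.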
I'm trying to make a setup where I can pass in a list of entities (that hold all the information necessary to do OCR on .tiff files). For this I use Spring, and I use a ThreadPoolExecutor to execute my tasks in parallel.
Environment: Win10, Java, Spring Framework
Executor: FixedThreadPool
tess4j version: 4.3.1
Error messages: (there are several, because sometimes it gives a different error and sometimes it just works, so here are my findings)
splitter_.orig_pix():Error:Assert failed:in file ..\..\src\ccmain\tesseractclass.cpp, line 674
This is the most common one, I can replicate this.
!w_it.cycled_list():Error:Assert failed:in file ..\..\src\ccstruct\pageres.cpp, line 1351
I couldn't replicate this.
HIGHlol1
LOWlol
Page 1
Page 1
Detected 224 diacritics
Didn't fail
OCR is done let's move!
tmp\lol1.pdf -> C:\Users\kh\Desktop\workstuff\samples\test_out\lol1.pdf: The process cannot access the file because it is being used by another process.
C:\Users\kh\Desktop\workstuff\samples\test_out\lol1.pdf
[Fatal Error] :1:167: The markup in the document following the root element must be well-formed.
C:\Users\kh\Desktop\workstuff\samples\test_out\lol1.pdf
Here is the context for this output. HIGHlol1 means high priority, and the file is named lol1(.tiff). Page 1 and Detected 224 diacritics are standard Tesseract outputs as far as I know. Didn't fail means it did not throw any Tesseract exceptions (I never got one, by the way). OCR is done let's move! means I managed to tell the database that we finished OCR on the file. After that, the program fails to move the PDF file out of the tmp folder, which is the intended folder for creating PDFs via tess4j. I don't know what the error after that means, but it closes the application, meaning it won't even try to do OCR on the second .tiff file, called lol.
To sum up, these are the errors I get when I try to execute tasks (specifically, tasks that call tess4j's createDocuments) in parallel.