Fails at parallel execution of tasks, executing createDocuments #143

Open
Funnybanny opened this issue Feb 21, 2019 · 9 comments

@Funnybanny

Funnybanny commented Feb 21, 2019

I'm trying to build a setup where I pass in a list of entities, each holding all the information needed to run OCR on a .tiff file. For this I use Spring, with a ThreadPoolExecutor to execute my tasks in parallel.

Environment: Win10, Java, Spring Framework
Executor: FixedThreadPool
tess4j version: 4.3.1

Error messages (there are several, because a run sometimes gives a different error and sometimes just works, so here are my findings):

  • splitter_.orig_pix():Error:Assert failed:in file ..\..\src\ccmain\tesseractclass.cpp, line 674
    This is the most common one, and I can reproduce it.

  • !w_it.cycled_list():Error:Assert failed:in file ..\..\src\ccstruct\pageres.cpp, line 1351
    I couldn't reproduce this one.

  • HIGHlol1 LOWlol Page 1 Page 1 Detected 224 diacritics Didn't fail OCR is done let's move! tmp\lol1.pdf -> C:\Users\kh\Desktop\workstuff\samples\test_out\lol1.pdf: The process cannot access the file because it is being used by another process. C:\Users\kh\Desktop\workstuff\samples\test_out\lol1.pdf [Fatal Error] :1:167: The markup in the document following the root element must be well-formed. C:\Users\kh\Desktop\workstuff\samples\test_out\lol1.pdf
    Here is the context for this output. "HIGHlol1" means high priority and a file named lol1(.tiff). "Page 1" and "Detected 224 diacritics" are standard Tesseract output as far as I know. "Didn't fail" means no Tesseract exception was thrown (I have never gotten one, by the way). "OCR is done let's move!" means I managed to tell the database that OCR on the file finished. After that, the program fails to move the PDF file out of the tmp folder, which is the intended folder for creating PDFs via tess4j. I don't know what the error after that means, but it closes the application, so it never even tries to run OCR on the second .tiff file, called lol.

To sum up, these are the errors I get when I execute tasks (specifically, tasks that call tess4j's createDocuments) in parallel.

Funnybanny changed the title from "Fails at parallel execution of tasks, executing doOCR" to "Fails at parallel execution of tasks, executing createDocuments" on Feb 21, 2019
@nguyenq
Owner

nguyenq commented Feb 24, 2019

createDocuments does call TessDeleteResultRenderer to release the resources.

I ran testCreateDocuments unit test case with the output pdf file deleted or moved at the end and had no problem.

File outputFile = new File(outputbase2 + ".pdf");
assertTrue(outputFile.exists());
//        File target = new File("C:\\Temp\\out.pdf");
//        outputFile.renameTo(target);
boolean success = outputFile.delete();
assertFalse(outputFile.exists());

@Funnybanny
Author

Funnybanny commented Feb 25, 2019

Okay, how does that relate to this problem? What you said seems to be about my other post, about the PDF not being released after OCR. This post is about the errors above.

@Funnybanny
Author

Funnybanny commented Feb 25, 2019

Now I see: because one of those errors is about not being able to access the file, you thought this post was about that. In fact I state that I get all three of those errors, so please answer accordingly. I have a workaround for not being able to access the produced PDF files, and it worked until these errors started to show up with parallel execution. I have a workaround for these errors too, but it's ugly, and I would like to know what exactly is wrong with my Tesseract instance so I can make a proper solution.

@Funnybanny
Author

Yes, you can move the generated PDF files, delete them, and so on. The only thing you can't do is open them manually, but if you move them after creation they are fine and you can do whatever you want with them.

@nguyenq
Owner

nguyenq commented Feb 25, 2019

OK, I misread your issues. You may want to post on the Tesseract forum, as the exceptions occur inside the native code. Hopefully someone with insight will respond.

@Funnybanny
Author

Will do, and thanks.

@nguyenq
Owner

nguyenq commented Feb 26, 2019

Would you experience these issues if executing in single thread?

@Funnybanny
Author

No, only in the multithreaded solution.
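
For comparing the two cases, here is a minimal sketch of forcing the same jobs through one worker thread. It assumes each OCR task can be expressed as a `Callable`; the class and method names are made up for illustration and are not part of tess4j.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper, not part of tess4j.
class SerializedOcr {
    /** Runs every job on a single worker thread, so no two jobs ever overlap. */
    static <R> List<R> runSequentially(List<Callable<R>> jobs) throws Exception {
        ExecutorService single = Executors.newSingleThreadExecutor();
        try {
            List<R> results = new ArrayList<>();
            // invokeAll returns the futures in the same order as the jobs.
            for (Future<R> f : single.invokeAll(jobs)) {
                results.add(f.get()); // a failed job surfaces here as ExecutionException
            }
            return results;
        } finally {
            single.shutdown();
        }
    }
}
```

If the asserts disappear with this and come back with a fixed thread pool of two or more threads, that points at concurrent use rather than at the input files.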

@htmldoug

I had similar issues. Seems like it's not thread-safe.

Creating one Tesseract instance per operation seems to work for me as a workaround. Pooling them should also work.
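
A rough sketch of the pooling idea, assuming an engine instance is fine to reuse from one thread at a time but not from several at once. `EnginePool` and `Work` are hypothetical names, not part of tess4j, and the engine type is left generic so the sketch does not depend on the library.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

// Hypothetical helper: lends each engine to at most one thread at a time.
class EnginePool<E> {
    /** Work to run with a borrowed engine; may throw checked exceptions. */
    interface Work<T, R> {
        R apply(T engine) throws Exception;
    }

    private final BlockingQueue<E> idle;

    EnginePool(int size, Supplier<E> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            // Assumption: with tess4j the factory would build and configure a Tesseract instance.
            idle.add(factory.get());
        }
    }

    /** Borrows an engine, runs the work, and always hands the engine back. */
    <R> R withEngine(Work<E, R> work) throws Exception {
        E engine = idle.take(); // blocks until some engine is free
        try {
            return work.apply(engine);
        } finally {
            idle.add(engine); // cannot overflow: we only return what we took
        }
    }
}
```

With tess4j the factory would presumably be something like `Tesseract::new` plus the usual configuration, and the work passed to `withEngine` would be the `createDocuments` call. The one-instance-per-operation variant is the same idea without the queue: construct the instance inside the task instead of sharing one across threads.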
