When the pipeline is run on input files that share the same prefix, the files are combined into one output file. Now that I am testing on many-core hardware, it has become apparent that combining files makes the pipeline much slower, especially in the OCR step. This probably happens because the work on a combined file cannot be parallelized across multiple CPU cores.
This is not a problem in itself, but I think it would be good to notify users that files with the same prefix will be combined and will take longer to process, or to give them a choice to enable or disable the combination. As it stands, a small difference in the input gives a large difference in processing time.
EDIT: I saw there is a "Reassamble PDF" option in the web interface, but same-prefix files were still combined when I disabled this option.
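To illustrate the suspected cause, here is a rough sketch of how grouping by prefix could reduce the number of jobs that can run in parallel. It is not the pipeline's actual code: the prefix rule in `prefix_of` (everything before the last underscore) and the helper names are assumptions made only for this example, and it assumes each output file is processed as one sequential job.

```python
from collections import defaultdict
from pathlib import Path


def prefix_of(filename: str) -> str:
    """Hypothetical prefix rule: the part of the stem before the last underscore."""
    stem = Path(filename).stem
    head, sep, _ = stem.rpartition("_")
    # If there is no underscore, treat the whole stem as the prefix.
    return head if sep else stem


def group_by_prefix(filenames):
    """Collect input files that would end up in the same output file."""
    groups = defaultdict(list)
    for name in filenames:
        groups[prefix_of(name)].append(name)
    return dict(groups)


def usable_workers(n_jobs: int, n_cores: int) -> int:
    """Assumption: one job per output file, so at most n_jobs cores are busy."""
    return min(n_jobs, n_cores)


files = ["scan_01.pdf", "scan_02.pdf", "scan_03.pdf", "scan_04.pdf"]
groups = group_by_prefix(files)

# Combined: all four files share the prefix "scan", so there is a single job.
print("combined:", len(groups), "job(s),", usable_workers(len(groups), 8), "core(s) busy")
# Separate: four independent jobs could keep four cores busy.
print("separate:", len(files), "job(s),", usable_workers(len(files), 8), "core(s) busy")
```

Under these assumptions, renaming the inputs so that they no longer share a prefix would turn one long job into several independent ones, which would match the large difference in processing time I am seeing.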