I tried to include only the DEFAULT and OCR-D-OCR file groups in the zip bag. The resulting error says that the file OCR-D-BINPAGE/FILE_0001_OCR-D-BINPAGE.xml does not exist.
There are potentially two bugs:
1. The file itself exists but is not found.
2. The check is performed although it should not be, since that file group was excluded.
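To tell the two hypotheses apart, it may help to list, per file group, which local references in the METS actually fail to resolve on disk. The following is a minimal sketch using only the Python standard library; it is not how `ocrd zip bag` performs its check. The function name `missing_local_files` is hypothetical, and the sketch assumes that local hrefs are relative to the workspace directory and that anything containing `://` is a remote URL.

```python
import os
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def missing_local_files(mets_path, workspace_dir):
    """Map each file group (USE) to its local hrefs that are missing on disk.

    Hypothetical helper for diagnosis only. Assumes local hrefs are
    relative to workspace_dir; hrefs containing '://' are skipped as remote.
    """
    tree = ET.parse(mets_path)
    missing = {}
    for grp in tree.iterfind(".//mets:fileGrp[@USE]", NS):
        for flocat in grp.iterfind("mets:file/mets:FLocat", NS):
            href = flocat.get(XLINK_HREF, "")
            if not href or "://" in href:
                continue  # remote or empty reference, nothing to check on disk
            if not os.path.exists(os.path.join(workspace_dir, href)):
                missing.setdefault(grp.get("USE"), []).append(href)
    return missing
```

If this reports nothing for OCR-D-BINPAGE while the bagging step still complains about that group, the file lookup inside the bagging code (for example, the directory it resolves paths against) would be the more likely suspect.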
ocrd zip bag -d /vd18_data/PPN689276648_39pages -m /vd18_data/PPN689276648_39pages/mets.xml -i PPN689276648 -q DEFAULT -q OCR-D-OCR -j 8
Content of the directory:
mm@MM-Notebook:/vd18_data$ ls -la ./PPN689276648_39pages/
total 1144
drwxrwxr-x 13 mm mm 4096 Mai 16 15:43 .
drwxr-xr-x 18 mm mm 4096 Mai 21 13:15 ..
drwxrwxr-x 2 mm mm 4096 Mai 16 12:13 DEFAULT
-rw-rw-r-- 1 mm mm 1002007 Mai 21 13:33 mets.xml
drwxrwxr-x 2 mm mm 4096 Mai 16 15:42 OCR-D-BINPAGE
drwxrwxr-x 2 mm mm 12288 Mai 16 15:42 OCR-D-CLIP
drwxrwxr-x 2 mm mm 4096 Mai 16 15:42 OCR-D-DENOISE-OCROPY
drwxrwxr-x 2 mm mm 4096 Mai 16 15:42 OCR-D-DESKEW-OCROPY
drwxrwxr-x 2 mm mm 106496 Mai 16 15:43 OCR-D-DEWARP
-rw-rw-r-- 1 mm mm 555 Mai 16 15:43 ocrd.log
drwxrwxr-x 2 mm mm 4096 Mai 16 15:43 OCR-D-OCR
drwxrwxr-x 2 mm mm 4096 Mai 16 15:42 OCR-D-SEG-BLOCK-TESSERACT
drwxrwxr-x 2 mm mm 4096 Mai 16 15:42 OCR-D-SEGMENT-OCROPY
drwxrwxr-x 2 mm mm 4096 Mai 16 15:42 OCR-D-SEGMENT-REPAIR
drwxrwxr-x 2 mm mm 4096 Mai 16 15:42 OCR-D-SEG-PAGE-ANYOCR
mm@MM-Notebook:/vd18_data$ ls ./PPN689276648_39pages/OCR-D-BINPAGE/
FILE_0001_OCR-D-BINPAGE.IMG-BIN.png FILE_0009_OCR-D-BINPAGE.IMG-BIN.png FILE_0017_OCR-D-BINPAGE.IMG-BIN.png FILE_0025_OCR-D-BINPAGE.IMG-BIN.png FILE_0033_OCR-D-BINPAGE.IMG-BIN.png
FILE_0001_OCR-D-BINPAGE.xml FILE_0009_OCR-D-BINPAGE.xml FILE_0017_OCR-D-BINPAGE.xml FILE_0025_OCR-D-BINPAGE.xml FILE_0033_OCR-D-BINPAGE.xml
FILE_0002_OCR-D-BINPAGE.IMG-BIN.png FILE_0010_OCR-D-BINPAGE.IMG-BIN.png FILE_0018_OCR-D-BINPAGE.IMG-BIN.png FILE_0026_OCR-D-BINPAGE.IMG-BIN.png FILE_0034_OCR-D-BINPAGE.IMG-BIN.png
FILE_0002_OCR-D-BINPAGE.xml FILE_0010_OCR-D-BINPAGE.xml FILE_0018_OCR-D-BINPAGE.xml FILE_0026_OCR-D-BINPAGE.xml FILE_0034_OCR-D-BINPAGE.xml
FILE_0003_OCR-D-BINPAGE.IMG-BIN.png FILE_0011_OCR-D-BINPAGE.IMG-BIN.png FILE_0019_OCR-D-BINPAGE.IMG-BIN.png FILE_0027_OCR-D-BINPAGE.IMG-BIN.png FILE_0035_OCR-D-BINPAGE.IMG-BIN.png
FILE_0003_OCR-D-BINPAGE.xml FILE_0011_OCR-D-BINPAGE.xml FILE_0019_OCR-D-BINPAGE.xml FILE_0027_OCR-D-BINPAGE.xml FILE_0035_OCR-D-BINPAGE.xml
FILE_0004_OCR-D-BINPAGE.IMG-BIN.png FILE_0012_OCR-D-BINPAGE.IMG-BIN.png FILE_0020_OCR-D-BINPAGE.IMG-BIN.png FILE_0028_OCR-D-BINPAGE.IMG-BIN.png FILE_0036_OCR-D-BINPAGE.IMG-BIN.png
FILE_0004_OCR-D-BINPAGE.xml FILE_0012_OCR-D-BINPAGE.xml FILE_0020_OCR-D-BINPAGE.xml FILE_0028_OCR-D-BINPAGE.xml FILE_0036_OCR-D-BINPAGE.xml
FILE_0005_OCR-D-BINPAGE.IMG-BIN.png FILE_0013_OCR-D-BINPAGE.IMG-BIN.png FILE_0021_OCR-D-BINPAGE.IMG-BIN.png FILE_0029_OCR-D-BINPAGE.IMG-BIN.png FILE_0037_OCR-D-BINPAGE.IMG-BIN.png
FILE_0005_OCR-D-BINPAGE.xml FILE_0013_OCR-D-BINPAGE.xml FILE_0021_OCR-D-BINPAGE.xml FILE_0029_OCR-D-BINPAGE.xml FILE_0037_OCR-D-BINPAGE.xml
FILE_0006_OCR-D-BINPAGE.IMG-BIN.png FILE_0014_OCR-D-BINPAGE.IMG-BIN.png FILE_0022_OCR-D-BINPAGE.IMG-BIN.png FILE_0030_OCR-D-BINPAGE.IMG-BIN.png FILE_0038_OCR-D-BINPAGE.IMG-BIN.png
FILE_0006_OCR-D-BINPAGE.xml FILE_0014_OCR-D-BINPAGE.xml FILE_0022_OCR-D-BINPAGE.xml FILE_0030_OCR-D-BINPAGE.xml FILE_0038_OCR-D-BINPAGE.xml
FILE_0007_OCR-D-BINPAGE.IMG-BIN.png FILE_0015_OCR-D-BINPAGE.IMG-BIN.png FILE_0023_OCR-D-BINPAGE.IMG-BIN.png FILE_0031_OCR-D-BINPAGE.IMG-BIN.png FILE_0039_OCR-D-BINPAGE.IMG-BIN.png
FILE_0007_OCR-D-BINPAGE.xml FILE_0015_OCR-D-BINPAGE.xml FILE_0023_OCR-D-BINPAGE.xml FILE_0031_OCR-D-BINPAGE.xml FILE_0039_OCR-D-BINPAGE.xml
FILE_0008_OCR-D-BINPAGE.IMG-BIN.png FILE_0016_OCR-D-BINPAGE.IMG-BIN.png FILE_0024_OCR-D-BINPAGE.IMG-BIN.png FILE_0032_OCR-D-BINPAGE.IMG-BIN.png
FILE_0008_OCR-D-BINPAGE.xml FILE_0016_OCR-D-BINPAGE.xml FILE_0024_OCR-D-BINPAGE.xml FILE_0032_OCR-D-BINPAGE.xml
The ocrd workspace correctly lists the existing file groups:
mm@MM-Notebook:/vd18_data/PPN689276648_39pages$ ocrd workspace list-group
PRESENTATION
MIN
MAX
DEFAULT
THUMBS
OCR-D-BINPAGE
OCR-D-SEG-PAGE-ANYOCR
OCR-D-DENOISE-OCROPY
OCR-D-DESKEW-OCROPY
OCR-D-SEG-BLOCK-TESSERACT
OCR-D-SEGMENT-REPAIR
OCR-D-CLIP
OCR-D-SEGMENT-OCROPY
OCR-D-DEWARP
OCR-D-OCR
I also tried the reverse, excluding every group I do not want, but got the same error output.
ocrd zip bag -d /vd18_data/PPN689276648_39pages -m /vd18_data/PPN689276648_39pages/mets.xml -i PPN689276648 -Q MIN -Q MAX -Q PRESENTATION -Q THUMBS -Q OCR-D-BINPAGE -Q OCR-D-DENOISE-OCROPY -Q OCR-D-DEWARP -Q OCR-D-SEGMENT-OCROPY -Q OCR-D-SEG-PAGE-ANYOCR -Q OCR-D-CLIP -Q OCR-D-DESKEW-OCROPY -Q OCR-D-SEG-BLOCK-TESSERACT -Q OCR-D-SEGMENT-REPAIR -j 8
The more interesting part is that if I exclude only the file groups that do not yet exist on the local file system (i.e., MIN, MAX, THUMBS, or PRESENTATION), it works fine and the created zip bag is correct.
I guess the problem is the METS file. When you exclude file groups, the corresponding files are still listed in the METS, so you get an error when the code iterates over the METS. I think that when excluding, the METS should be regenerated from everything that is to be included. This does not seem to be done.
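To make the "regenerate the METS from what is included" idea concrete, here is a minimal sketch with the Python standard library. It is not the OCR-D implementation: `prune_mets` is a hypothetical helper, and it assumes that file groups are direct children of `mets:fileSec` and that pages reference files through `mets:fptr` elements with a `FILEID` attribute.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}


def prune_mets(root, keep_groups):
    """Remove every mets:fileGrp whose USE is not in keep_groups,
    then remove mets:fptr elements that point at removed files.

    Hypothetical helper; modifies and returns the given root element.
    """
    kept_ids = set()
    for file_sec in root.findall("mets:fileSec", NS):
        for grp in list(file_sec.findall("mets:fileGrp", NS)):
            if grp.get("USE") in keep_groups:
                kept_ids.update(f.get("ID") for f in grp.iterfind(".//mets:file", NS))
            else:
                file_sec.remove(grp)
    # Snapshot the elements first so removal does not disturb the traversal.
    for parent in list(root.iter()):
        for fptr in list(parent.findall("mets:fptr", NS)):
            if fptr.get("FILEID") not in kept_ids:
                parent.remove(fptr)
    return root


# Usage sketch: prune an in-memory copy and write it to a new file
# (the file names here are placeholders).
# tree = ET.parse("mets.xml")
# prune_mets(tree.getroot(), {"DEFAULT", "OCR-D-OCR"})
# tree.write("mets.pruned.xml", encoding="utf-8", xml_declaration=True)
```

The sketch leaves page `mets:div` elements in place even if they end up with no file pointers; whether those should also be dropped is a separate design decision.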
So as a kind of "workaround", the unwanted file groups could be deleted before bagging instead of being excluded during bagging. This might not be a good workaround, though, because you cannot simply bag parts of a workspace and keep the rest.
That may actually be the only solution. Creating a zip bag whose METS file contains local references to files that are missing from the zip itself (as a result of the exclusion) could cause more problems when the zip is extracted again.
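One way to apply the delete-before-bagging workaround without losing the rest of the workspace is to do the deletion on a copy. The following is only a sketch under assumptions: `copy_selected` is a hypothetical helper, and it assumes each file group lives in a directory named after the group, as in the listing above. The copied mets.xml still lists every group and would have to be pruned in the copy before bagging.

```python
import shutil
from pathlib import Path


def copy_selected(workspace_dir, target_dir, keep_groups, mets_name="mets.xml"):
    """Copy the METS file and only the selected file group directories.

    Hypothetical helper. Assumes one directory per file group, named like
    the group. Groups without a local directory are silently skipped.
    The original workspace is left untouched.
    """
    src, dst = Path(workspace_dir), Path(target_dir)
    dst.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src / mets_name, dst / mets_name)
    for name in keep_groups:
        if (src / name).is_dir():
            shutil.copytree(src / name, dst / name)
    return dst


# Usage sketch (the target path is a placeholder):
# copy_selected("/vd18_data/PPN689276648_39pages", "/tmp/bag-src",
#               ["DEFAULT", "OCR-D-OCR"])
```

Copying costs extra disk space, but the bag is then built from a workspace in which the METS and the files on disk can be made to agree.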
To reproduce: PPN689276648_39pages.zip
I will investigate and report more if I can detect where it goes wrong in the code.