ocrd zip bag file group inclusion/exclusion flags are broken (v2.65.0) #1224

Open
MehmedGIT opened this issue May 21, 2024 · 4 comments

@MehmedGIT
Contributor

MehmedGIT commented May 21, 2024

I tried to include only the DEFAULT and OCR-D-OCR file groups in the zip bag. The resulting error says that the file OCR-D-BINPAGE/FILE_0001_OCR-D-BINPAGE.xml does not exist.

There are potentially 2 bugs:

  1. The file itself exists but is not found.
  2. The check is performed although it should not be, since that file group was excluded.

ocrd zip bag -d /vd18_data/PPN689276648_39pages -m /vd18_data/PPN689276648_39pages/mets.xml -i PPN689276648 -q DEFAULT -q OCR-D-OCR -j 8
mm@MM-Notebook:/vd18_data$ ocrd zip bag -d /vd18_data/PPN689276648_39pages -m /vd18_data/PPN689276648_39pages/mets.xml -i PPN689276648 -q DEFAULT -q OCR-D-OCR -j 8
13:36:01.006 INFO ocrd.workspace_bagger - Bagging /vd18_data/PPN689276648_39pages to /vd18_data/PPN689276648_39pages.ocrd.zip (temp dir /tmp/ocrd-bagit-za5mu642)
13:36:01.007 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=DEFAULT ID=FILE_0001_DEFAULT, mimetype=image/jpeg, url=https://gdz.sub.uni-goettingen.de/content/PPN689276648/800/0/00000001.jpg, local_filename=DEFAULT/FILE_0001_DEFAULT.jpg]/> 
13:36:01.008 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=DEFAULT ID=FILE_0002_DEFAULT, mimetype=image/jpeg, url=https://gdz.sub.uni-goettingen.de/content/PPN689276648/800/0/00000002.jpg, local_filename=DEFAULT/FILE_0002_DEFAULT.jpg]/> 
13:36:01.008 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=DEFAULT ID=FILE_0003_DEFAULT, mimetype=image/jpeg, url=https://gdz.sub.uni-goettingen.de/content/PPN689276648/800/0/00000003.jpg, local_filename=DEFAULT/FILE_0003_DEFAULT.jpg]/> 
13:36:01.008 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=DEFAULT ID=FILE_0004_DEFAULT, mimetype=image/jpeg, url=https://gdz.sub.uni-goettingen.de/content/PPN689276648/800/0/00000004.jpg, local_filename=DEFAULT/FILE_0004_DEFAULT.jpg]/> 
13:36:01.008 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=DEFAULT ID=FILE_0005_DEFAULT, mimetype=image/jpeg, url=https://gdz.sub.uni-goettingen.de/content/PPN689276648/800/0/00000005.jpg, local_filename=DEFAULT/FILE_0005_DEFAULT.jpg]/> 
13:36:01.009 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=DEFAULT ID=FILE_0006_DEFAULT, mimetype=image/jpeg, url=https://gdz.sub.uni-goettingen.de/content/PPN689276648/800/0/00000006.jpg, local_filename=DEFAULT/FILE_0006_DEFAULT.jpg]/> 
13:36:01.009 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=DEFAULT ID=FILE_0007_DEFAULT, mimetype=image/jpeg, url=https://gdz.sub.uni-goettingen.de/content/PPN689276648/800/0/00000007.jpg, local_filename=DEFAULT/FILE_0007_DEFAULT.jpg]/> 
13:36:01.010 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=DEFAULT ID=FILE_0008_DEFAULT, mimetype=image/jpeg, url=https://gdz.sub.uni-goettingen.de/content/PPN689276648/800/0/00000008.jpg, local_filename=DEFAULT/FILE_0008_DEFAULT.jpg]/> 
13:36:01.010 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=DEFAULT ID=FILE_0009_DEFAULT, mimetype=image/jpeg, url=https://gdz.sub.uni-goettingen.de/content/PPN689276648/800/0/00000009.jpg, local_filename=DEFAULT/FILE_0009_DEFAULT.jpg]/> 
13:36:01.010 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=DEFAULT ID=FILE_0010_DEFAULT, mimetype=image/jpeg, url=https://gdz.sub.uni-goettingen.de/content/PPN689276648/800/0/00000010.jpg, local_filename=DEFAULT/FILE_0010_DEFAULT.jpg]/> 
13:36:01.011 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=DEFAULT ID=FILE_0011_DEFAULT, mimetype=image/jpeg, url=https://gdz.sub.uni-goettingen.de/content/PPN689276648/800/0/00000011.jpg, local_filename=DEFAULT/FILE_0011_DEFAULT.jpg]/> 
13:36:01.011 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=DEFAULT ID=FILE_0012_DEFAULT, mimetype=image/jpeg, url=https://gdz.sub.uni-goettingen.de/content/PPN689276648/800/0/00000012.jpg, local_filename=DEFAULT/FILE_0012_DEFAULT.jpg]/> 
13:36:01.011 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=DEFAULT ID=FILE_0013_DEFAULT, mimetype=image/jpeg, url=https://gdz.sub.uni-goettingen.de/content/PPN689276648/800/0/00000013.jpg, local_filename=DEFAULT/FILE_0013_DEFAULT.jpg]/> 
13:36:01.012 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=DEFAULT ID=FILE_0014_DEFAULT, mimetype=image/jpeg, url=https://gdz.sub.uni-goettingen.de/content/PPN689276648/800/0/00000014.jpg, local_filename=DEFAULT/FILE_0014_DEFAULT.jpg]/> 
13:36:01.012 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=DEFAULT ID=FILE_0015_DEFAULT, mimetype=image/jpeg, url=https://gdz.sub.uni-goettingen.de/content/PPN689276648/800/0/00000015.jpg, local_filename=DEFAULT/FILE_0015_DEFAULT.jpg]/> 
13:36:01.012 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=DEFAULT ID=FILE_0016_DEFAULT, mimetype=image/jpeg, url=https://gdz.sub.uni-goettingen.de/content/PPN689276648/800/0/00000016.jpg, local_filename=DEFAULT/FILE_0016_DEFAULT.jpg]/> 
13:36:01.013 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=DEFAULT ID=FILE_0017_DEFAULT, mimetype=image/jpeg, url=https://gdz.sub.uni-goettingen.de/content/PPN689276648/800/0/00000017.jpg, local_filename=DEFAULT/FILE_0017_DEFAULT.jpg]/> 
13:36:01.013 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=DEFAULT ID=FILE_0018_DEFAULT, mimetype=image/jpeg, url=https://gdz.sub.uni-goettingen.de/content/PPN689276648/800/0/00000018.jpg, local_filename=DEFAULT/FILE_0018_DEFAULT.jpg]/> 
13:36:01.013 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=DEFAULT ID=FILE_0019_DEFAULT, mimetype=image/jpeg, url=https://gdz.sub.uni-goettingen.de/content/PPN689276648/800/0/00000019.jpg, local_filename=DEFAULT/FILE_0019_DEFAULT.jpg]/> 
13:36:01.014 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=DEFAULT ID=FILE_0020_DEFAULT, mimetype=image/jpeg, url=https://gdz.sub.uni-goettingen.de/content/PPN689276648/800/0/00000020.jpg, local_filename=DEFAULT/FILE_0020_DEFAULT.jpg]/> 
13:36:01.014 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=DEFAULT ID=FILE_0021_DEFAULT, mimetype=image/jpeg, url=https://gdz.sub.uni-goettingen.de/content/PPN689276648/800/0/00000021.jpg, local_filename=DEFAULT/FILE_0021_DEFAULT.jpg]/> 
13:36:01.014 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=DEFAULT ID=FILE_0022_DEFAULT, mimetype=image/jpeg, url=https://gdz.sub.uni-goettingen.de/content/PPN689276648/800/0/00000022.jpg, local_filename=DEFAULT/FILE_0022_DEFAULT.jpg]/> 
13:36:01.015 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=DEFAULT ID=FILE_0023_DEFAULT, mimetype=image/jpeg, url=https://gdz.sub.uni-goettingen.de/content/PPN689276648/800/0/00000023.jpg, local_filename=DEFAULT/FILE_0023_DEFAULT.jpg]/> 
13:36:01.015 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=DEFAULT ID=FILE_0024_DEFAULT, mimetype=image/jpeg, url=https://gdz.sub.uni-goettingen.de/content/PPN689276648/800/0/00000024.jpg, local_filename=DEFAULT/FILE_0024_DEFAULT.jpg]/> 
13:36:01.015 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=DEFAULT ID=FILE_0025_DEFAULT, mimetype=image/jpeg, url=https://gdz.sub.uni-goettingen.de/content/PPN689276648/800/0/00000025.jpg, local_filename=DEFAULT/FILE_0025_DEFAULT.jpg]/> 
13:36:01.016 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=DEFAULT ID=FILE_0026_DEFAULT, mimetype=image/jpeg, url=https://gdz.sub.uni-goettingen.de/content/PPN689276648/800/0/00000026.jpg, local_filename=DEFAULT/FILE_0026_DEFAULT.jpg]/> 
13:36:01.016 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=DEFAULT ID=FILE_0027_DEFAULT, mimetype=image/jpeg, url=https://gdz.sub.uni-goettingen.de/content/PPN689276648/800/0/00000027.jpg, local_filename=DEFAULT/FILE_0027_DEFAULT.jpg]/> 
13:36:01.016 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=DEFAULT ID=FILE_0028_DEFAULT, mimetype=image/jpeg, url=https://gdz.sub.uni-goettingen.de/content/PPN689276648/800/0/00000028.jpg, local_filename=DEFAULT/FILE_0028_DEFAULT.jpg]/> 
13:36:01.017 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=DEFAULT ID=FILE_0029_DEFAULT, mimetype=image/jpeg, url=https://gdz.sub.uni-goettingen.de/content/PPN689276648/800/0/00000029.jpg, local_filename=DEFAULT/FILE_0029_DEFAULT.jpg]/> 
13:36:01.017 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=DEFAULT ID=FILE_0030_DEFAULT, mimetype=image/jpeg, url=https://gdz.sub.uni-goettingen.de/content/PPN689276648/800/0/00000030.jpg, local_filename=DEFAULT/FILE_0030_DEFAULT.jpg]/> 
13:36:01.017 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=DEFAULT ID=FILE_0031_DEFAULT, mimetype=image/jpeg, url=https://gdz.sub.uni-goettingen.de/content/PPN689276648/800/0/00000031.jpg, local_filename=DEFAULT/FILE_0031_DEFAULT.jpg]/> 
13:36:01.017 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=DEFAULT ID=FILE_0032_DEFAULT, mimetype=image/jpeg, url=https://gdz.sub.uni-goettingen.de/content/PPN689276648/800/0/00000032.jpg, local_filename=DEFAULT/FILE_0032_DEFAULT.jpg]/> 
13:36:01.018 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=DEFAULT ID=FILE_0033_DEFAULT, mimetype=image/jpeg, url=https://gdz.sub.uni-goettingen.de/content/PPN689276648/800/0/00000033.jpg, local_filename=DEFAULT/FILE_0033_DEFAULT.jpg]/> 
13:36:01.018 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=DEFAULT ID=FILE_0034_DEFAULT, mimetype=image/jpeg, url=https://gdz.sub.uni-goettingen.de/content/PPN689276648/800/0/00000034.jpg, local_filename=DEFAULT/FILE_0034_DEFAULT.jpg]/> 
13:36:01.018 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=DEFAULT ID=FILE_0035_DEFAULT, mimetype=image/jpeg, url=https://gdz.sub.uni-goettingen.de/content/PPN689276648/800/0/00000035.jpg, local_filename=DEFAULT/FILE_0035_DEFAULT.jpg]/> 
13:36:01.019 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=DEFAULT ID=FILE_0036_DEFAULT, mimetype=image/jpeg, url=https://gdz.sub.uni-goettingen.de/content/PPN689276648/800/0/00000036.jpg, local_filename=DEFAULT/FILE_0036_DEFAULT.jpg]/> 
13:36:01.019 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=DEFAULT ID=FILE_0037_DEFAULT, mimetype=image/jpeg, url=https://gdz.sub.uni-goettingen.de/content/PPN689276648/800/0/00000037.jpg, local_filename=DEFAULT/FILE_0037_DEFAULT.jpg]/> 
13:36:01.019 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=DEFAULT ID=FILE_0038_DEFAULT, mimetype=image/jpeg, url=https://gdz.sub.uni-goettingen.de/content/PPN689276648/800/0/00000038.jpg, local_filename=DEFAULT/FILE_0038_DEFAULT.jpg]/> 
13:36:01.020 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=DEFAULT ID=FILE_0039_DEFAULT, mimetype=image/jpeg, url=https://gdz.sub.uni-goettingen.de/content/PPN689276648/800/0/00000039.jpg, local_filename=DEFAULT/FILE_0039_DEFAULT.jpg]/> 
13:36:01.021 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=OCR-D-OCR ID=FILE_0004_OCR-D-OCR, mimetype=application/vnd.prima.page+xml, url=---, local_filename=OCR-D-OCR/FILE_0004_OCR-D-OCR.xml]/> 
13:36:01.021 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=OCR-D-OCR ID=FILE_0001_OCR-D-OCR, mimetype=application/vnd.prima.page+xml, url=---, local_filename=OCR-D-OCR/FILE_0001_OCR-D-OCR.xml]/> 
13:36:01.022 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=OCR-D-OCR ID=FILE_0003_OCR-D-OCR, mimetype=application/vnd.prima.page+xml, url=---, local_filename=OCR-D-OCR/FILE_0003_OCR-D-OCR.xml]/> 
13:36:01.022 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=OCR-D-OCR ID=FILE_0002_OCR-D-OCR, mimetype=application/vnd.prima.page+xml, url=---, local_filename=OCR-D-OCR/FILE_0002_OCR-D-OCR.xml]/> 
13:36:01.022 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=OCR-D-OCR ID=FILE_0005_OCR-D-OCR, mimetype=application/vnd.prima.page+xml, url=---, local_filename=OCR-D-OCR/FILE_0005_OCR-D-OCR.xml]/> 
13:36:01.022 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=OCR-D-OCR ID=FILE_0006_OCR-D-OCR, mimetype=application/vnd.prima.page+xml, url=---, local_filename=OCR-D-OCR/FILE_0006_OCR-D-OCR.xml]/> 
13:36:01.023 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=OCR-D-OCR ID=FILE_0007_OCR-D-OCR, mimetype=application/vnd.prima.page+xml, url=---, local_filename=OCR-D-OCR/FILE_0007_OCR-D-OCR.xml]/> 
13:36:01.023 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=OCR-D-OCR ID=FILE_0008_OCR-D-OCR, mimetype=application/vnd.prima.page+xml, url=---, local_filename=OCR-D-OCR/FILE_0008_OCR-D-OCR.xml]/> 
13:36:01.023 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=OCR-D-OCR ID=FILE_0010_OCR-D-OCR, mimetype=application/vnd.prima.page+xml, url=---, local_filename=OCR-D-OCR/FILE_0010_OCR-D-OCR.xml]/> 
13:36:01.023 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=OCR-D-OCR ID=FILE_0009_OCR-D-OCR, mimetype=application/vnd.prima.page+xml, url=---, local_filename=OCR-D-OCR/FILE_0009_OCR-D-OCR.xml]/> 
13:36:01.024 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=OCR-D-OCR ID=FILE_0011_OCR-D-OCR, mimetype=application/vnd.prima.page+xml, url=---, local_filename=OCR-D-OCR/FILE_0011_OCR-D-OCR.xml]/> 
13:36:01.024 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=OCR-D-OCR ID=FILE_0012_OCR-D-OCR, mimetype=application/vnd.prima.page+xml, url=---, local_filename=OCR-D-OCR/FILE_0012_OCR-D-OCR.xml]/> 
13:36:01.024 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=OCR-D-OCR ID=FILE_0013_OCR-D-OCR, mimetype=application/vnd.prima.page+xml, url=---, local_filename=OCR-D-OCR/FILE_0013_OCR-D-OCR.xml]/> 
13:36:01.024 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=OCR-D-OCR ID=FILE_0014_OCR-D-OCR, mimetype=application/vnd.prima.page+xml, url=---, local_filename=OCR-D-OCR/FILE_0014_OCR-D-OCR.xml]/> 
13:36:01.025 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=OCR-D-OCR ID=FILE_0015_OCR-D-OCR, mimetype=application/vnd.prima.page+xml, url=---, local_filename=OCR-D-OCR/FILE_0015_OCR-D-OCR.xml]/> 
13:36:01.025 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=OCR-D-OCR ID=FILE_0016_OCR-D-OCR, mimetype=application/vnd.prima.page+xml, url=---, local_filename=OCR-D-OCR/FILE_0016_OCR-D-OCR.xml]/> 
13:36:01.025 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=OCR-D-OCR ID=FILE_0019_OCR-D-OCR, mimetype=application/vnd.prima.page+xml, url=---, local_filename=OCR-D-OCR/FILE_0019_OCR-D-OCR.xml]/> 
13:36:01.025 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=OCR-D-OCR ID=FILE_0017_OCR-D-OCR, mimetype=application/vnd.prima.page+xml, url=---, local_filename=OCR-D-OCR/FILE_0017_OCR-D-OCR.xml]/> 
13:36:01.025 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=OCR-D-OCR ID=FILE_0018_OCR-D-OCR, mimetype=application/vnd.prima.page+xml, url=---, local_filename=OCR-D-OCR/FILE_0018_OCR-D-OCR.xml]/> 
13:36:01.026 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=OCR-D-OCR ID=FILE_0020_OCR-D-OCR, mimetype=application/vnd.prima.page+xml, url=---, local_filename=OCR-D-OCR/FILE_0020_OCR-D-OCR.xml]/> 
13:36:01.026 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=OCR-D-OCR ID=FILE_0021_OCR-D-OCR, mimetype=application/vnd.prima.page+xml, url=---, local_filename=OCR-D-OCR/FILE_0021_OCR-D-OCR.xml]/> 
13:36:01.026 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=OCR-D-OCR ID=FILE_0022_OCR-D-OCR, mimetype=application/vnd.prima.page+xml, url=---, local_filename=OCR-D-OCR/FILE_0022_OCR-D-OCR.xml]/> 
13:36:01.026 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=OCR-D-OCR ID=FILE_0023_OCR-D-OCR, mimetype=application/vnd.prima.page+xml, url=---, local_filename=OCR-D-OCR/FILE_0023_OCR-D-OCR.xml]/> 
13:36:01.027 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=OCR-D-OCR ID=FILE_0024_OCR-D-OCR, mimetype=application/vnd.prima.page+xml, url=---, local_filename=OCR-D-OCR/FILE_0024_OCR-D-OCR.xml]/> 
13:36:01.027 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=OCR-D-OCR ID=FILE_0028_OCR-D-OCR, mimetype=application/vnd.prima.page+xml, url=---, local_filename=OCR-D-OCR/FILE_0028_OCR-D-OCR.xml]/> 
13:36:01.027 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=OCR-D-OCR ID=FILE_0025_OCR-D-OCR, mimetype=application/vnd.prima.page+xml, url=---, local_filename=OCR-D-OCR/FILE_0025_OCR-D-OCR.xml]/> 
13:36:01.027 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=OCR-D-OCR ID=FILE_0026_OCR-D-OCR, mimetype=application/vnd.prima.page+xml, url=---, local_filename=OCR-D-OCR/FILE_0026_OCR-D-OCR.xml]/> 
13:36:01.028 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=OCR-D-OCR ID=FILE_0027_OCR-D-OCR, mimetype=application/vnd.prima.page+xml, url=---, local_filename=OCR-D-OCR/FILE_0027_OCR-D-OCR.xml]/> 
13:36:01.028 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=OCR-D-OCR ID=FILE_0034_OCR-D-OCR, mimetype=application/vnd.prima.page+xml, url=---, local_filename=OCR-D-OCR/FILE_0034_OCR-D-OCR.xml]/> 
13:36:01.028 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=OCR-D-OCR ID=FILE_0030_OCR-D-OCR, mimetype=application/vnd.prima.page+xml, url=---, local_filename=OCR-D-OCR/FILE_0030_OCR-D-OCR.xml]/> 
13:36:01.028 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=OCR-D-OCR ID=FILE_0032_OCR-D-OCR, mimetype=application/vnd.prima.page+xml, url=---, local_filename=OCR-D-OCR/FILE_0032_OCR-D-OCR.xml]/> 
13:36:01.028 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=OCR-D-OCR ID=FILE_0031_OCR-D-OCR, mimetype=application/vnd.prima.page+xml, url=---, local_filename=OCR-D-OCR/FILE_0031_OCR-D-OCR.xml]/> 
13:36:01.029 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=OCR-D-OCR ID=FILE_0029_OCR-D-OCR, mimetype=application/vnd.prima.page+xml, url=---, local_filename=OCR-D-OCR/FILE_0029_OCR-D-OCR.xml]/> 
13:36:01.029 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=OCR-D-OCR ID=FILE_0033_OCR-D-OCR, mimetype=application/vnd.prima.page+xml, url=---, local_filename=OCR-D-OCR/FILE_0033_OCR-D-OCR.xml]/> 
13:36:01.029 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=OCR-D-OCR ID=FILE_0035_OCR-D-OCR, mimetype=application/vnd.prima.page+xml, url=---, local_filename=OCR-D-OCR/FILE_0035_OCR-D-OCR.xml]/> 
13:36:01.029 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=OCR-D-OCR ID=FILE_0036_OCR-D-OCR, mimetype=application/vnd.prima.page+xml, url=---, local_filename=OCR-D-OCR/FILE_0036_OCR-D-OCR.xml]/> 
13:36:01.030 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=OCR-D-OCR ID=FILE_0039_OCR-D-OCR, mimetype=application/vnd.prima.page+xml, url=---, local_filename=OCR-D-OCR/FILE_0039_OCR-D-OCR.xml]/> 
13:36:01.030 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=OCR-D-OCR ID=FILE_0037_OCR-D-OCR, mimetype=application/vnd.prima.page+xml, url=---, local_filename=OCR-D-OCR/FILE_0037_OCR-D-OCR.xml]/> 
13:36:01.030 INFO ocrd.workspace_bagger - Bagging OcrdFile <OcrdFile fileGrp=OCR-D-OCR ID=FILE_0038_OCR-D-OCR, mimetype=application/vnd.prima.page+xml, url=---, local_filename=OCR-D-OCR/FILE_0038_OCR-D-OCR.xml]/> 
Traceback (most recent call last):
  File "/home/mm/venv38-all/bin/ocrd", line 8, in <module>
    sys.exit(cli())
  File "/home/mm/venv38-all/lib/python3.8/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/mm/venv38-all/lib/python3.8/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/mm/venv38-all/lib/python3.8/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/mm/venv38-all/lib/python3.8/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/mm/venv38-all/lib/python3.8/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/mm/venv38-all/lib/python3.8/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/mm/repos/core/build/__editable__.ocrd-2.65.0-py3-none-any/ocrd/cli/zip.py", line 56, in bag
    workspace_bagger.bag(
  File "/home/mm/repos/core/build/__editable__.ocrd-2.65.0-py3-none-any/ocrd/workspace_bagger.py", line 181, in bag
    total_bytes, total_files = self._bag_mets_files(workspace, bagdir, ocrd_mets, processes, include_fileGrp, exclude_fileGrp)
  File "/home/mm/repos/core/build/__editable__.ocrd-2.65.0-py3-none-any/ocrd/workspace_bagger.py", line 98, in _bag_mets_files
    pcgts = page_from_file(page_file)
  File "/home/mm/repos/core/build/__editable__.ocrd-2.65.0-py3-none-any/ocrd_modelfactory/__init__.py", line 103, in page_from_file
    raise FileNotFoundError("File not found: '%s' (%s)" % (input_file.local_filename, input_file))
FileNotFoundError: File not found: 'OCR-D-BINPAGE/FILE_0001_OCR-D-BINPAGE.xml' (<OcrdFile fileGrp=OCR-D-BINPAGE ID=FILE_0001_OCR-D-BINPAGE, mimetype=application/vnd.prima.page+xml, url=---, local_filename=OCR-D-BINPAGE/FILE_0001_OCR-D-BINPAGE.xml]/> )
Content of the directory:
mm@MM-Notebook:/vd18_data$ ls -la ./PPN689276648_39pages/
total 1144
drwxrwxr-x 13 mm mm    4096 Mai 16 15:43 .
drwxr-xr-x 18 mm mm    4096 Mai 21 13:15 ..
drwxrwxr-x  2 mm mm    4096 Mai 16 12:13 DEFAULT
-rw-rw-r--  1 mm mm 1002007 Mai 21 13:33 mets.xml
drwxrwxr-x  2 mm mm    4096 Mai 16 15:42 OCR-D-BINPAGE
drwxrwxr-x  2 mm mm   12288 Mai 16 15:42 OCR-D-CLIP
drwxrwxr-x  2 mm mm    4096 Mai 16 15:42 OCR-D-DENOISE-OCROPY
drwxrwxr-x  2 mm mm    4096 Mai 16 15:42 OCR-D-DESKEW-OCROPY
drwxrwxr-x  2 mm mm  106496 Mai 16 15:43 OCR-D-DEWARP
-rw-rw-r--  1 mm mm     555 Mai 16 15:43 ocrd.log
drwxrwxr-x  2 mm mm    4096 Mai 16 15:43 OCR-D-OCR
drwxrwxr-x  2 mm mm    4096 Mai 16 15:42 OCR-D-SEG-BLOCK-TESSERACT
drwxrwxr-x  2 mm mm    4096 Mai 16 15:42 OCR-D-SEGMENT-OCROPY
drwxrwxr-x  2 mm mm    4096 Mai 16 15:42 OCR-D-SEGMENT-REPAIR
drwxrwxr-x  2 mm mm    4096 Mai 16 15:42 OCR-D-SEG-PAGE-ANYOCR

mm@MM-Notebook:/vd18_data$ ls ./PPN689276648_39pages/OCR-D-BINPAGE/
FILE_0001_OCR-D-BINPAGE.IMG-BIN.png  FILE_0009_OCR-D-BINPAGE.IMG-BIN.png  FILE_0017_OCR-D-BINPAGE.IMG-BIN.png  FILE_0025_OCR-D-BINPAGE.IMG-BIN.png  FILE_0033_OCR-D-BINPAGE.IMG-BIN.png
FILE_0001_OCR-D-BINPAGE.xml          FILE_0009_OCR-D-BINPAGE.xml          FILE_0017_OCR-D-BINPAGE.xml          FILE_0025_OCR-D-BINPAGE.xml          FILE_0033_OCR-D-BINPAGE.xml
FILE_0002_OCR-D-BINPAGE.IMG-BIN.png  FILE_0010_OCR-D-BINPAGE.IMG-BIN.png  FILE_0018_OCR-D-BINPAGE.IMG-BIN.png  FILE_0026_OCR-D-BINPAGE.IMG-BIN.png  FILE_0034_OCR-D-BINPAGE.IMG-BIN.png
FILE_0002_OCR-D-BINPAGE.xml          FILE_0010_OCR-D-BINPAGE.xml          FILE_0018_OCR-D-BINPAGE.xml          FILE_0026_OCR-D-BINPAGE.xml          FILE_0034_OCR-D-BINPAGE.xml
FILE_0003_OCR-D-BINPAGE.IMG-BIN.png  FILE_0011_OCR-D-BINPAGE.IMG-BIN.png  FILE_0019_OCR-D-BINPAGE.IMG-BIN.png  FILE_0027_OCR-D-BINPAGE.IMG-BIN.png  FILE_0035_OCR-D-BINPAGE.IMG-BIN.png
FILE_0003_OCR-D-BINPAGE.xml          FILE_0011_OCR-D-BINPAGE.xml          FILE_0019_OCR-D-BINPAGE.xml          FILE_0027_OCR-D-BINPAGE.xml          FILE_0035_OCR-D-BINPAGE.xml
FILE_0004_OCR-D-BINPAGE.IMG-BIN.png  FILE_0012_OCR-D-BINPAGE.IMG-BIN.png  FILE_0020_OCR-D-BINPAGE.IMG-BIN.png  FILE_0028_OCR-D-BINPAGE.IMG-BIN.png  FILE_0036_OCR-D-BINPAGE.IMG-BIN.png
FILE_0004_OCR-D-BINPAGE.xml          FILE_0012_OCR-D-BINPAGE.xml          FILE_0020_OCR-D-BINPAGE.xml          FILE_0028_OCR-D-BINPAGE.xml          FILE_0036_OCR-D-BINPAGE.xml
FILE_0005_OCR-D-BINPAGE.IMG-BIN.png  FILE_0013_OCR-D-BINPAGE.IMG-BIN.png  FILE_0021_OCR-D-BINPAGE.IMG-BIN.png  FILE_0029_OCR-D-BINPAGE.IMG-BIN.png  FILE_0037_OCR-D-BINPAGE.IMG-BIN.png
FILE_0005_OCR-D-BINPAGE.xml          FILE_0013_OCR-D-BINPAGE.xml          FILE_0021_OCR-D-BINPAGE.xml          FILE_0029_OCR-D-BINPAGE.xml          FILE_0037_OCR-D-BINPAGE.xml
FILE_0006_OCR-D-BINPAGE.IMG-BIN.png  FILE_0014_OCR-D-BINPAGE.IMG-BIN.png  FILE_0022_OCR-D-BINPAGE.IMG-BIN.png  FILE_0030_OCR-D-BINPAGE.IMG-BIN.png  FILE_0038_OCR-D-BINPAGE.IMG-BIN.png
FILE_0006_OCR-D-BINPAGE.xml          FILE_0014_OCR-D-BINPAGE.xml          FILE_0022_OCR-D-BINPAGE.xml          FILE_0030_OCR-D-BINPAGE.xml          FILE_0038_OCR-D-BINPAGE.xml
FILE_0007_OCR-D-BINPAGE.IMG-BIN.png  FILE_0015_OCR-D-BINPAGE.IMG-BIN.png  FILE_0023_OCR-D-BINPAGE.IMG-BIN.png  FILE_0031_OCR-D-BINPAGE.IMG-BIN.png  FILE_0039_OCR-D-BINPAGE.IMG-BIN.png
FILE_0007_OCR-D-BINPAGE.xml          FILE_0015_OCR-D-BINPAGE.xml          FILE_0023_OCR-D-BINPAGE.xml          FILE_0031_OCR-D-BINPAGE.xml          FILE_0039_OCR-D-BINPAGE.xml
FILE_0008_OCR-D-BINPAGE.IMG-BIN.png  FILE_0016_OCR-D-BINPAGE.IMG-BIN.png  FILE_0024_OCR-D-BINPAGE.IMG-BIN.png  FILE_0032_OCR-D-BINPAGE.IMG-BIN.png
FILE_0008_OCR-D-BINPAGE.xml          FILE_0016_OCR-D-BINPAGE.xml          FILE_0024_OCR-D-BINPAGE.xml          FILE_0032_OCR-D-BINPAGE.xml

The `ocrd workspace list-group` command correctly lists the existing file groups:

mm@MM-Notebook:/vd18_data/PPN689276648_39pages$ ocrd workspace list-group
PRESENTATION
MIN
MAX
DEFAULT
THUMBS
OCR-D-BINPAGE
OCR-D-SEG-PAGE-ANYOCR
OCR-D-DENOISE-OCROPY
OCR-D-DESKEW-OCROPY
OCR-D-SEG-BLOCK-TESSERACT
OCR-D-SEGMENT-REPAIR
OCR-D-CLIP
OCR-D-SEGMENT-OCROPY
OCR-D-DEWARP
OCR-D-OCR

I also tried the reverse, excluding every group I do not want, but got the same error output.

ocrd zip bag -d /vd18_data/PPN689276648_39pages -m /vd18_data/PPN689276648_39pages/mets.xml -i PPN689276648 -Q MIN -Q MAX -Q PRESENTATION -Q THUMBS -Q OCR-D-BINPAGE -Q OCR-D-DENOISE-OCROPY -Q OCR-D-DEWARP -Q OCR-D-SEGMENT-OCROPY -Q OCR-D-SEG-PAGE-ANYOCR -Q OCR-D-CLIP -Q OCR-D-DESKEW-OCROPY -Q OCR-D-SEG-BLOCK-TESSERACT -Q OCR-D-SEGMENT-REPAIR -j 8

The more interesting part is that if I exclude only the file groups that do not yet exist on the local file system (i.e., MIN, MAX, THUMBS, or PRESENTATION), bagging works fine and the created zip bag is correct.

To reproduce: PPN689276648_39pages.zip

I will investigate and report more if I can detect where it goes wrong in the code.
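
Regarding the first potential bug (the file exists but is not found): the `local_filename` in the error message is a relative path, and a relative path is resolved against the current working directory of the process. The sketch below only demonstrates that general behaviour with temporary directories. Whether the bagger performs its check from a directory other than the workspace is an assumption and is not confirmed in this thread. The helper `exists_from` is hypothetical and not part of OCR-D.

```python
import os
import tempfile


def exists_from(cwd, relative_path):
    """Check a relative path after switching to `cwd`, then restore the old cwd."""
    previous = os.getcwd()
    os.chdir(cwd)
    try:
        return os.path.exists(relative_path)
    finally:
        os.chdir(previous)


# Relative path as it appears in the error message.
relative = os.path.join("OCR-D-BINPAGE", "FILE_0001_OCR-D-BINPAGE.xml")

with tempfile.TemporaryDirectory() as workspace, tempfile.TemporaryDirectory() as elsewhere:
    # Create the file inside the (stand-in) workspace directory only.
    os.makedirs(os.path.join(workspace, "OCR-D-BINPAGE"))
    with open(os.path.join(workspace, relative), "w") as handle:
        handle.write("<PcGts/>")

    # The same relative path is found or not found depending on the cwd.
    found_in_workspace = exists_from(workspace, relative)
    found_elsewhere = exists_from(elsewhere, relative)

print(found_in_workspace, found_elsewhere)
```

If this is what happens, the file on disk would be intact and the failure would come only from where the lookup is performed.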

@bertsky
Collaborator

bertsky commented May 21, 2024

Could this be related to #1149 (as internally, the bagger also just uses Resolver.download_to_directory as does clone/workspace_from_url)?

@MehmedGIT
Contributor Author

Not sure yet.

@joschrew
Contributor

I guess the problem is the METS file. When you exclude file groups, the corresponding files are still listed in the METS, so you get an error when the code iterates over the METS. I think that when file groups are excluded, the METS should be regenerated from only what is to be included. This does not seem to be done.
So as a kind of "workaround", the unwanted file groups could be deleted before bagging instead of being excluded during bagging. This might not be a good workaround, though, because you cannot simply bag parts of a workspace and keep the rest.
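
The idea of applying the include/exclude selection before every later step touches a file can be sketched as follows. This is a minimal illustration under stated assumptions, not the bagger's actual code: `FileEntry` and `select_files` are hypothetical names, and the entries are modelled on the file names in the log above.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Sequence


@dataclass(frozen=True)
class FileEntry:
    """Hypothetical stand-in for one file entry of a METS file."""
    file_grp: str
    local_filename: str


def select_files(
    files: Iterable[FileEntry],
    include: Optional[Sequence[str]] = None,
    exclude: Optional[Sequence[str]] = None,
) -> List[FileEntry]:
    """Keep only entries whose group passes the include and exclude filters."""
    include_set = set(include or [])
    exclude_set = set(exclude or [])
    selected = []
    for entry in files:
        # An empty include list means "no restriction".
        if include_set and entry.file_grp not in include_set:
            continue
        if entry.file_grp in exclude_set:
            continue
        selected.append(entry)
    return selected


entries = [
    FileEntry("DEFAULT", "DEFAULT/FILE_0001_DEFAULT.jpg"),
    FileEntry("OCR-D-BINPAGE", "OCR-D-BINPAGE/FILE_0001_OCR-D-BINPAGE.xml"),
    FileEntry("OCR-D-OCR", "OCR-D-OCR/FILE_0001_OCR-D-OCR.xml"),
]

# Every later step (copying, parsing PAGE files) would iterate over `kept` only.
kept = select_files(entries, include=["DEFAULT", "OCR-D-OCR"])
print([entry.local_filename for entry in kept])
```

With such a filter applied once, a later pass over PAGE files would never reach the OCR-D-BINPAGE entry that triggers the error above.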

@MehmedGIT
Contributor Author

> So as a kind of "workaround", the unwanted file groups could be deleted before bagging instead of being excluded during bagging. This might not be a good workaround, though, because you cannot simply bag parts of a workspace and keep the rest.

That may be the only solution actually. Creating a zip bag with a mets file that contains local references to non-existing files in the zip itself (as a result of the exclusion) could cause more problems when the zip is extracted back.
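
The concern about dangling local references can be made concrete with a small check. The sketch below parses a METS document with the standard library and reports every local `xlink:href` that is not among the packaged paths. It assumes that any reference without a URL scheme is local. The function name `dangling_local_refs`, the sample METS, and the `https://example.org` URL are illustrative assumptions, not OCR-D code or data from this workspace.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

SAMPLE_METS = """
<mets:mets xmlns:mets="http://www.loc.gov/METS/"
           xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp USE="MAX">
      <mets:file ID="FILE_0001_MAX" MIMETYPE="image/jpeg">
        <mets:FLocat LOCTYPE="URL" xlink:href="https://example.org/max/00000001.jpg"/>
      </mets:file>
    </mets:fileGrp>
    <mets:fileGrp USE="DEFAULT">
      <mets:file ID="FILE_0001_DEFAULT" MIMETYPE="image/jpeg">
        <mets:FLocat LOCTYPE="OTHER" OTHERLOCTYPE="FILE"
                     xlink:href="DEFAULT/FILE_0001_DEFAULT.jpg"/>
      </mets:file>
    </mets:fileGrp>
    <mets:fileGrp USE="OCR-D-BINPAGE">
      <mets:file ID="FILE_0001_OCR-D-BINPAGE" MIMETYPE="application/vnd.prima.page+xml">
        <mets:FLocat LOCTYPE="OTHER" OTHERLOCTYPE="FILE"
                     xlink:href="OCR-D-BINPAGE/FILE_0001_OCR-D-BINPAGE.xml"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>
"""


def dangling_local_refs(mets_xml, packaged_paths):
    """Return local hrefs from the METS that are missing from `packaged_paths`."""
    root = ET.fromstring(mets_xml)
    missing = []
    for flocat in root.iterfind(".//mets:FLocat", NS):
        href = flocat.get(XLINK_HREF, "")
        if "://" in href:
            # Remote reference: nothing needs to exist inside the package.
            continue
        if href not in packaged_paths:
            missing.append(href)
    return missing


# Only the DEFAULT image was packaged; the BINPAGE group was excluded.
packaged = {"DEFAULT/FILE_0001_DEFAULT.jpg"}
missing = dangling_local_refs(SAMPLE_METS, packaged)
print(missing)
```

A non-empty result means the packaged METS points at files that a consumer will not find after extraction, which is the situation described above.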

kba self-assigned this Jul 3, 2024