Realize an empty publication date if METS header is absent instead of failing with a Python error #46

Closed
tboenig opened this issue Jun 22, 2020 · 6 comments · Fixed by #56
Labels: bug (Something isn't working)

Comments


tboenig commented Jun 22, 2020

Hi @wrznr,

I use your program with data from SBB.
Here is an example:
mm2tei -o "https://oai.sbb.berlin/oai/?verb=GetRecord&metadataPrefix=mets&identifier=oai:digital.staatsbibliothek-berlin.de:PPN66438790X" >test.tei.xml

Another example, this time from SUB Göttingen:
mm2tei -o "https://gdz.sub.uni-goettingen.de/mets/PPN228873541.mets.xml" >test.tei.xml
Here we find the same SSL problem.

Is the SSL problem a problem on the SBB side or a problem in your program?

Member

wrznr commented Jun 22, 2020

Hi @tboenig, could you please post the error message you get? That would make it much easier to see what is going wrong.

Author

tboenig commented Jun 22, 2020

Here is the SBB error:

Traceback (most recent call last):
  File "/usr/lib/python3.6/urllib/request.py", line 1318, in do_open
    encode_chunked=req.has_header('Transfer-encoding'))
  File "/usr/lib/python3.6/http/client.py", line 1254, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1300, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1249, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1036, in _send_output
    self.send(msg)
  File "/usr/lib/python3.6/http/client.py", line 974, in send
    self.connect()
  File "/usr/lib/python3.6/http/client.py", line 1415, in connect
    server_hostname=server_hostname)
  File "/usr/lib/python3.6/ssl.py", line 407, in wrap_socket
    _context=self, _session=session)
  File "/usr/lib/python3.6/ssl.py", line 817, in __init__
    self.do_handshake()
  File "/usr/lib/python3.6/ssl.py", line 1077, in do_handshake
    self._sslobj.do_handshake()
  File "/usr/lib/python3.6/ssl.py", line 689, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:852)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "mets-mods2tei/env/lib/python3.6/site-packages/mets_mods2tei/scripts/mets_mods2tei.py", line 27, in cli
    f = urlopen(mets)
  File "/usr/lib/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.6/urllib/request.py", line 526, in open
    response = self._open(req, data)
  File "/usr/lib/python3.6/urllib/request.py", line 544, in _open
    '_open', req)
  File "/usr/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.6/urllib/request.py", line 1361, in https_open
    context=self._context, check_hostname=self._check_hostname)
  File "/usr/lib/python3.6/urllib/request.py", line 1320, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:852)>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "mets-mods2tei/env/bin/mm2tei", line 8, in <module>
    sys.exit(cli())
  File "mets-mods2tei/env/lib/python3.6/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "mets-mods2tei/env/lib/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "mets-mods2tei/env/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "mets-mods2tei/env/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "mets-mods2tei/env/lib/python3.6/site-packages/mets_mods2tei/scripts/mets_mods2tei.py", line 29, in cli
    f = open(mets, "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'https://oai.sbb.berlin/oai/?verb=GetRecord&metadataPrefix=mets&identifier=oai:digital.staatsbibliothek-berlin.de:PPN66438790X'

Author

tboenig commented Jun 22, 2020

And here is the SUB Göttingen error.
Sorry, it is not the same SSL error after all:

Traceback (most recent call last):
  File "mets-mods2tei/env/bin/mm2tei", line 8, in <module>
    sys.exit(cli())
  File "mets-mods2tei/env/lib/python3.6/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "mets-mods2tei/env/lib/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "mets-mods2tei/env/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "mets-mods2tei/env/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "mets-mods2tei/env/lib/python3.6/site-packages/mets_mods2tei/scripts/mets_mods2tei.py", line 35, in cli
    mets.fromfile(f)
  File "mets-mods2tei/env/lib/python3.6/site-packages/mets_mods2tei/api/mets.py", line 112, in fromfile
    self.__spur()
  File "mets-mods2tei/env/lib/python3.6/site-packages/mets_mods2tei/api/mets.py", line 233, in __spur
    self.encoding_date = header.get_CREATEDATE().isoformat()
AttributeError: 'NoneType' object has no attribute 'get_CREATEDATE'

Member

wrznr commented Jun 22, 2020

The former problem is most likely a problem at the host (SBB) or your own institution. Sorry.
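
As a side note on the first traceback: the final FileNotFoundError is misleading, because the script falls back to open() on the URL string after urlopen has already failed with the certificate error. A minimal sketch of one way to keep the two cases apart follows. The helpers is_remote and open_mets_source are hypothetical names for illustration and are not part of mets-mods2tei.

```python
from urllib.error import URLError
from urllib.parse import urlparse
from urllib.request import urlopen


def is_remote(source):
    """Return True if `source` looks like an HTTP(S) URL rather than a local path."""
    return urlparse(source).scheme in ("http", "https")


def open_mets_source(source):
    """Hypothetical helper: open a METS source as a binary file-like object.

    URLs are only ever fetched and local paths are only ever opened, so a
    network or certificate failure is reported as such instead of being
    masked by a FileNotFoundError from a fallback to open().
    """
    if is_remote(source):
        try:
            return urlopen(source)
        except URLError as err:
            # Exit with a readable message that names the real cause.
            raise SystemExit("Cannot fetch %s: %s" % (source, err.reason))
    return open(source, "rb")
```

With this split, the SBB example would presumably end with a single message naming the certificate failure, and the "No such file or directory" line would only appear for genuinely missing local files.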

The latter problem is caused by the missing metsHdr element in the METS file you want to process (cf. https://digital.slub-dresden.de/oai/?verb=GetRecord&metadataPrefix=mets&identifier=oai:de:slub-dresden:db:id-453779263). The METS file from Göttingen contains no information about when it was created, but that information is mandatory for valid DTABf. If you have ideas on how to fix this, I will gladly implement them.

wrznr added the labels "enhancement" (New feature or request) and "help wanted" (Extra attention is needed) on Jun 22, 2020
Author

tboenig commented Jun 22, 2020

Hi @wrznr,

If you have ideas how to fix it, I will be happy to implement them.
My suggestion:

  • Ignore the empty or missing metsHdr and emit an empty <date type="publication"/>, or print an error message on the CLI saying that the METS file is not valid. I think a combination of the two would be ideal.
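
The combination suggested here (an empty date plus a message on the command line) might look roughly like the sketch below. Only get_CREATEDATE is taken from the traceback; encoding_date_or_empty, the logger name, and the StubHeader stand-in class are hypothetical and exist only for illustration.

```python
import logging
from datetime import datetime

log = logging.getLogger("mets_mods2tei")


def encoding_date_or_empty(header):
    """Return the ISO creation date of a metsHdr, or "" if it is unavailable.

    `header` is expected to offer get_CREATEDATE() as seen in the traceback;
    it may be None when the METS file has no metsHdr element.
    """
    if header is None:
        log.warning("METS file has no metsHdr; leaving the date empty")
        return ""
    created = header.get_CREATEDATE()
    if created is None:
        log.warning("metsHdr has no CREATEDATE; leaving the date empty")
        return ""
    return created.isoformat()


class StubHeader:
    """Stand-in for the real metsHdr object, for demonstration only."""

    def __init__(self, created):
        self._created = created

    def get_CREATEDATE(self):
        return self._created


# A missing header yields an empty string instead of an AttributeError.
print(repr(encoding_date_or_empty(None)))
print(repr(encoding_date_or_empty(StubHeader(datetime(2020, 6, 22)))))
```

Returning an empty string keeps the conversion going, while the warning still tells the user that the input METS file is incomplete.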

wrznr changed the title from "ssl problem" to "Realize an empty publication date if METS header is absent instead of failing with a Python error" on Jun 22, 2020
wrznr added the label "bug" (Something isn't working) and removed the labels "enhancement" and "help wanted" on Jun 22, 2020
wrznr self-assigned this on Jun 22, 2020
Member

bertsky commented Dec 3, 2021

@tboenig I have difficulty implementing these fallbacks/error signals for missing headers, because I cannot find exact documentation of DTABf and TEI proper.

For example, one of the dependent elements of metsHdr is the mets:agent, which is used for encodingDesc:

self.encoding_desc = list(filter(lambda x: x.get_OTHERTYPE() == "SOFTWARE", header.get_agent()))[0].get_name()

(I don't know why we throw away all but the first agent and all but its name, but granted.)
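
Independently of the missing header, the [0] in the line above raises IndexError whenever no agent has OTHERTYPE "SOFTWARE". A hedged sketch of a more forgiving lookup follows. It assumes only the get_agent, get_OTHERTYPE and get_name accessors visible in that line; software_agent_name and the two stub classes are hypothetical.

```python
def software_agent_name(header):
    """Return the name of the first SOFTWARE agent, or None if there is none."""
    if header is None:
        return None
    agents = header.get_agent() or []
    # next() with a default avoids the IndexError of filter(...)[0].
    return next(
        (agent.get_name() for agent in agents if agent.get_OTHERTYPE() == "SOFTWARE"),
        None,
    )


class StubAgent:
    """Stand-in for a mets:agent object, for demonstration only."""

    def __init__(self, othertype, name):
        self._othertype = othertype
        self._name = name

    def get_OTHERTYPE(self):
        return self._othertype

    def get_name(self):
        return self._name


class StubMetsHeader:
    """Stand-in for a metsHdr object, for demonstration only."""

    def __init__(self, agents):
        self._agents = agents

    def get_agent(self):
        return self._agents
```

The caller can then decide what to do with None, for example skip the encodingDesc details or emit a warning.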

This information usually ends up in simple p elements:

encoding_desc = self.tree.xpath('//tei:encodingDesc', namespaces=ns)[0]
encoding_desc_details = etree.SubElement(encoding_desc, "%sp" % TEI)
encoding_desc_details.text = "Encoded with the help of %s." % creator

Now, according to DTABf there is supposed to be an intermediate editorialDecl here. But the only reference I can find on that is in the (IIUC) Examples schema.

So what is the correct representation here, and what should I put in as a fallback in case the metsHdr is missing?
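
Whatever the schema turns out to require, the generating code can at least avoid writing "Encoded with the help of None." when no creator is known. The sketch below uses the standard library's xml.etree.ElementTree rather than the project's own tree handling, and add_encoding_desc_details is a hypothetical name. Skipping the p element entirely is an assumption made for illustration; it is not confirmed by any DTABf documentation.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
TEI = "{%s}" % TEI_NS


def add_encoding_desc_details(tree_root, creator):
    """Append a <p> to the first tei:encodingDesc, but only if a creator is known.

    Returns the new element, or None if nothing was added.
    """
    encoding_desc = tree_root.find(".//tei:encodingDesc", {"tei": TEI_NS})
    if encoding_desc is None or not creator:
        # Assumption: without a creator, leave encodingDesc untouched.
        return None
    details = ET.SubElement(encoding_desc, "%sp" % TEI)
    details.text = "Encoded with the help of %s." % creator
    return details


# Tiny TEI skeleton for demonstration purposes only.
root = ET.fromstring(
    '<TEI xmlns="%s"><teiHeader><encodingDesc/></teiHeader></TEI>' % TEI_NS
)
add_encoding_desc_details(root, None)               # adds nothing
add_encoding_desc_details(root, "ExampleSoftware")  # adds one <p>
```

If DTABf does require an editorialDecl wrapper, the same guard would apply one level further down.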

bertsky mentioned this issue on Dec 3, 2021
bertsky linked a pull request on Dec 3, 2021 that will close this issue
wrznr closed this as completed in #56 on Dec 20, 2021