Comprehensive validation of XML files #6367

Open
solth opened this issue Jan 16, 2025 · 1 comment
Labels
development fund 2025 A candidate for the Kitodo e.V. development fund.

Comments

@solth
Member

solth commented Jan 16, 2025

Description

Kitodo.Production works with many different XML files: rulesets, metadata mappings and configuration files are created and changed by the institutions working with Kitodo. Additionally, there are external files such as imported metadata files in MODS, MARCXML or EAD format.

This makes the application somewhat vulnerable to mistakes made outside of Kitodo itself, because any of these files can contain malformed XML, may not adhere to the corresponding schema definition, or may not be readable at all. Since Kitodo's functionality relies heavily on these XML files being correct, it would be very beneficial to introduce a set of validations that are performed on all of them whenever they are processed:

  • check if file exists -> otherwise: show "File XY does not exist" error message
  • check if file is readable -> otherwise: show "File XY is not readable" error message
  • check if file contains well-formed XML -> otherwise: show "File does not contain well-formed XML" error message
  • check if file content is valid against the corresponding XML schema definition (for example the Kitodo ruleset schema) -> otherwise: show "File does not contain valid Kitodo ruleset XML" error message
    • in case of XML responses from search interfaces like SRU or OAI, check if the response contains valid SRU/OAI container XML -> otherwise: show "Response does not contain valid SRU XML" error message
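
The checks listed above could be chained roughly as in the following sketch. It is only an illustration under stated assumptions, not existing Kitodo code: the class `XmlFileChecks`, the method `check` and the `schemaLabel` parameter are hypothetical names, and only standard JDK XML APIs (`javax.xml.parsers`, `javax.xml.validation`) are used. Each stage stops at the first failed check and returns the headline message from the list, followed by any specific parser or validator errors.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper, not part of Kitodo.Production.
public final class XmlFileChecks {

    /** Collects recoverable (validity) errors; rethrows fatal (well-formedness) errors. */
    private static final class Collector implements ErrorHandler {
        final List<String> details = new ArrayList<>();

        @Override
        public void warning(SAXParseException e) {
            // Warnings are ignored in this sketch.
        }

        @Override
        public void error(SAXParseException e) {
            details.add("line " + e.getLineNumber() + ": " + e.getMessage());
        }

        @Override
        public void fatalError(SAXParseException e) throws SAXParseException {
            throw e;
        }
    }

    /** Returns an empty list if all checks pass, else a headline message plus details. */
    public static List<String> check(Path xmlFile, Path schemaFile, String schemaLabel) {
        String name = xmlFile.getFileName().toString();
        // Checks 1 and 2: existence and readability.
        if (!Files.exists(xmlFile)) {
            return List.of("File " + name + " does not exist");
        }
        if (!Files.isReadable(xmlFile)) {
            return List.of("File " + name + " is not readable");
        }
        // Check 3: well-formedness, using a non-validating parser.
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new Collector());
            try (InputStream in = Files.newInputStream(xmlFile)) {
                builder.parse(in);
            }
        } catch (SAXException e) {
            return List.of("File " + name + " does not contain well-formed XML",
                    String.valueOf(e.getMessage()));
        } catch (IOException | ParserConfigurationException e) {
            return List.of("File " + name + " is not readable", String.valueOf(e.getMessage()));
        }
        // Check 4: validity against the given XSD; all schema errors are collected.
        // Note: a broken schema file is also reported as a validation failure here.
        Collector collector = new Collector();
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(schemaFile.toFile());
            Validator validator = schema.newValidator();
            validator.setErrorHandler(collector);
            validator.validate(new StreamSource(xmlFile.toFile()));
        } catch (SAXException | IOException e) {
            collector.details.add(String.valueOf(e.getMessage()));
        }
        if (collector.details.isEmpty()) {
            return List.of();
        }
        List<String> messages = new ArrayList<>();
        messages.add("File " + name + " does not contain valid " + schemaLabel + " XML");
        messages.addAll(collector.details);
        return messages;
    }
}
```

A caller might invoke `XmlFileChecks.check(rulesetPath, rulesetXsdPath, "Kitodo ruleset")`, show the first returned message in the frontend and write the remaining detail lines to the log. The same pattern would presumably apply to SRU or OAI responses by validating the response against the respective container schema before extracting records.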

Where applicable, specific detected errors should also be listed (e.g. "opening XML tag is not closed" for XML checks or "Field XYZ is not allowed in MODS" for schema validation errors), ideally in the error message in the frontend, but at least in the Kitodo log files.

Some of these checks are already performed at certain points in the system: for example, when a process is opened in the metadata editor, an error message is shown if the file cannot be found. Many other XML files, however, are not validated before use. For example, XML files imported from external sources during process creation are not yet validated against the schema definition of the metadata format configured in the corresponding ImportConfiguration, so processing the imported data with (likewise unvalidated) XSL mapping files sometimes fails because the data does not actually contain the expected valid MODS or MARCXML. Similarly, ruleset files are not validated against ruleset.xsd before they are saved or applied during process creation or in the metadata editor.

Related Issue

#5877

Expected Benefits of this Development

Many problems we have encountered over the years in Kitodo stem from external or internal XML files not containing the expected, valid contents. Introducing more checks and adding corresponding feedback to the GUI will harden the application, prevent malformed or invalid XML files from being used, and therefore also support and simplify investigating and solving problems.

Estimated Costs and Complexity

  • medium: around 5 to 7 working days
@solth solth added the development fund 2025 A candidate for the Kitodo e.V. development fund. label Jan 16, 2025
@solth solth changed the title Comprehensive Validation of XML files Comprehensive validation of XML files Jan 16, 2025
@henning-gerhardt
Collaborator

Maybe even #6193 should be considered while working out the solution for this issue.


2 participants