sch:pattern/@documents="#SVRL" to allow marking/grading/summarization on patterns in validation results #74

Open
rjelliffe opened this issue May 31, 2024 · 0 comments

(Added: In my presentation at the Schematron users meeting [Prague 2024], I identified this proposal as one of the most important, IMHO.)

This is a much simpler proposal, offered as an alternative to #16 (chaining phases with progressive visibility of prior SVRL).

Use Cases

  • I am an educator. I have given my students a 300-question exam with single- and multiple-choice questions. I use Schematron to mark each answer. But I have no way to aggregate the scores or find patterns in them, because the results of validation are not visible within Schematron.

  • I am a dashboard developer for some industrial process. I use Schematron to detect and report complex patterns in the process. I want to further detect what the patterns tell me about the state of the system, e.g. if there are more than 10 failed assertions of a serious type.

  • I am a legal publisher who ingests case reports made by 100 different courts in OOXML, ODF and RTF. Within each court, there are different data entry operators who use different conventions willy-nilly. Some use stylesheets. Some use tables to format a title page for each case. I use Schematron to find patterns that let me do "feature extraction" on the document. But I want to detect outliers as well as group features to allow dispatching to appropriate processes. And I prefer it to be all in one place (file).

  • I want to write a validation that is more like a Hidden Markov model (but not): there is one set of detectors that looks at what is found in the document, then another set that operates on that sequence of detected things to figure out which transition to take.

Problem

Validation results are not visible in Schematron. Therefore you need a second pass, involving a shell script, XProc, etc. This is inconvenient, and severely limits Schematron. My experience of XProc is that, while it works, it is at least as complex as Schematron and so, even if your system is tooled up for it, it can easily be overkill.
Also, top-level parameters and variables made from the original document are not visible in downstream processes.

Proposal

Allow sch:pattern/@documents="#SVRL" to invoke a map-reduce operating mode.
Other patterns run as normal and generate SVRL. The SVRL is then validated by these special patterns. The resulting SVRL is the validation result, or could be merged with the first stage's SVRL at the implementer's option.
The same scoping rules apply as for other @documents: top-level params are visible as are any top-level variables (i.e., sch:schema/sch:let) which continue to be evaluated on the original document.
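
As a rough sketch of how this might look for the dashboard use case: the schema below assumes a hypothetical input document made of `reading` elements with `@sensor` and `@value` attributes, and the pattern ids, variable names and the threshold of 100 are invented for illustration. The `documents="#SVRL"` value is the syntax proposed here, not an existing feature, so this shows the intended shape rather than something expected to run on current implementations.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <sch:ns prefix="svrl" uri="http://purl.oclc.org/dsdl/svrl"/>

  <!-- Top-level variable: evaluated on the original document and,
       under the proposed scoping, still visible in the #SVRL pattern.
       The "reading" element is a hypothetical input vocabulary. -->
  <sch:let name="reading-count" value="count(//reading)"/>

  <!-- "Map" stage: an ordinary pattern run against the original document. -->
  <sch:pattern id="readings">
    <sch:rule context="reading">
      <sch:assert test="number(@value) &lt;= 100" role="serious">
        Reading from sensor <sch:value-of select="@sensor"/> exceeds 100.
      </sch:assert>
    </sch:rule>
  </sch:pattern>

  <!-- "Reduce" stage (proposed): this pattern is run against the SVRL
       produced by the patterns above, not against the original document. -->
  <sch:pattern id="summary" documents="#SVRL">
    <sch:rule context="/svrl:schematron-output">
      <sch:let name="serious"
               value="count(svrl:failed-assert[@role = 'serious'])"/>
      <sch:report test="$serious > 10">
        <sch:value-of select="$serious"/> serious failures across
        <sch:value-of select="$reading-count"/> readings.
      </sch:report>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

The second pattern relies on `sch:assert/@role` being carried through to `svrl:failed-assert/@role` in the first-stage output, which is what lets the "reduce" stage count failures by severity.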

Discussion

There is obvious scope to turn Schematron phases into some state machine, where one pattern enables another: it is a nice geeky thought. Similarly, phases or patterns could be made more like XProc processes that can chain.
However, it seems to me that this is overkill and complexifying, when what would be more usable is to allow Schematron to act in a "map reduce" fashion: the original validation is the "map" and this proposed second pass is the "reduce".
Rather than requiring users to learn and install some pipeline system, this proposal makes no schema changes to Schematron: just one special value that conceptually fits with the current definitions.
