Add postconditions for variables and contexts: @post-condition #73
Comments
I see this proposal as relating to two slightly different, but related, things: unit testing and exception handling. There are mature testing frameworks, such as XSpec, where this kind of thing can already be accommodated. It's good practice and probably better for the programmer to amass a set of test cases that cause exceptions to be raised. If you are worried about the exception handling provided by the implementation you are using, you can write your own function and handle exceptions (differently - perhaps more gracefully) there. XSpec can also test if your functions are working correctly, of course. |
I think it relates to more than unit testing and exceptions.
1. Unit testing is based on coarse tests where you run a canned
representative example to get expected results: after the first run,
their aim becomes not so much proving "does this work" as detecting "have I
broken something that worked before?". Unit tests work well in scenarios
where there is little variation or surprise or combinatorial explosion in
inputs, and are strictly test-time things. In contrast, post-conditions
are useful in the opposite situation, where the variety of input means you
need to do either exhaustive tests (testing internal invariants rather than
external units) or to never disable the tests until after the system is
mature (what the QA people call "quality-in-use").
And post-conditions work at a different scale than unit tests, e.g. at
variable scale. The external "unit" of Schematron is the assertion:
variables or contexts cannot be checked, except by adding assertions for
the purpose. Which then means you need to put in place a mechanism to
shield these assertion fails from the user, or to turn them on and off.
To me, just as you would not say (in general) that we don't need Schematron
when we can use unit tests, I think we cannot say (in general) that
post-conditions can be replaced by unit tests over tricky Schematron
schemas.
As I mention, when you have combinatorial explosion (e.g. the standard case
of rich text such as legal publishing) a large set of test cases gives a
false sense of security. I have worked on several systems where we had to
test against all previous inputs (tens of thousands of documents) and even
then would find uncoped-with scenarios in the next incoming set of
documents. (Typically where a new data source had been integrated way up
the line, perhaps in a different country by a different team.)
2. I don't think that writing functions to cope with exceptions may be
always workable, because it introduces complication in the very place where
the exceptions are: instead of a standard method that an IDE can integrate,
every schema is potentially different. And there are certainly developers
who understand XPaths well enough, but not function definition: Schematron
needs to provide a value-add over XSLT to be viable.
My original thought was indeed to provide a function safe-number() which
would provide more possibilities for handling NaN exceptions, but it risked
being a band-aid. That being said, there might be some better approach to
exceptions, which in my experience are in particular file-not-found
exceptions and NaN exceptions.
But I think we do need to check post-conditions as close to the variable
declaration as possible. We want to easily see what the developer decided
was not necessary to cope with when they wrote their XPaths: that they
believed the value of one variable would have the same number of items as
the value of the variable it was using as an input, for example.
For the syntax: I thought of allowing sch:let/sch:assert instead, only
tested when the variable was used: it is fine by me too, but I thought
@post-condition was less intrusive. Using an element or attribute here is
not critical.
@as goes some way (even though it is perhaps needed more for
type coercion or to prevent taking of values) but it does not cope with
co-occurrence constraints, which @post-condition does.
To put it another way, wherever any language is used for mission-critical
operations with any complexity (either complex processing or
widely-varying inputs) you need to add redundant checks at the level of
granularity of the risk.
Regards
Rick
|
Then we disagree fundamentally about the purpose of unit testing.
Me too, and I continue to. It is the way of things, which this proposal can't change. I don't think what you propose is a bad idea, just that it doesn't solve the problem. I think it is a problem in any case that can only be mitigated.
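Well, there is the if...then...else... error(...) approach, but the standard discourages the use of error(). Perhaps we need some runtime linkage that does allow user-defined exceptions to be handled by the implementation and consistently reported as SVRL... |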
A unit test says "given some specific input X expect specific output Y".
An assertion says "for every possible A, some invariant B should hold."
Not the same things.
An assertion simplifies coding and understanding by carving off situations
that are not expected to occur. A unit test is a sanity check that some
function has produced a plausible result.
Rick
…On Tue, 14 May 2024, 04:46 Andrew Sales, ***@***.***> wrote:
Then we disagree fundamentally about the purpose of unit testing.
The idea of disabling the tests once the system is mature alarms me, since
in my day job I am dealing with unpredictable, human-authored input that
can vary greatly. We write our last test when we have fixed the last bug.
I have worked on several systems where we had to
test against all previous inputs (tens of thousands of documents) and even
then would find uncoped-with scenarios in the next incoming set of
documents.
Me too, and I continue to. It is the way of things, which this proposal
can't change.
I don't think what you propose is a bad idea, just that it doesn't solve
the problem. I think it is a problem in any case that can only be mitigated.
You open with the challenges of a complex XPath, but that post-condition
XPath is only going to get more complex as it needs to accommodate more
scenarios. It would provide a sense of security no more true than
corresponding unit tests would.
I would, as I say, address this with additional test cases to describe
unforeseen scenarios as they arise, and amend the schema to reflect them as
needed.
This is a partial fix for the problem that XPath functions can generate
exceptions, but Schematron has no mechanism to cope.
Well, there is if...then...else... error(...) approach, but the standard
discourages the use of error(). Perhaps we need some runtime linkage that
does allow user-defined exceptions to be handled by the implementation and
consistently reported as SVRL...
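A minimal sketch of the if...then...else... error(...) approach mentioned above, assuming an XPath 2.0 query binding (for example queryBinding="xslt2"). The element names order and item, the @price attribute, and the error namespace URI are all illustrative assumptions, not taken from the thread. How the raised error is reported is left to the implementation, which is exactly the gap being discussed.

```xml
<sch:pattern xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:rule context="order">
    <!-- Guard the division explicitly and raise a user-defined error
         when the assumption (at least one item) does not hold.
         The names and the namespace URI below are hypothetical. -->
    <sch:let name="mean-price"
             value="if (exists(item))
                    then sum(item/@price) div count(item)
                    else error(QName('http://example.com/errors', 'no-items'),
                               'order has no item children')"/>
    <sch:assert test="$mean-price le 1000">The mean item price should not exceed 1000.</sch:assert>
  </sch:rule>
</sch:pattern>
```

The guard lives inside the value expression itself, so the XPath grows with each scenario it has to cover, which is the trade-off both sides of the discussion point to.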
|
I'm well aware of the difference. As I said above, I don't think this kind of assertion addresses the issue of unpredictable input. Assertions in other languages can typically be enabled or disabled at execution time, and if enabled, will often halt processing. If we do have assertions, I think implementations ought to be configurable in this respect. A common case I've come across is a runtime error where an atomic value was expected by a function, but a sequence was passed instead. This can occur also e.g. in message construction, with <value-of/>. Would we want assertions in such places too? I think it would be good to refine the expected behaviour and prospective reporting of errors, if this is to be standardised. I'd be interested in input from the wider community about this as a feature. XML Prague and the Schematron Users Meetup are around the corner, which is one suitable forum. |
On Wed, 15 May 2024, 04:52 Andrew Sales, ***@***.***> wrote
As I said above, I don't think this kind of assertion addresses the issue
of unpredictable input.
It probably depends on the kind of unpredictability, a la Donald Rumsfeld.
But I still don't understand Andrew's point, sorry, unless he is saying a
developer using this may not cover all cases, or be a matter of discipline:
that's life, isn't it?
Assertions in other languages can typically be enabled or disabled at
execution time, and if enabled, will often halt processing. If we do have
assertions, I think implementations ought to be configurable in this
respect.
Certainly.
A common case I've come across is a runtime error where an atomic value
was expected by a function, but a sequence was passed instead. This can
occur also e.g. in message construction, with <value-of/>. Would we want
assertions in such places too?
I think having post-conditions on sch:let and sch:rule is enough, but if
there was utility in having it elsewhere, I don't see it would do harm. In
the case of sch:value-of, someone might want pre-conditions as well as
post-conditions.
Static type checking of signatures within XPaths, like OxygenXml does, is
a different issue, I think. That is where @as connects better.
I think it would be good to refine the expected behaviour and prospective
reporting of errors, if this is to be standardised.
To an extent, yes. But the initial target users would be IDE and pipeline
integrators (xproc, ant, etc).
As Andrew mentioned, it might be that @post-condition or @assert should not
be a simple boolean, but e.g. allow error(). Or even allow it to
generate a string on failure:
````
@post-condition="if (*) then true()
else 'Programming assumption not met: no child named-nodes.'"
````
I'd be interested in input from the wider community about this as a
feature. XML Prague <http://xmlprague.cz> and the Schematron Users Meetup
are around the corner, which is one suitable forum.
Good idea.
Rick
|
Another approach, with different wins, would be to allow a new optional
attribute on assertions: @to which is the subsystem or role that should be
informed.
E.g.
<sch:assert test="parent::robin"
to="log" severity="fatal"
role="developer_expectation">Oh dear</sch:assert>
Above, the assertion failure is logged, as well as going to SVRL.
<sch:report test="preceding-sibling::coco"
to="mailto:[email protected]" severity="info"
role="possible-version-violation">Found a preceding-sibling coco, most
likely this is old data that may be sourced in error.</sch:report>
Above, the assertion text gets emailed.
<sch:assert test="count($input-paras) eq count($output-paras)"
to="svrl"
severity="error" role="issue-for-devops">...</sch:assert>
Above, the assertion failure just goes to the SVRL (overriding any sch:rule/@to). Optional.
So the @to can have strings to integrate into the workflow. The SVRL also
gets @to.
Simple validity continues to mean no failed assertions and no successful
reports, regardless of @severity and @to.
I would provide some reserved @to's:
log - message goes to log file
error - message goes to std err
out - message goes to std out (console)
svrl - (default)
owner - owner of process
user - human
mail:x - mail
http:x - send to the URL in argument (ab)using HTTP GET
post:x - send to URL x using HTTP POST - reserved
put:x - send to URL x using HTTP PUT - reserved
AFAIK XSLT does not allow PUT and POST, and just has GET (e.g. document()).
So a @to with "http:..." would send the assertion failure as an argument to
GET, and the SVRL would include something from the HTTP response.
This allows server-based interaction that bypasses the SVRL, but maintains
an audit trail in the SVRL that the notification was received by the server.
Rick
|
I mean that the perceived utility of an assert will run out quickly for all but the most simple cases and most predictable input. A real example from just the other day. I was testing out a new rule which worked in isolation but threw a divide-by-zero error when incorporated into the target schema. The cause was my test cases were toys that omitted otherwise required structures. The IDE was able to take me to the point of failure in the XSLT for the compiled schema. Would an assert have helped me? Possibly, but I found the cause from my IDE anyway. Would I have wanted to put one everywhere in a sizeable schema where division was used? Probably not. If real-world input had caused this, I'd've added an extra condition to the relevant XPath and moved on. Here, I adjusted my test cases and moved on.
Not static: dynamic. Using string() or concat() for example, where an argument is a sequence because there is unexpectedly more than one of something that needs reporting in the message generated.
It would be critically important to be clear how this feature would affect validity, if at all. I'm not asking for all of this information here and now, I am just noting the need. |
I suggest that using 'assertion' as an unqualified term here risks confusion with sch:assert (at least for me).
IME, the [programming language] assertions that can be disabled at execution time tend to be put just before the ordinary code that tries to do something reasonable with the same invalid value (e.g., return early with a null value) so that the developer gets the rude shock where the problem occurs and the user gets let down gently. (I'm not sure how well the reasonable return value idiom translates to Schematron, where there aren't visible return values as such, just the presence or absence of messages from sch:assert and sch:report.)
It seems to me that @rjelliffe wants the Schematron to (also) be able to deliver the rude shocks, while @AndrewSales would (mostly) leave it to the unit tests. Plus, I think there's general agreement that there will always be one more bug when some user somewhere tries something unexpected. (Antenna House Formatter once had a bug with Latin superscripts in Bulgarian text. Who could have predicted that?)
It might be that myriad unit tests could all fail to exercise something that could be caught by checking a value that is calculated within the sch:rule. (At this point I don't know why you would do anything other than <sch:assert role="debug">, or similar, for it.)
It might also be that a check within the sch:rule never fails anyway, maybe because the checked condition also fails earlier structural validation so the Schematron never sees those documents or because there's an error in the XPaths used in the sch:rule.
So there might be a place for both (though still not seeing the need for a lot of extra machinery for programming language-style debugging assertions).
True, although the other approach is to let implementers try things and then standardise what succeeds.
Indeed. |
On Thu, 16 May 2024, 08:56 Andrew Sales, ***@***.***> wrote:
But I still don't understand Andrew's point, sorry, unless he is saying a
developer using this may not cover all cases, or be a matter of discipline:
that's life, isn't it?
I mean that the perceived utility of an assert will run out quickly for
all but the most simple cases and most predictable input.
Sure. Or it may hit someone's sweetspot. If someone does not want to use
them, they don't have to.
In my experience, potentially complicated input requires complicated XPaths.
So developers leave out cases they expect will never occur. A way to make
the subset they accept explicit could help maintenance, and prevent the
variable and context XPaths from being obfuscated with terms that are not
expected.
Moreover, the way to uncomplicate XPaths is to use chains of variables: so
only having a @assert or @post-condition would encourage more chains. For
example, a style guide could require that, unless impossible, all divisions
should be done in a variable so that exceptions are properly caught and
handled.
A real example from just the other day. I was testing out a new rule which
worked in isolation but threw a divide-by-zero error when incorporated into
the target schema. The cause was my test cases were toys that omitted
otherwise required structures. The IDE was able to take me to the point of
failure in the XSLT for the compiled schema.
Would an assert have helped me? Possibly, but I found the cause from my
IDE anyway. Would I have wanted to put one everywhere in a sizeable schema
where division was used? Probably not. If real-world input had caused this,
I'd've added an extra condition to the relevant XPath and moved on. Here, I
adjusted my test cases and moved on.
The decision to put in redundant checks, like assertions or
post-conditions, would not be based on "would I have wanted to": no-one
ever **wants** to do anything :-) It would be based on risk
considerations: the more that some Schematron schema processes high-value
high-risk information, or requires diagnosis by ops teams apart from an
IDE, the more that adding redundant checks is appropriate.
I had a real example last week too: the DTD allows multiple
/document/fragment/properties-section/properties but none of the documents
has more than one fragment with a properties-section with the same name.
But I smelled a rat. I would have liked to have excluded the case of
multiple properties with the same name without palaver.
e.g.
<sch:rule context="property"
expect="not(preceding-sibling::property[@name = current()/@name])" >...
As I mentioned, another way to view this issue is as one of addressability:
how do we make sure that messages go to the person or
workflow or log that can deal with them. I think addressability (e.g. by email
or message handling) is a feature of many pipeline/message systems but
Schematron does not provide an integration point with them.
Static type checking of signatures within XPaths, like OxygenXml does, is a
different issue, I think. That is where @as
connects better.
Not static: dynamic. Using string() or concat() for example, where an
argument is a sequence because there is unexpectedly more than one of
something that needs reporting in the message generated.
I think you are agreeing with me. Static is a different issue.
To an extent, yes.
It would be critically important to be clear how this feature would affect
validity, if at all. I'm not asking for all of this information here and
now, I am just noting the need.
I think I said: not at all. It can be turned on and off as deemed necessary
without affecting validity.
At user option, it could be implemented so that exceptions are caught and
logged, and the validation proceeds.
Rick
|
There may be another for the QLBs here: define what happens when there is
an error. In my view, the best choice would be that a pattern that
generates an error is aborted, by default. But other patterns are not
affected. The SVRL would have some element to signal the pattern crashed.
The definition for simple validity would be "unable to be validated."
Rick
|
On Thu, 16 May 2024, 13:13 Tony Graham, ***@***.***> wrote:
I suggest that using 'assertion' as an unqualified term here risks
confusion with sch:assert (at least for me).
Yes. Maybe I should have left it as "post-condition".
(I'm not sure how well the reasonable return value idiom translates to
Schematron, where there aren't visible return values as such, just the
presence or absence of messages from sch:assert and sch:report.)
But sch:let, sch:rule/@context, sch:param/@value (and sch:value-of, and sch:pattern/@documents) do have
values that could be tested.
It seems to me that @rjelliffe <https://github.com/rjelliffe> wants the
Schematron to (also) be able to deliver the rude shocks, while
@AndrewSales <https://github.com/AndrewSales> would (mostly) leave it to
the unit tests. Plus, I think there's general agreement that there will
always be one more bug when some user somewhere tries something unexpected.
(Antenna House Formatter once had a bug with Latin superscripts in
Bulgarian text. Who could have predicted that?)
Many users are not comfortable with XPaths. They are not confident that,
for example, when they split a complex XPath into a chain of variables,
they know where or if a mistake has occurred. The "if" is particularly
unsettling: has there been no invalidity because an @context never fired?
(Part of this can be addressed by counting rule firings in the SVRL, of
course, if that is enabled.)
It might be that myriad unit tests could all fail to exercise something
that could be caught by checking a value that is calculated within the
sch:rule. (At this point I don't know why you would do anything other
than <sch:assert role="debug">, or similar, for it.)
Yes, it may be role=debug is enough, if implementors provide a way to
enable and disable them.
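As a rough sketch of the role="debug" alternative, the check below uses only sch:let and sch:assert as they exist today. The element names section and para and the variable names are illustrative assumptions. Whether and how a processor lets you switch off assertions by role is implementation-specific, and the failure would still appear in the SVRL unless the implementation filters it.

```xml
<sch:pattern xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:rule context="section">
    <sch:let name="input-paras" value="para"/>
    <sch:let name="output-paras" value="$input-paras[normalize-space()]"/>
    <!-- Developer expectation: no paragraph was dropped by the filter.
         The role value "debug" is a convention, not a reserved keyword. -->
    <sch:assert role="debug"
                test="count($input-paras) eq count($output-paras)">Programming assumption not met: some para elements are empty.</sch:assert>
  </sch:rule>
</sch:pattern>
```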
It might also be that a check within the sch:rule never fails anyway, maybe
because the checked condition also fails earlier structural validation so
the Schematron never sees those documents or because there's an error in
the XPaths used in the sch:rule.
So there might be a place for both (though still not seeing the need for a
lot of extra machinery for programming language-style debugging assertions).
I think the @assert or @post-condition can be quite simple to implement.
I am probably swinging around to allowing sch:expect elements with the same
form as sch:assert, under e.g. let, rule and probably param. And with a @to attribute for addressing.
… Rick
|
Removing the |
(Added: In my Schematron users meeting presentation [Prague 2024] I identified this proposal as one of the most important IMHO.)
It can be hard, especially for newcomers or rare users, to have confidence that a complex XPath is working the way it should. Indeed, as a matter of good software engineering, the more important ("risky") some code is, the more that you want to have some independent (i.e., redundant) check of it. This is of course well-known since Bertrand Meyer, and a rationale for Schematron itself.
So would Schematron be better if it allowed internal assertions on its own XPaths? I think so, and I think it can be trivially implemented (over XSLT) without neutralizing optimized-lazy evaluation. It would complement e.g. sch:let/@as, which allows a level of typing.
In concrete terms the proposal is that sch:let and sch:rule allow another attribute @post-condition which takes an XPath expression that evaluates to boolean. The context for this XPath is the variable value or the rule context.
The evaluation of the @post-condition would not (necessarily) go into the SVRL: the document's validity result is unchanged whether or not these post-conditions are enabled. It is intended for developer information, confidence and debugging, not for the end-user of the schema. It would generate implementation-dependent information, e.g. on Standard Error output (e.g. xsl:message), to a log file, or for an IDE.
Here are two examples:
In this example, a rule selects all elements that have an @id attribute. However, the developer expects that these all contain non-empty values: the post-condition makes this explicit. We don't want to use sch:assert for this, because it is a programmer-world thing, not a user-world thing.
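A minimal sketch of what such a rule might look like. The post-condition attribute is the one proposed in this issue and is not part of standard Schematron. The sketch assumes the expression is evaluated with each matched element as the context item, and the user-facing assertion is illustrative.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <sch:pattern>
    <!-- Hypothetical attribute: the developer expects every @id
         that is present to carry a non-empty value. -->
    <sch:rule context="*[@id]"
              post-condition="normalize-space(@id) ne ''">
      <!-- Ordinary user-facing assertion, reported in the SVRL as usual. -->
      <sch:assert test="not(@id = preceding::*/@id)">Each id value should be unique in the document.</sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```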
In this, the document is read in. (And any exceptions are swallowed, or logged.) Then the condition is tested. If there was no document or the wrong one, the post-condition will fail and the failure logged. The implementation can warn the user there has been this problem (e.g. in this pattern) and not produce a result of "valid".
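The document-loading case described above could be sketched as follows. Again post-condition is the proposed, non-standard attribute. The file name codes.xml, the element names, and the assumption that the variable being declared is already in scope inside its own post-condition are all illustrative.

```xml
<sch:pattern xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <!-- Hypothetical attribute: check that the external document was
       found and has the expected root element. -->
  <sch:let name="codes"
           value="document('codes.xml')"
           post-condition="exists($codes/codes)"/>
  <sch:rule context="item">
    <sch:assert test="@code = $codes/codes/code">The code attribute should match an entry in the code list.</sch:assert>
  </sch:rule>
</sch:pattern>
```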
This is a partial fix for the problem that XPath functions can generate exceptions, but Schematron has no mechanism to cope. For example, if trying to parse a number and it is not a number, we put the code into a variable first. The parse fails and generates an exception which is swallowed or fails. Then we check the value using @post-condition so that we are not beholden to the way the engine implements exception handling.
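A sketch of the number-parsing case, under the same assumptions (post-condition is the proposed attribute, and the declared variable is visible in its own post-condition). The element name price and the @amount attribute are illustrative. Here number() yields NaN for non-numeric input rather than raising an error, and the post-condition catches that value.

```xml
<sch:pattern xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:rule context="price">
    <!-- Hypothetical attribute: do the conversion in a variable, then
         check the result instead of relying on the engine's own
         error handling. -->
    <sch:let name="amount"
             value="number(@amount)"
             post-condition="string($amount) != 'NaN'"/>
    <sch:assert test="$amount ge 0">A price amount should not be negative.</sch:assert>
  </sch:rule>
</sch:pattern>
```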
Another example: for helping with complex chains of variables:
which might be implemented as:
(Not debugged. You get the idea. The double handling of var3 is to maintain lazy-evaluation.)
In this case, the developer believes it to be the case that every "thing" has a grandchild element, which simplifies the cases they need to make assertions for. But the developer wants to be able to check this during testing, and not make it something that invades the user's diagnostics. (They could do this using a dedicated phase too, if they wanted full diagnostics, but they might find that bad separation of concerns in their specific scenario.)
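An illustrative sketch of such a chain (not a reconstruction of the original example). The variable names are invented, post-condition is the proposed attribute, and the sketch assumes a variable can be referenced inside its own post-condition.

```xml
<sch:pattern xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:rule context="thing">
    <sch:let name="children" value="*"/>
    <!-- Hypothetical attribute: the developer believes every thing
         has at least one grandchild element. -->
    <sch:let name="grandchildren"
             value="$children/*"
             post-condition="exists($grandchildren)"/>
    <!-- User-facing check that relies on the assumption above. -->
    <sch:assert test="every $g in $grandchildren satisfies normalize-space($g) != ''">Grandchild elements of a thing should not be empty.</sch:assert>
  </sch:rule>
</sch:pattern>
```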
Regards
Rick