Checking for substrings of the node values #75
(revised)
This gives what I think is the desired behaviour: multiple reports, with better diagnostics. For the particular use case that nkutsche provides, I suggest the approach above would be all that is needed: it is not hard to enumerate a small number of rules where there is a small number of candidates.
BUT... if it could be shown that there is a use case for long lists or complex data created by functions, then we do have a gap in Schematron. (In this view, the problem is that there is no way to iterate assertions/reports over the elements of a tree held in a variable, or over a list of things in a variable.) One way to approach it would be to adopt something like proposal #63, to allow sch:pattern/@documents to reference a document (or sequence of elements) in a variable.
The advantage, perhaps, of this is that it gives the full expressive power of Schematron to the parsed text and it fits in with the notion of a "pattern" better. Another approach would be to do something on sch:rule, such as an attribute @visit-each:
I think it is good for the standard to try to avoid mixing things related to particular QLBs (i.e. XSLT and the limitations of the XDM) and better to put in generic changes that can apply to all QLBs. |
Thanks, @nkutsche and @rjelliffe. FWIW, I like:
<sch:rule context="xxx/text()" visit-each="analyze-string(., 'foo')/analyze-string-result/match">
I have a dim recollection that we also had something similar in the (proprietary) syntax of XMLProbe. The processing logic would presumably be:
* `@visit-each` evaluates to a sequence of any length against the `@context`;
* if the result is the empty sequence, the rule still fires;
* the `@test`s are then evaluated against every item in the sequence.
It would seem to make sense to pass @visit-each through to the SVRL output too. |
Ideally, the @subject should be an XPath that corresponds to the matched subrange too. But let's suppose that @visit-each is constrained to include only things specified in the QLB. So for xslt2 and xslt3 it might be the function analyze-string() only. (Are there any other functions in XPath 2 and 3 that return elements?) With that constraint, an implementation could certainly count the characters in analyze-string-result/*/text() to find the index into the original text() (or whatever) and put out an XPath with substring() as the @subject. |
Is it that the general case here is what we used to call "embedded notations"? I.e. where there is some language embedded: for example CSS, JavaScript and so on, JSON embedded in XML, or SQL etc. ... or XPath. For example, imagine a Schematron schema for XSLT that allowed us to apply house-style rules to the XPaths?
This is, of course, one of the ways that syntax checkers like Java FindBugs or PMD work. They create an XML parse tree, then use some XPath system to express rules. |
+1 for @visit-each. I would not restrict the value to a particular function but a) make its specification part of the query language binding and b) define the respective XPath version for XSLT & XPath. Running the assertion over a transformed/parsed context would also make the validation of structured data in attributes ("embedded notations") easier to express. |
And this might strengthen the case for sch:rule-set: so you can make
multiple assertions on the same synthesized data.
|
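To make the presumed @visit-each processing logic from the comments above concrete, here is a minimal Python sketch. It is only a model, not how any Schematron implementation works: `fire_rule` and the `(test, message)` pairs are hypothetical names, and `re.finditer` stands in for `analyze-string(., regex)/.../match`.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical model: an assertion is a (test, message) pair whose test
# receives one visited item (here: a matched substring).
Assertion = Tuple[Callable[[str], bool], str]


def fire_rule(context_text: str, visit_each: str,
              assertions: List[Assertion]) -> Tuple[bool, List[str]]:
    """Return (rule_fired, failure_messages) for one context node."""
    # Step 1: evaluate the visit-each expression against the rule context.
    items = [m.group(0) for m in re.finditer(visit_each, context_text)]
    # Step 2: the rule fires even if the resulting sequence is empty.
    rule_fired = True
    # Step 3: evaluate every assertion against every item in the sequence.
    failures = [message
                for item in items
                for test, message in assertions
                if not test(item)]
    return rule_fired, failures


# Several assertions run over the same synthesized data.
assertions = [
    (lambda s: s.islower(), "token must be lower case"),
    (lambda s: len(s) <= 3, "token must be at most 3 characters"),
]
print(fire_rule("foo Foo food", r"[Ff]oo\w*", assertions))
print(fire_rule("bar", r"foo", assertions))
```

The second call shows the "empty sequence" case: the rule still counts as fired, but no assertion is evaluated.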
This discussion goes in a direction I did not expect, and I'm not sure I understand everything. But my impression is that your focus is on my first issue:
- If you have multiple occurrences of "foo" in a text node, you always get only one validation error for this node.
At least as important is the second issue:
- If you have a long text node, it may be very hard to find the exact phrase which causes the error, especially if the searched phrase is a single character and/or something well hidden.
Let me demonstrate this with an example. I implemented @rjelliffe's approach for a regex match with pure Schematron. This is how it looks in the Oxygen UI:
grafik.png (view on web) <https://github.com/Schematron/schematron-enhancement-proposals/assets/9881428/7d622659-83b1-4f1a-975f-8fcc7b9ca1b3>
Now compare it with the same regex match using the Escali extension:
grafik.png (view on web) <https://github.com/Schematron/schematron-enhancement-proposals/assets/9881428/8fa045c6-7bed-4d39-837e-24931eb1738a>
How much time do you think you can save when you do not have to search for the exact location of the bad terms, even in these small paragraphs? And now assume you have much longer documents with a lot of checks for much better hidden patterns (like abbreviations)!
If I understand @AndrewSales correctly, @visit-each would allow a free XPath expression. But that would not make it possible for a UI to provide the exact position of the matched patterns, would it? |
(Edited)
If I can generalize Nico's issue, the question for the @visit-each
approach is how each item of the transformed data carries information
relating it to the original context, suitable for SVRL (or other outputs)
to pinpoint the location exactly. It would be a backwards step for
Schematron to lose track of the positions in the XML, after all. But the XDM
does not support "slices" (substrings that are indexes into an existing
string).
I think the answer is that the QLB (e.g., for XSLT 3) specifies the
standard transformation functions it supports, and these must have XML
output with some index information. * So the implementation parses the
@view-as value for QLB-specified keywords such as 'analyze-string' and
tweaks their output accordingly.
In the case of analyze-string(), this would involve each
analyze-string-result/* element being decorated with an attribute
@sch:start. This would contain string-length(string-join(preceding-sibling::*)).
What about functions that don't return XML elements? In the case of, say,
tokenize(), which goes from text() to xs:string*, the implementation would
wrap each token in some element that includes the start index, e.g.
````
<sch:rule context="fred" view-as="tokenize('(a*b*c*)*')">
````
where element fred has value 'abcabacb' would iterate over, say,
````
<token sch:start="1">abc</token>
<token sch:start="4">ab</token>
<token sch:start="6">ac</token>
<token sch:start="8">b</token>
````
So I expect that @view-as needs to be split to expose the XML to be
tweaked, e.g.:
````
<sch:rule context="dog" view-as="licence/analyze-string( ...) "
          each="/analyze-string-result/match" />
or
<sch:rule context="dog">
  <sch:view-as with="licence/analyze-string( ...) "
               select="/analyze-string-result/match" />
</sch:rule>
````
So how would you then transfer the index to the outside SVRL etc? You would
use properties, which are the generic way to decorate the SVRL (or
whatever) with dynamic information intended for use by subsequent
processes rather than by humans directly:
````
<sch:report test="contains(., 'kitty')"
            properties="string-position">
  A dog licence should not have the string 'kitty'</sch:report>
...
<sch:property id="string-position">
  <string-start><sch:value-of select="@sch:start"/></string-start>
  <string-end><sch:value-of select="@sch:start + string-length(.)"/></string-end>
</sch:property>
````
In other words, to register a function for @view-as the QLB needs to
specify:
- the name by which it will be recognized
- how to tweak the function's output so that it is XML with appropriate
index markup (e.g., sch:string)
- plus probably some standardized property for supporting slices, to
transfer the range indexes into the SVRL.
(I am not sure, but did I hear that XPath 4 might include slices or text
ranges? @view-as might find that useful.)
Rick
* @andrew: presumably, ISO Schematron 2005 standard would have a new annex
specifying how some standard set of XSLT functions that go from text() to
element()* are handled and decorated. Then each QLB that was interested
would reference that. I think the QLB mechanism is appropriate, as it is
intended to ensure that an implementation has everything needed to
interpret the schema.
|
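As a rough illustration of the index decoration discussed above, the following Python sketch computes a start index for each token of 'abcabacb' and wraps it in markup. The names `decorate_tokens` and `to_xml` are made up for this sketch, and the 1-based start (as in the example markup) with end = start + length is an assumption rather than anything the proposal fixes.

```python
import re
from typing import List, Tuple
from xml.sax.saxutils import escape


def decorate_tokens(text: str, regex: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, token) for each non-empty regex match.

    start is 1-based (like XPath substring()); end is start + len(token),
    mirroring the string-start / string-end properties sketched above.
    """
    tokens = []
    for match in re.finditer(regex, text):
        token = match.group(0)
        if token:  # a pattern like a*b*c* can also match the empty string
            start = match.start() + 1
            tokens.append((start, start + len(token), token))
    return tokens


def to_xml(tokens: List[Tuple[int, int, str]]) -> List[str]:
    """Wrap each token in an element carrying its start index."""
    return [f'<token sch:start="{start}">{escape(token)}</token>'
            for start, _end, token in tokens]


for line in to_xml(decorate_tokens("abcabacb", r"a*b*c*")):
    print(line)
```

With such offsets available, an implementation could report a substring() range against the original text node instead of only the node itself.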
Nico wrote:
This is not a fair comparison. If you wrote a custom extension for the regex approach you could get specific underlines too. |
I have updated my previous answer in the wiki. I meant to use
string-length(), not count(), to get the start position in the text.
Rick
|
I have re-worked my view-as proposal into #76. This expresses it with more issues addressed, and I didn't want to hijack this thread, so that Nico's proposal can be discussed on its own merits. |
This thread has developed a lot since I last commented. @nkutsche , I understand the issue and what you want to achieve. I favour the additional attribute The specific, and likely most common, use case of matching substrings hinges on reporting the locations of those substrings. XPointer offered something here, but I'm not sure there was much uptake. My feeling is the standard should incorporate the new attribute, make its intended purpose clear, and suggest (non-normatively) a possible method of passing sufficient substring information through to SVRL. For example, where It might be clearest to write such rules as:
so that an implementation can report the result of I don't think the standard should attempt to define methods of capturing, expressing, obtaining or reporting substrings beyond this, as in my view that would tend towards a Schematron-specific subsetting of XPath and the XDM. Implementations are of course free to do what they need to internally. (@nkutsche - just a thought: what if an SQF micro-transform added the match markup to the string? Then your headline report tells you "trouble in this para", when you get there, user has SQF mark up the problems, preferably in an |
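Wouldn't visitEach be a better name? As far as I can see all other ISO Schematron attributes use camel case. |
If the @visit-each sets the context for the assertion tests, does it also set the context for sch:let variables? |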
If the minimal approach is desired, I can see five approaches. I like 1 and
5, and 2 is OK.
*1. Different Context*
```
<sch:rule context="fred" viewAs="my:tokenize('(a*b*c*)*')">
  <sch:let name="freds-parent" value="./parent::*" as="element()" />
  <sch:assert test="$freds-parent/name()='Mavis' and contains(., 'c')">
    Fred's parent must be Mavis, and each abc token must contain at
    least one 'c'.
  </sch:assert>
</sch:rule>
```
I.e. the context for a variable is the node selected by sch:rule/@context,
but the context for an assertion (sch:assert or sch:report) is the token.
(I.e. if a report or assert needs to get access to the sch:rule/@context
node, it needs to do it through a variable.)
*Example: CSV*
Let's validate embedded CSV cells. Let's say the user defines a function
my:parseCsv() that parses newlines and "," into rows and cells. And let's
say that sch:rule/@viewAs allows @select to select which data to iterate
over.
So our document is:
```
<data>abc,123,456
def,123,456</data>
```
Which might parse into:
```
<csv>
  <row n="1">
    <c n="1">abc</c>
    <c n="2">123</c>
    <c n="3">456</c>
  </row>
  <row n="2">
    <c n="1">def</c>
    <c n="2">123</c>
    <c n="3">456</c>
  </row>
</csv>
```
Which can be validated by:
```
<sch:rule context="data" viewAs="my:parseCsv(.)" select="/csv/row">
  <sch:assert test="self::row" role="unit-test">Only validates
    rows</sch:assert>
  <sch:assert test="number(c[2]) and c[2]!=0">The second cell must be a
    non-zero number</sch:assert>
</sch:rule>
```
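For readers who want to see the CSV idea in executable form, here is a small Python sketch of what a user-defined my:parseCsv() and the second-cell assertion might do. Both function names and the `delimiter` parameter are hypothetical; the parser is deliberately naive (no quoting or escaping) and assumes comma-separated cells as the prose describes.

```python
from typing import List


def parse_csv(text: str, delimiter: str = ",") -> List[List[str]]:
    """Split text into rows on newlines and into cells on the delimiter."""
    return [line.split(delimiter)
            for line in text.strip().splitlines()
            if line.strip()]


def validate_rows(rows: List[List[str]]) -> List[str]:
    """Check each row as the rule would: cell 2 must be a non-zero number."""
    errors = []
    for n, row in enumerate(rows, start=1):
        try:
            value = float(row[1])
            # value == value is False for NaN, matching XPath's boolean(NaN).
            ok = value == value and value != 0
        except (IndexError, ValueError):
            ok = False
        if not ok:
            errors.append(f"row {n}: the second cell must be a non-zero number")
    return errors


rows = parse_csv("abc,123,456\ndef,123,456")
print(rows)
print(validate_rows(rows))
```

Each row plays the role of one visited item, so a failing row yields its own message rather than one message for the whole data element.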
*2. Same Context*
Another approach would be
```
<sch:rule context="fred" viewAs="my:tokenize('(a*b*c*)*')">
  <sch:assert test="$sch:rule-context/parent::*/name()='Mavis' and
      contains(., 'c')">
    Fred's parent must be Mavis, and each abc token must contain at
    least one 'c'.
  </sch:assert>
</sch:rule>
```
In this approach, the original rule-context is provided by a new built-in
variable $sch:rule-context. Variables and asserts would operate in the
context of the token. So this would be the same as the previous:
```
<sch:rule context="fred" viewAs="my:tokenize('(a*b*c*)*')">
  <sch:let name="freds-parent" value="$sch:rule-context/parent::*"
      as="element()" />
  <sch:assert test="$freds-parent/name()='Mavis' and contains(., 'c')">
    Fred's parent must be Mavis, and each abc token must contain at
    least one 'c'.
  </sch:assert>
</sch:rule>
```
When defining this, I suppose that the QLBs should state
- xslt1 and xpath1: sch:rule/@viewAs not available (as they have no
regex capabilities)
  - or provide some simple tokenizing mechanism, such as just
  tokenising on whitespace+"|"
- xslt2 and xpath2: regex with no capture group: tokenize()
- xslt3 and xpath3: tokenize(), and regex with capture group:
analyze-string()
But I don't have a strong opinion either way. And I still suspect that the
full sch:viewAs element offers more flexibility (e.g. with directory
listings, custom functions, or having to interact with e.g. Java extension
functions.)
*3. Allow sch:rule/@context to specify functions*
I don't like this one much.
Another approach might be to allow sch:rule/@context to specify compound
paths with functions rather than just being simple location steps.
````
<sch:rule context="text()/analyze-string(.,
'[^\s]+')/analyze-string-result/match">
<sch:report test=".='foo'">Oh dear I found a foo</sch:report>
</sch:rule>
````
The XPath3 analyze-string function returns an element.
In this case, the compiler sees that there is an "/analyze-string(." text,
and splits it to do the iteration.
This has the advantage of not requiring extra schematron elements or
concepts. But it might stuff things up if anyone wanted to use
analyze-string inside a predicate in the @context location step (which
might not be a common thing.) So it seems too messy.
*4. sch:pattern/@documents*
I don't like this one much.
Another approach might be to use the existing sch:pattern/@documents:
````
<sch:schema ...>
...
<sch:let name="analyzed-text"
    value="analyze-string(string-join(//text(), ' '), '[^\s]+')"
    as="element()" />
<sch:pattern documents="$analyzed-text">
  <sch:rule context="match">
    <sch:report test=".='foo'">Found a foo!</sch:report>
  </sch:rule>
</sch:pattern>
````
So this does not require any change to Schematron 2016 (IIRC) syntax.
Analyze-string() returns an element not a document, so the compiler or
engine would have to convert the element to a document. The downside is
that this only allows a very coarse-grain approach. And the item cannot be
reported in terms of its original context, which I think is not
satisfactory.
*5. Extend sch:rule/@context to allow a variable*
In this approach we say that rulesets can define their own data source for
the rule. So
````
<sch:ruleset name="text-no-noes">
  <sch:let name="tokenized-text"
      value="for $t in //*[text()] return
             analyze-string($t, '[^\s]+')"
      as="element()*"/>
  <sch:rule context="$tokenized-text/analyze-string-result/match">
    <sch:report test=".='foo' ">Found a foo </sch:report>
...
````
This works because there is no need to restrict the @context of a rule in
a ruleset to a simple location path (i.e. what can appear in a match
pattern).
So the @context of rules in ordinary patterns can continue the way it is:
validating the provided document. But rulesets would have the added
capability of being able to operate on views. (I guess the implementation
would just check if the first character of the @context was "$" if it
needed to generate different code for a view.)
Rick
…On Mon, Sep 30, 2024 at 10:40 PM David Maus ***@***.***> wrote:
If the @visit-each sets the context for the assertion tests does it also
set the context for sch:let variables?
|
Same here. |
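More questions:
- does @visit-each set the context for dynamic expressions in properties and diagnostics?
- does @visit-each set the context for the name query?
- does @visit-each set the context for the value-of query? |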
I suspect that, if we don't want to compromise diagnostics and assertions
(which I think are a core value proposition for Schematron, yes?), then
- If we went ahead with 1), so that (except for the assertion tests) the
context is always the sch:rule/@context node (for let, name, value-of,
etc.), then we need some variables for the view elements: e.g.
$sch:view-context and $sch:view-root. OR
- If we went ahead with 2), so that the context is always the current item
of the view iteration (for assertion tests, let, name, value-of, etc.),
then we need some variable for the original elements: e.g.
$sch:current-rule-context.
So the "minimal" proposal would be:
- sch:rule/@view-as (however spelled), which is an XPath function
call to produce any kind of XPath data structure: lists of strings as well
as nodes
- a second attribute on sch:rule, which is an XPath location path on the
result of @view-as and provides the iterator
- variables to allow full access in diagnostics:
$sch:current-rule-context, $sch:view-root, $sch:current-view-context
These would be needed for 1) or 2), whichever was decided.
Rick
…On Tue, 1 Oct 2024, 5:40 pm David Maus, ***@***.***> wrote:
More questions:
- does @visit-each set the context for dynamic expressions in
properties and diagnostics?
- does @visit-each set the context for the name query?
- does @visit-each set the context for the value-of query?
|
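@dmj
> Wouldn't visitEach a better name? As far as I can see all other ISO Schematron attributes use camel case.
Well, there is is-a too, but I take your point.
> If the @visit-each sets the context for the assertion tests does it also set the context for sch:let variables?
Good question. This isn't (yet) clarified in the text of the standard. @rjelliffe the issue with your submissions via email persists whereby markup(?) is substituted by ***@***.***, making your examples unintelligible. |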
Sorry about the formatting. I hope this is good enough:
My point was that in order to get good assert text/diagnostics/properties
we need to have access to both the outer context of the sch:rule/@context
and the inner context of the string being visited.
So it would be prudent to provide clear variables to make these
available:
- $sch:current-rule-context
  - the node selected by sch:rule/@context
  - this is processed into substrings or slices in some way, then
  iterated over to make a sequence of views
  - this is what a sch:rule[not(@viewEach)]/sch:assert/@test runs on
- $sch:current-view
  - the substring (?) that is the current iterated view
  - this is what a sch:rule[@viewEach]/sch:assert/@test runs on
- $sch:current-view-index
  - the index of the view (i.e. the iterator's index, not the character
  index)
So the attributes of sch:rule would be something like
- @as = typing for what is selected by @context
- @viewAs = typing for what is selected by @viewEach (e.g. for a
string parsed into numbers)
- @viewEach = XPath function to divide the string or whatever, then iterate
Regards
Rick
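The three proposed variables can be pictured with a small Python sketch. It only illustrates which value each variable would hold during iteration; `ViewBinding`, `iterate_views` and `diagnose` are hypothetical names, and splitting on a regex stands in for whatever @viewEach would actually do.

```python
import re
from dataclasses import dataclass
from typing import Iterator


@dataclass
class ViewBinding:
    current_rule_context: str  # stands in for $sch:current-rule-context
    current_view: str          # stands in for $sch:current-view
    current_view_index: int    # stands in for $sch:current-view-index


def iterate_views(rule_context: str, splitter: str) -> Iterator[ViewBinding]:
    """Divide the rule context into views and bind all three variables."""
    views = [view for view in re.split(splitter, rule_context) if view]
    for index, view in enumerate(views, start=1):
        # The index is the iterator position, not a character offset.
        yield ViewBinding(rule_context, view, index)


def diagnose(binding: ViewBinding) -> str:
    """Build a message that uses both the outer and the inner context."""
    return (f"view {binding.current_view_index} ('{binding.current_view}') "
            f"in '{binding.current_rule_context}' must contain 'c'")


messages = [diagnose(b)
            for b in iterate_views("abc ab", r"\s+")
            if "c" not in b.current_view]
print(messages)
```

The point of the sketch is that the diagnostic text can mention the original node and the visited substring at the same time, which is what the variables are meant to enable.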
…On Mon, Oct 28, 2024 at 4:07 AM Andrew Sales ***@***.***> wrote:
@dmj <https://github.com/dmj>
Wouldn't visitEach a better name? As far as I can see all other ISO
Schematron attributes use camel case.
Well, there is is-a too, but I take your point.
If the @visit-each sets the context for the assertion tests does it also
set the context for sch:let variables?
Good question. This isn't (yet) clarified in the text of the standard.
@rjelliffe <https://github.com/rjelliffe> the issue with your submissions
via email persists whereby markup(?) is substituted by ***@***.***,
making your examples unintelligible.
|
Years ago I developed an extension of Schematron for my Escali to enhance the possibilities of checking sub-phrases inside text nodes. During XML Prague 2024 this topic came up again in discussions and I would like to follow up on this discussion here:
Currently you can only check nodes in Schematron. If you want to avoid, for instance, the phrase "foo" in any text node, you would do the following:
This has the following problems:
- If you have multiple occurrences of "foo" in a text node, you always get only one validation error for this node.
- If you have a long text node, it may be very hard to find the exact phrase which causes the error, especially if the searched phrase is a single character and/or something well hidden.
The basic idea of the extension was to provide a regular expression in addition to the matched context. I would suggest something like this (reworking my initial proposal):
The following rules should be applied:
- @regex and @flags are used for the regex analysis, as known from several XPath functions (e.g. matches#3)
- a sch:report-phrase/@test attribute can be added optionally ... xsl:analyze-string)
- ... true, the matched substring will produce a validation error
- ... true()
- ... sch:report-phrase is also the matched substring
I don't think it is hard to implement, as the behavior is completely covered by xsl:analyze-string. The biggest challenge is to specify the location of the substring and integrate it into SVRL, isn't it? In my implementation I did it by adding character positions (start and length) to the location node path.
What do you think?
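The behaviour described in this proposal (one validation error per matched substring, each with a start and a length) can be approximated in a few lines of Python. This is only a sketch under assumptions: `report_phrases` and `PhraseReport` are invented names, the location paths are supplied by the caller, and only the "i", "s" and "m" flags are mapped to their Python equivalents.

```python
import re
from typing import Dict, List, NamedTuple


class PhraseReport(NamedTuple):
    location: str  # path of the text node, supplied by the caller
    start: int     # 1-based character position within the text node
    length: int
    phrase: str


# Assumed mapping from XPath-style flag letters to Python regex flags.
FLAG_MAP = {"i": re.IGNORECASE, "s": re.DOTALL, "m": re.MULTILINE}


def report_phrases(text_nodes: Dict[str, str], regex: str,
                   flags: str = "") -> List[PhraseReport]:
    """Return one report per non-empty regex match in every text node."""
    re_flags = 0
    for letter in flags:
        re_flags |= FLAG_MAP.get(letter, 0)
    pattern = re.compile(regex, re_flags)
    reports = []
    for location, text in text_nodes.items():
        for match in pattern.finditer(text):
            if match.group(0):
                reports.append(PhraseReport(location, match.start() + 1,
                                            len(match.group(0)),
                                            match.group(0)))
    return reports


for report in report_phrases({"/doc/p[1]/text()[1]": "foo and Foo"}, "foo", "i"):
    print(report)
```

Two occurrences in the same text node produce two reports, and the start/length pair is what a UI would need to underline the exact phrase.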