
SPARQL String. Unicode escapes exclude surrogates. #190

Merged
merged 1 commit into from
Feb 6, 2025

Contributor

@afs afs commented Jan 29, 2025

This closes #188
This closes #189

  • Rename "SPARQL Request String" to "SPARQL string" (the original anchor still works);
    "Request" is too oriented to protocol action
  • A SPARQL string is an RDF string that conforms to the grammar section
  • More reSpec on definitions
  • Exclude surrogates from Unicode escape sequences (for clarity: they are not allowed in an RDF string)

Used more reSpec
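
For readers following along, the surrogate exclusion in the last bullet can be sketched roughly as below. This is a minimal illustration, not the spec text or any implementation's code: `decode_codepoint_escapes` is a hypothetical helper name, and the sketch assumes only the `\uXXXX` and `\UXXXXXXXX` forms and ignores any interaction with other escape forms.

```python
import re

# Matches \uXXXX (4 hex digits) or \UXXXXXXXX (8 hex digits).
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def decode_codepoint_escapes(text: str) -> str:
    """Replace codepoint escapes, rejecting surrogates (hypothetical helper)."""

    def replace(match: re.Match) -> str:
        code_point = int(match.group(1) or match.group(2), 16)
        # Assumption drawn from this PR: U+D800 to U+DFFF are excluded.
        if 0xD800 <= code_point <= 0xDFFF:
            raise ValueError(f"surrogate escape not allowed: {match.group(0)}")
        if code_point > 0x10FFFF:
            raise ValueError(f"escape out of Unicode range: {match.group(0)}")
        return chr(code_point)

    return _ESCAPE.sub(replace, text)
```

With this sketch, `decode_codepoint_escapes(r"caf\u00E9")` yields `café`, while an input containing `\uD800` raises `ValueError` instead of producing a string with a lone surrogate.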



@@ -10551,15 +10570,19 @@ <h3>Codepoint Escape Sequences</h3>
 <a href="#HEX">HEX</a> <a href="#HEX">HEX</a>
 </td>
 <td>A Unicode code point in the range U+0 to U+FFFF inclusive corresponding to the
-encoded hexadecimal value.</td>
+encoded hexadecimal value, excluding U+D800 to U+DFFF, the
Member


We don't make this restriction in the Turtle (or related) grammars. It's arguably not necessary, as the value space already restricts RDF/SPARQL strings from including bare surrogates. There should probably be some tests that attempt to create strings using such escape sequences and result in syntax errors.

Contributor Author


syn-invalid-codepoint-escaped-bad-01.rq

w3c/rdf-tests#167

Contributor Author

@afs afs Jan 29, 2025


The input is an RDF string but the output isn't.

By adding the restriction to Unicode escapes, the outcome is an RDF string and nothing more needs to be said. Saying it at the point where bad things happen ™️ is IMO clearer. Could be a note.

Otherwise, somewhere should say that EBNF parsing is on an RDF string again, which is really just moving the unicode escape text about.

Slightly different issue in Turtle because of the different way Unicode escapes are handled which is during parsing. But the text there says the outcome is in the range U+0000 to U+FFFF which includes surrogates.
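
To make the "input is an RDF string but the output isn't" point concrete, here is a small sketch. It assumes, as this thread does, that an RDF string may not contain code points in U+D800 to U+DFFF; `is_rdf_string` is a hypothetical helper, and the `replace` call stands in for an unescaping step with no surrogate check.

```python
def is_rdf_string(text: str) -> bool:
    """True if no code point falls in the surrogate range U+D800 to U+DFFF."""
    return not any(0xD800 <= ord(ch) <= 0xDFFF for ch in text)


# The query text as written: a backslash, "u", and four hex digits.
source = r'SELECT ("\uD800" AS ?x) {}'

# A naive unescaping step that performs no surrogate check.
unescaped = source.replace(r"\uD800", chr(0xD800))

print(is_rdf_string(source))     # True
print(is_rdf_string(unescaped))  # False
```

The escaped form consists only of ordinary characters, so the problem appears only once the escape is replaced, which is why the restriction is stated at the escape-processing step.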

Contributor


Sorry for missing this earlier. This prevents code points that can be encoded as paired surrogates from being written as two consecutive escape sequences (one for the high surrogate and one for the low one). I am not sure we explicitly allowed that before, so I am not sure it's a big deal. See w3c/rdf-turtle#84
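
As an illustration of the paired-surrogate case: the two escapes `\uD83D\uDE00` would, under UTF-16 rules, stand for the single code point U+1F600, which can instead be written with one eight-digit escape. The sketch below shows only the arithmetic; `combine_surrogates` is a hypothetical name, and nothing here claims that any parser accepted the paired form.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Map a UTF-16 surrogate pair to the code point it encodes."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


code_point = combine_surrogates(0xD83D, 0xDE00)
# Eight-digit escape form for the same code point.
print(f"\\U{code_point:08X}")  # \U0001F600
```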

@@ -5364,7 +5364,7 @@ <h4>Operator Extensibility</h4>
 </section>
 <section id="SparqlOps">
 <h3>Function Definitions</h3>
-<p>This section defines the operators and functions introduced by the SPARQL Query language.
+<p>This section defines the operators and functions introduced by the SPARQL query language.
Member


I do not understand the variance in capitalization. SPARQL Query language here (line 5367) is being changed to SPARQL query language, while earlier in the document (line 393), SPARQL query language is being changed to SPARQL Query Language. Note that neither line contains (part of) a title; they're both body prose.

There are other variances elsewhere in the document. Why do these vary? What is being communicated by the difference in casing, besides confusion?

@afs afs merged commit 610778f into main Feb 6, 2025
2 checks passed
@afs afs deleted the strings branch February 6, 2025 11:15

Successfully merging this pull request may close these issues.

Unicode escapes should be restricted to exclude surrogates.
Make sure SPARQL uses "RDF string"
7 participants