SPARQL String. Unicode escapes exclude surrogates. #190

afs · 2025-01-29T17:43:06Z

This closes #188
This closes #189

- Rename "SPARQL Request String" to "SPARQL string" (the original anchor still works)
  - "Request" is too oriented toward protocol actions
  - A SPARQL string is an RDF string that conforms to the grammar defined in the grammar section
- Use more ReSpec markup for definitions
- Exclude surrogates from Unicode escape sequences (for clarity: surrogates are not allowed in RDF strings)

Used more ReSpec.
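The escape rule described above (a codepoint escape may not produce a code point in U+D800 to U+DFFF) can be sketched in Python. This is a minimal, hypothetical sketch, not the spec's normative text: the helper name `replace_codepoint_escapes` is invented for illustration, and it assumes the eight-digit `\U` form carries the same surrogate exclusion and is capped at U+10FFFF.

```python
import re

# Matches the two SPARQL codepoint escape forms: \uXXXX and \UXXXXXXXX.
ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def is_surrogate(cp: int) -> bool:
    """True for U+D800..U+DFFF, which are not Unicode scalar values."""
    return 0xD800 <= cp <= 0xDFFF


def replace_codepoint_escapes(text: str) -> str:
    """Replace codepoint escapes with their characters (hypothetical helper).

    Raises ValueError for a surrogate code point, or for an eight-digit
    escape above U+10FFFF (an assumption about the \\U form).
    """

    def substitute(m: re.Match) -> str:
        # Exactly one of the two groups matched; pick whichever is set.
        cp = int(m.group(1) or m.group(2), 16)
        if is_surrogate(cp):
            raise ValueError(f"surrogate escape not allowed: {m.group(0)}")
        if cp > 0x10FFFF:
            raise ValueError(f"escape out of range: {m.group(0)}")
        return chr(cp)

    return ESCAPE.sub(substitute, text)
```

For example, `replace_codepoint_escapes(r"caf\u00E9")` yields `"café"`, while `r"\uD800"` raises `ValueError`, which a parser would report as a syntax error.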

gkellogg · 2025-01-29T18:08:05Z

spec/index.html

@@ -10551,15 +10570,19 @@ <h3>Codepoint Escape Sequences</h3>
                <a href="#HEX">HEX</a> <a href="#HEX">HEX</a>
              </td>
              <td>A Unicode code point in the range U+0 to U+FFFF inclusive corresponding to the
-                encoded hexadecimal value.</td>
+                encoded hexadecimal value, excluding U+D800 to U+DFFF, the 


We don't make this restriction in the Turtle (or related) grammars. It's arguably not necessary, as the value space already restricts RDF/SPARQL strings from including bare surrogates. There should probably be some tests that attempt to create strings using such escape sequences and result in syntax errors.

syn-invalid-codepoint-escaped-bad-01.rq

w3c/rdf-tests#167

The input is an RDF string but the output isn't.
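A small illustration of "the input is an RDF string but the output isn't": a query written entirely in ASCII passes any "is this an RDF string?" check, but naively decoding `\uD800` inside it produces a lone surrogate. Both helpers below are hypothetical; `is_rdf_string` simply assumes that RDF strings exclude U+D800 to U+DFFF, and `naive_unescape` deliberately omits the surrogate check.

```python
import re


def is_rdf_string(s: str) -> bool:
    """True if no character falls in the surrogate range U+D800..U+DFFF."""
    return not any(0xD800 <= ord(c) <= 0xDFFF for c in s)


def naive_unescape(text: str) -> str:
    """Decode four-digit backslash-u escapes WITHOUT any surrogate check."""
    return re.sub(r"\\u([0-9A-Fa-f]{4})",
                  lambda m: chr(int(m.group(1), 16)), text)


query_text = r'SELECT * { ?s ?p "\uD800" }'
decoded = naive_unescape(query_text)

print(is_rdf_string(query_text))  # True: the input is plain ASCII
print(is_rdf_string(decoded))     # False: the output holds a lone surrogate
```

In Python, `decoded.encode("utf-8")` also fails with `UnicodeEncodeError`, since UTF-8 cannot encode a lone surrogate.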

By adding the restriction to Unicode escapes, the outcome is an RDF string and nothing more needs to be said. Saying it at the point where bad things happen ™️ is, IMO, clearer. It could be a note.

Otherwise, something would need to say that EBNF parsing operates on an RDF string again, which is really just moving the Unicode escape text around.

The issue is slightly different in Turtle, because there Unicode escapes are handled during parsing. But the Turtle text still says the outcome is in the range U+0000 to U+FFFF, which includes surrogates.

Sorry for missing this earlier: this prevents code points that can be encoded as a surrogate pair from being encoded as two consecutive escape sequences (one for the high surrogate and one for the low surrogate). I am not sure we explicitly allowed that before, so I am not sure it's a big deal. See w3c/rdf-turtle#84
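The point about paired surrogates can be illustrated with the standard UTF-16 surrogate-pair arithmetic. A supplementary code point such as U+1F600 can be written as the single escape `\U0001F600`, or as the two-escape form `\uD83D\uDE00` (whether that form was ever explicitly allowed is the open question above). With surrogates excluded from escapes, only the single eight-digit form remains. A minimal sketch; the function names are illustrative.

```python
def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only supplementary code points use surrogate pairs")
    offset = cp - 0x10000             # 20-bit value
    high = 0xD800 + (offset >> 10)    # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits -> low surrogate
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine a high/low surrogate pair into a single code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


cp = 0x1F600
high, low = to_surrogate_pair(cp)
print(f"\\U{cp:08X}")                # \U0001F600  (single escape, still allowed)
print(f"\\u{high:04X}\\u{low:04X}")  # \uD83D\uDE00 (pair form, now excluded)
```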


TallTed · 2025-01-30T21:43:29Z

spec/index.html

@@ -5364,7 +5364,7 @@ <h4>Operator Extensibility</h4>
      </section>
      <section id="SparqlOps">
        <h3>Function Definitions</h3>
-        <p>This section defines the operators and functions introduced by the SPARQL Query language.
+        <p>This section defines the operators and functions introduced by the SPARQL query language.


I do not understand the variance in capitalization. SPARQL Query language here (line 5367) is being changed to SPARQL query language, while earlier in the document (line 393), SPARQL query language is being changed to SPARQL Query Language. Note that neither line contains (part of) a title; they're both body prose.

There are other variances elsewhere in the document. Why do these vary? What is being communicated by the difference in casing, besides confusion?


afs mentioned this pull request Jan 29, 2025

Update links in SPARQL update conformance for syntax w3c/sparql-update#52

Merged

afs force-pushed the strings branch 2 times, most recently from 9b68e44 to cda46be (January 29, 2025 18:01)

gkellogg reviewed Jan 29, 2025

afs force-pushed the strings branch from cda46be to 0601e89 (January 29, 2025 18:45)

Tpt approved these changes Jan 29, 2025

hartig approved these changes Jan 29, 2025

TallTed reviewed Jan 29, 2025

spec/index.html (resolved)

rubensworks approved these changes Jan 30, 2025

afs force-pushed the strings branch from 0601e89 to 18a4c82 (January 30, 2025 08:41)

SPARQL String. Unicode escapes exclude surrogates

4038a3b

afs force-pushed the strings branch from 18a4c82 to 4038a3b (January 30, 2025 08:50)

TallTed reviewed Jan 30, 2025

spec/index.html (resolved)

kasei approved these changes Jan 31, 2025

afs merged commit 610778f into main Feb 6, 2025
2 checks passed

afs deleted the strings branch February 6, 2025 11:15

Conversation: afs commented Jan 29, 2025 (edited by pr-preview bot). Review threads: gkellogg (Jan 29, 2025), afs (Jan 29, 2025), afs (Jan 29, 2025, edited), Tpt (Feb 8, 2025), TallTed (Jan 30, 2025).