
Malformed strings with replacement characters (�) #40

Open
mdeagen opened this issue May 12, 2021 · 0 comments
mdeagen commented May 12, 2021

Somewhere in the ETL process that loads data into the RDF triple store, special (non-ASCII) characters are replaced with the Unicode replacement character, � (U+FFFD).
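The issue does not say where in the ETL the characters are lost. One common cause, offered here only as an assumption, is that bytes get decoded with the wrong codec under an error handler that substitutes U+FFFD. The minimal Python sketch below assumes a Latin-1 source file; the sample string and variable names are hypothetical.

```python
# Hypothetical reproduction: the Latin-1 source encoding is an assumption,
# not something this issue establishes.
original = "Café at 25 °C"
raw = original.encode("latin-1")  # b"Caf\xe9 at 25 \xb0C"

# Decoding with the codec the bytes were written in round-trips cleanly.
correct = raw.decode("latin-1")

# Decoding as UTF-8 with errors="replace" silently swaps each byte that is
# not valid UTF-8 for U+FFFD, the character seen in the triple store.
lossy = raw.decode("utf-8", errors="replace")

print(correct)  # Café at 25 °C
print(lossy)    # Caf� at 25 �C

# The default "strict" handler fails loudly instead, reporting where the
# first undecodable byte is, so bad input can be caught before loading.
bad_offset = None
try:
    raw.decode("utf-8")
except UnicodeDecodeError as err:
    bad_offset = err.start
print(bad_offset)  # 3
```

Because the substitution discards the original byte, literals that already contain � cannot be repaired in place. They would most likely need to be regenerated from the source data. Using strict decoding in the pipeline turns silent corruption into a visible error.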

The issue also occurs when saving a Vega-Lite chart spec that contains any Unicode escape sequence (e.g., \u00b0 for the ° symbol).
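How the application saves chart specs is not described here, so the following is only one plausible path, sketched in Python. A JSON parser resolves `\u00b0` to the real ° character. If the spec is then re-serialized with non-ASCII output and a later stage decodes those bytes incorrectly, the degree sign turns into replacement characters. The spec contents and the ASCII decode step are hypothetical.

```python
import json

# Hypothetical chart spec; the field values are illustrative only.
spec_text = '{"title": "Temperature (\\u00b0C)", "mark": "line"}'

# A JSON parser resolves the escape to the real character U+00B0 (°).
spec = json.loads(spec_text)
print(spec["title"])  # Temperature (°C)

# Re-serializing with ensure_ascii=False writes ° directly; encoded as
# UTF-8 it becomes the two bytes C2 B0.
payload = json.dumps(spec, ensure_ascii=False).encode("utf-8")

# If some later stage (assumed here) decodes those bytes as ASCII with
# errors="replace", each non-ASCII byte becomes U+FFFD.
garbled = json.loads(payload.decode("ascii", errors="replace"))
print(garbled["title"])  # Temperature (��C)

# The default ensure_ascii=True keeps the escape as pure ASCII text,
# which survives the same lossy ASCII decode unchanged.
safe = json.dumps(spec).encode("utf-8")
print(json.loads(safe.decode("ascii", errors="replace"))["title"])  # Temperature (°C)
```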

The SPARQL query below returns about 600 of these malformed strings, which include article titles, author names, and definitions of terms:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX sio: <http://semanticscience.org/resource/>
SELECT * WHERE {
  # Bind the predicates to check before matching triples
  VALUES ?pred { dct:title foaf:name rdfs:label skos:definition dcat:keyword dct:description skos:altLabel skos:notation }
  ?sub ?pred ?MalformedString
  # Plain substring test for U+FFFD; no regular expression is needed
  FILTER(CONTAINS(STR(?MalformedString), "�"))
}
# NOTE: sio:hasValue also has ~1500 malformed strings as objects, but including it increases query time by ~120 seconds