Convert formats to feed alternative usage #4
Comments
@pamputt, which file format would you prefer to replace your list of Lingualibre languages (Lingualibre Qids) that feeds the bot: JSON, CSV, or TSV? I would like to save both the Qid and the number of recordings, so that every language with over 50k records gets split SPARQL queries.
The less complicated the better, so TSV is best (or CSV), but definitely not JSON.
EDIT: Ouch. The default Blazegraph API used by the Lingualibre.org endpoint only supports XML and JSON, so the best approach is to return JSON and use JQ to format it. (I'm on it.)
JSON via sparql2data
sparql2data has built-in data validation and only saves the response if it is valid.
bash script.sh -q ./path/to/LL-LanguagesRecordsData.sparql -s lingualibre -f json
# Output response in ./data/LL-LanguagesRecordsData.json
JSON via Lingualibre API direct call
We can borrow the core code from sparql2data to integrate it into Lingua-libre-bot's code:
# Sparql query
query=$(cat "${sparql}")
# echo "QUERY= ${query}" | head -n 5
# CURL SPARQL query on Lingualibre
response=$(curl -G --data-urlencode query="${query}" "https://lingualibre.org/sparql?format=json")
echo "RESPONSE: ${response}" | head -n 20
# First cleanup
clean=$(echo "${response}" | jq '.results.bindings' | jq 'map(map_values(.value))' | sed -E "s/https?:\/\/.*\/entity\///g" )
## IF is valid response, THEN print to local file, ELSE error message.
firstline=$(echo "${clean}" | head -n 1)
if [[ ${firstline:0:1} == "[" ]]; then
echo "${clean}" > "./data/list_languages.json";
else
echo "XHR response appears invalid, was NOT printed to ./data/list_languages.json"
fi
JSON via Github Sparql2Json
Sparql2Data is also configured as a GitHub page, with nightly builds, which can be queried as an API.
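Whichever source the JSON comes from (the direct call or the GitHub page), the first-character check used in the script above could be tightened by letting JQ parse the response itself. This is only a minimal sketch: the helper name `is_json_array` is hypothetical and not part of sparql2data, and it assumes `jq` is installed.

```shell
#!/bin/bash
# Hypothetical helper (not part of sparql2data): succeed only if stdin
# is valid JSON whose top-level value is an array.
is_json_array() {
  # jq -e derives the exit status from the last output value:
  # 0 for true, 1 for false or null, non-zero on parse errors or no output.
  jq -e 'type == "array"' >/dev/null 2>&1
}

# Possible usage, with ${clean} as produced by the script above:
# if echo "${clean}" | is_json_array; then
#   echo "${clean}" > "./data/list_languages.json"
# else
#   echo "Response appears invalid, was NOT printed to ./data/list_languages.json"
# fi
```

Compared with testing for a leading `[`, this also rejects truncated or otherwise malformed responses that merely start with the right character.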
TSV from JSON via JQ
JQ is a well-known tool to process and reformat JSON data, e.g.:
curl -G https://hugolpz.github.io/Sparql2Data/data/LL-LanguagesActive.json | \
jq --raw-output '.[] | .language+" "+.records+" "+.languageLabel'
Or:
curl -G https://hugolpz.github.io/Sparql2Data/data/LL-LanguagesActive.json | \
jq --raw-output '.[] | [.language, .records, .languageLabel] | @tsv'
Output:
Loop
Then you will have to load the TSV file and loop over it:
#!/bin/bash
# USAGE: bash loop.sh file.tsv
filepath="$1"
# Columns follow the TSV export: Qid, number of records, language label
while IFS=$'\t' read -r llqid records label; do
  # Run a command with the columns as parameters
  echo "Running command with parameters: $llqid $records"
  if (( records >= 50000 )); then
    : # yearly python run $llqid
  else
    : # minimal python run $llqid
  fi
done < "$filepath"
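The threshold test in the loop can also be isolated so that header lines or malformed counts do not break the arithmetic comparison. The following is a rough sketch under assumptions: the function name `classify_records` is hypothetical, the sample Qids, counts and labels are made up, and the 50000 threshold is taken from the discussion above.

```shell
#!/bin/bash
# Hypothetical helper: decide which kind of run a language needs.
# Prints "yearly" at or above the threshold, "minimal" below it,
# and "skip" when the count is not a plain non-negative integer.
classify_records() {
  local records="$1" threshold="${2:-50000}"
  if [[ ! $records =~ ^[0-9]+$ ]]; then
    echo "skip"
  elif (( 10#$records >= threshold )); then
    # 10# forces base 10 so counts with leading zeros are not read as octal
    echo "yearly"
  else
    echo "minimal"
  fi
}

# Dry run over made-up sample rows with the same three columns as the TSV.
while IFS=$'\t' read -r llqid records label; do
  echo "$llqid -> $(classify_records "$records")"
done < <(printf 'language\trecords\tlanguageLabel\nQ1\t60000\tExample A\nQ2\t120\tExample B\n')
```

The dry run only prints the decision per row; the actual bot commands would replace the `echo` once the branching looks right.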
@pamputt, I see the way ahead.
SPARQL2JSON:
JSON:
To be reused in:
Processing
Example :
CSV
For CSV: convert JSON to CSV or download CSV directly.
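For the first option, converting JSON to CSV, JQ's `@csv` formatter can be used in the same way as `@tsv`. This is a minimal sketch, assuming the JSON is an array of objects with the `language`, `records` and `languageLabel` fields used in the TSV example; the function name `json_to_csv` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read a JSON array of language objects on stdin
# and print CSV with a header row on stdout.
json_to_csv() {
  jq --raw-output '
    (["language", "records", "languageLabel"],
     (.[] | [.language, .records, .languageLabel]))
    | @csv'
}

# Possible usage against the nightly build mentioned above:
# curl -G https://hugolpz.github.io/Sparql2Data/data/LL-LanguagesActive.json | json_to_csv > ./data/list_languages.csv
```

`@csv` wraps string values in double quotes and escapes embedded quotes, so language labels containing commas stay in a single column.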