Convert formats to feed alternative usage #4

hugolpz · 2023-02-06T21:47:47Z

SPARQL2JSON:

script.sh

JSON:

To be reused in:

Processing

Example :

# For Lingua Libre Bot
cat LL-LanguagesRecordsData.json | jq --raw-output '.[] | [.language, .records, .languageLabel] | @tsv'
# For Operations
cat LL-LanguagesActive.json | jq --raw-output '.[] | [.language, .records, .languageLabel] | @tsv'

CSV

For CSV: convert the JSON to CSV, or download the CSV directly.
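The JSON-to-CSV route can be sketched with jq's `@csv` filter. This is only an illustration: the inline sample is hypothetical data in the same shape as the files above (`language`, `records`, `languageLabel`), and the header row is an addition of mine.

```shell
# Hypothetical sample, shaped like the cleaned-up JSON files.
json='[{"language":"Q21","records":"255005","languageLabel":"French"},
       {"language":"Q298","records":"92252","languageLabel":"Polish"}]'

# Emit a header row first, then one CSV row per language.
csv=$(printf '%s\n' "$json" |
  jq --raw-output '["language","records","languageLabel"], (.[] | [.language, .records, .languageLabel]) | @csv')
printf '%s\n' "$csv"
```

`@csv` wraps string values in double quotes, so labels containing commas stay in one column.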

hugolpz · 2023-02-06T22:21:13Z

@pamputt, which file format would you prefer to replace your list of Lingua Libre languages (lili Qid) that feeds the bot? JSON, CSV, or TSV?

I would prefer to save both the Qid and the number of recordings, so that languages with over 50k records can have their SPARQL queries split up.

pamputt · 2023-02-07T07:35:58Z

The less complicated the better, so TSV is the best (or CSV), but definitely not JSON.

hugolpz · 2023-02-07T09:18:02Z

EDIT: Ouch. The default Blazegraph API, as used by the Lingualibre.org endpoint, only supports XML and JSON. So the best option is to return JSON and use jq to format it. (I'm on it.)

JSON via sparql2data

sparql2data has built-in data validation and only saves the response if it is valid.
Then, given the query LL-LanguagesRecordsData.sparql, one can use Sparql2data as a module with a command such as:

bash script.sh -q ./path/to/LL-LanguagesRecordsData.sparql -s lingualibre -f json
# Output response in ./data/LL-LanguagesRecordsData.json

JSON via Lingualibre API direct call

We can borrow the core code from sparql2data and integrate it into the Lingua-libre-bot's code:

# Sparql query (${sparql} holds the path to the .sparql file)
query=$(cat "${sparql}")
# echo "QUERY= ${query}" | head -n 5

# CURL SPARQL query on the Lingua Libre endpoint
response=$(curl -G --data-urlencode query="${query}" "https://lingualibre.org/sparql?format=json")
echo "RESPONSE: ${response}" | head -n 20

# First cleanup: keep only the binding values, strip the entity URL prefix
clean=$(echo "${response}" | jq '.results.bindings' | jq 'map(map_values(.value))' | sed -E "s/https?:\/\/.*\/entity\///g" )

## IF is valid response, THEN print to local file, ELSE error message.
firstline=$(echo "${clean}" | head -n 1)
if [[ ${firstline:0:1} == "[" ]]; then
    echo "${clean}" > "./data/list_languages.json"
else
    echo "XHR response appears invalid, was NOT printed to ./data/list_languages.json"
fi

JSON via Github Sparql2Json

Sparql2Data is also published as a GitHub Pages site, with nightly builds, which can be queried as an API.

response=$(curl -G "https://hugolpz.github.io/Sparql2Data/data/LL-LanguagesActive.json")
echo "${response}"

TSV from JSON via JQ

jq is a well-known tool for processing and reformatting JSON data, e.g.:

curl -G https://hugolpz.github.io/Sparql2Data/data/LL-LanguagesActive.json | \
    jq --raw-output '.[] | .language+"   "+.records+"   "+.languageLabel'

Or

curl -G https://hugolpz.github.io/Sparql2Data/data/LL-LanguagesActive.json | \
    jq --raw-output '.[] | [.language, .records, .languageLabel] | @tsv'

Output:

...
Q25	33462	Esperanto
Q336	56756	Odia
Q307	62224	Bengali
Q298	92252	Polish
Q21	255005	French
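Since the stated goal is to treat languages above 50k records differently, the TSV can also be filtered directly. A minimal sketch, assuming the three-column layout shown in the output (Qid, records, label); the `threshold` variable is only for illustration and the sample rows are copied from the output above.

```shell
# Sample rows copied from the output above (tab-separated).
tsv=$(printf 'Q25\t33462\tEsperanto\nQ336\t56756\tOdia\nQ307\t62224\tBengali\nQ298\t92252\tPolish\nQ21\t255005\tFrench\n')

threshold=50000   # illustrative cut-off, taken from the "50k records" remark

# Print the Qid of every language whose record count reaches the threshold.
big=$(printf '%s\n' "$tsv" | awk -F'\t' -v min="$threshold" '$2 + 0 >= min { print $1 }')
printf '%s\n' "$big"
```

With the sample rows this keeps Q336, Q307, Q298 and Q21, and leaves out Q25.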

Loop

Then you will have to load the data (with cat?) and loop over it. Each language comes with several values, such as the Qid, the number of records, the ISO code, ..., which you can use as needed.

#!/bin/bash
# USAGE: bash loop.sh file.tsv

filepath="$1"
# Columns: Qid, number of records, language label
while IFS=$'\t' read -r llqid records label; do
  # Run a command with the first two columns as parameters
  echo "Running command with parameters: $llqid $records"
  if (( records >= 50000 )); then
    : # yearly python run $llqid
  else
    : # minimal python run $llqid
  fi
done < "$filepath"
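The branching step of such a loop can be tried on its own, using the numeric `-ge` test for the threshold. This is a sketch: `pick_mode` is a hypothetical helper (not part of Lingua-Libre-Bot), the 50000 cut-off comes from the discussion above, and the actual bot commands are left out.

```shell
# Decide which kind of run a language needs from its record count.
pick_mode() {
  if [ "$1" -ge 50000 ]; then
    echo "yearly"
  else
    echo "minimal"
  fi
}

tab=$(printf '\t')

# Read tab-separated rows (Qid, records, label) and report the chosen mode.
plan=$(
  printf 'Q25\t33462\tEsperanto\nQ21\t255005\tFrench\n' |
  while IFS="$tab" read -r llqid records label; do
    printf '%s\t%s\n' "$llqid" "$(pick_mode "$records")"
  done
)
printf '%s\n' "$plan"
```

Reading three fields matters here: with only two variables, `read` would put the label into the records column along with the count.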

hugolpz · 2023-02-07T10:47:45Z

@pamputt, I see the way ahead.
lingua-libre/Lingua-Libre-Bot#22 has been merged, so I can move forward to refine your file into a documented bash script.

hugolpz self-assigned this Feb 6, 2023

hugolpz mentioned this issue Feb 9, 2023

Fix for T274511 lingua-libre/Lingua-Libre-Bot#22

Merged

hugolpz changed the title ~~Convert to to feed alternative usage~~ Convert to feed alternative usage Feb 9, 2023

hugolpz changed the title ~~Convert to feed alternative usage~~ Convert formats to feed alternative usage Apr 28, 2023
