Geograpy3 for other languages #48
Comments
As of version 0.2.0, support for other languages is prepared. Which language(s) would you like to see supported, and what is your use case? It would be very helpful if you described a concrete scenario that shows your need. |
Hi, what is the state of support for other languages? I would be interested in German as well. |
@JakobMiksch - you might want to first try creating the locations.db using a German Wikidata query. It might also be possible to switch to a "dynamic" Wikidata query for the resolution of locations. It's a trade-off between cache size and query speed. The data is in principle available via the "label" field of the location hierarchy within Wikidata. Again, what is your use case? For example, how often do you need to look up data, and what is the expected response time? Designing a solution will depend on knowing what your needs are. |
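To make the cache side of that trade-off concrete, here is a minimal sketch of a label cache keyed by language. It is not the actual locations.db schema of geograpy3: the table name `location_labels`, its columns, and the helper functions are hypothetical, and the rows are a tiny hand-written sample rather than the result of a real Wikidata query.

```python
import sqlite3


def build_label_cache(rows):
    """Create an in-memory cache of (wikidataid, lang, label) rows.

    The schema is hypothetical and only illustrates storing one label
    per language for each location.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE location_labels (wikidataid TEXT, lang TEXT, label TEXT)"
    )
    # Index on (lang, label) so lookups stay fast as the cache grows.
    conn.execute("CREATE INDEX idx_label ON location_labels (lang, label)")
    conn.executemany("INSERT INTO location_labels VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def lookup(conn, label, lang):
    """Return the Wikidata ids whose label in `lang` equals `label`."""
    cur = conn.execute(
        "SELECT wikidataid FROM location_labels WHERE lang = ? AND label = ?",
        (lang, label),
    )
    return [row[0] for row in cur.fetchall()]


# Hand-written sample rows; a real cache would be filled from a query.
ROWS = [
    ("Q64", "en", "Berlin"),
    ("Q64", "de", "Berlin"),
    ("Q1741", "en", "Vienna"),
    ("Q1741", "de", "Wien"),
]
conn = build_label_cache(ROWS)
print(lookup(conn, "Wien", "de"))
```

Every additional language adds one row per location, which is where the cache size cost comes from; a dynamic query avoids that storage at the price of a network round trip per lookup.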
Need any help on this? This is what I am looking for.
Hugging Face NER models are pretty slow for this task. |
@astuanax could you please create a pull request for language-specific Wikidata queries? We might also try a different endpoint such as QLever to make sure we don't run into timeouts.
A first step would be to externalize the queries to YAML files using the named queries feature of the pyLodStorage QueryManager https://github.com/WolfgangFahl/pyLoDStorage/blob/ef3ff1148a5addb6974d2bc0eac1004a1ba37beb/lodstorage/query.py#L498 |
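The idea of named, externalized queries can be sketched as follows. This does not use the pyLodStorage QueryManager API; a plain dictionary stands in for the parsed contents of a YAML file, and the query name `SettlementLabels`, the `$lang` and `$limit` placeholders, and the `get_query` helper are all assumptions made for illustration.

```python
from string import Template

# Stand-in for the parsed contents of a YAML file with named queries.
# The entry name and its fields are hypothetical.
NAMED_QUERIES = {
    "SettlementLabels": {
        "title": "human settlement labels in a given language",
        "sparql": """SELECT ?item ?label WHERE {
  ?item wdt:P31/wdt:P279* wd:Q486972 .
  ?item rdfs:label ?label .
  FILTER(LANG(?label) = "$lang")
} LIMIT $limit""",
    },
}


def get_query(name, **params):
    """Look up a query by name and fill in its placeholders.

    Raises KeyError if the name is unknown or a placeholder is missing.
    """
    entry = NAMED_QUERIES[name]
    return Template(entry["sparql"]).substitute(**params)


german_query = get_query("SettlementLabels", lang="de", limit=10)
print(german_query)
```

Keeping the language as a parameter means one named query could serve German, English, or any other label language without duplicating the SPARQL text.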
Ok, I'll have a look at the specific wikidata queries to start with. |
The tool https://www.npmjs.com/package/wikidata-taxonomy might be helpful:
will show you all specialization classes. The following query will show the top 200 classes with more than 700 instances:
# relevant human settlement classes in wikidata
# based on wdtaxonomy Q486972 -c -d -m = -s
# WF 2021-08-25
SELECT ?item ?itemLabel ?itemDescription ?instances ?sites WITH {
  SELECT DISTINCT ?item { ?item wdt:P279* wd:Q486972 }
} AS %items WHERE {
  INCLUDE %items .
  {
    SELECT ?item (count(distinct ?element) as ?instances) {
      INCLUDE %items.
      OPTIONAL { ?element wdt:P31 ?item }
    } GROUP BY ?item
  }
  {
    SELECT ?item (count(distinct ?site) as ?sites) {
      INCLUDE %items.
      OPTIONAL { ?site schema:about ?item }
    } GROUP BY ?item
  }
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en"
  }
  FILTER(?instances>700)
} ORDER BY DESC (?instances)
LIMIT 200 |
sparqlquery -qp wikidata.yaml -qn Top200HumanSettlementClasses -f github
top 200 human settlement classes
https://www.wikidata.org/wiki/Q486972
based on wdtaxonomy Q486972 -c -d -m = using https://www.npmjs.com/package/wikidata-taxonomy
query
# relevant human settlement classes in wikidata
# based on wdtaxonomy Q486972 -c -d -m = -s
# WF 2021-08-25
SELECT ?item ?itemLabel ?itemDescription ?instances ?sites WITH {
  SELECT DISTINCT ?item { ?item wdt:P279* wd:Q486972 }
} AS %items WHERE {
  INCLUDE %items .
  {
    SELECT ?item (count(distinct ?element) as ?instances) {
      INCLUDE %items.
      OPTIONAL { ?element wdt:P31 ?item }
    } GROUP BY ?item
  }
  {
    SELECT ?item (count(distinct ?site) as ?sites) {
      INCLUDE %items.
      OPTIONAL { ?site schema:about ?item }
    } GROUP BY ?item
  }
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en"
  }
  FILTER(?instances>700)
} ORDER BY DESC (?instances)
LIMIT 200
result
|
Is your feature request related to a problem? Please describe.
I would like to use geograpy3 in my current project, but as I'm working with sentences in German, it doesn't work...
Describe the solution you'd like
Support other languages as well.