Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prefer default language values for some Literal nodes #143

Merged
merged 4 commits into from
Jan 8, 2019

Conversation

seitenbau-govdata
Copy link
Member

If a graph contains multiple languages for a node, a random node is currently used when parsing.

This introduces the option to prefer the configured default language for some nodes:

  • dct:title (dataset and distribution)
  • dct:description (dataset and distribution)
  • foaf:name in dct:publisher
  • vcard:fn in dcat:contactPoint

The changes are related to #51 and #124.

* dct:title (dataset and distribution)
* dct:description (dataset and distribution)
* foaf:name in dct:publisher
* vcard:fn in dcat:contactPoint
@amercader
Copy link
Member

Thanks @seitenbau-govdata. I wonder if this behavior (picking the default language) should be the default for all fields, not just the 4 you included here (so not having to pass prefer_default_language=True as this would be the default) . What do you think? cc @metaodi

@seitenbau-govdata
Copy link
Member Author

@amercader We thought about that as well. It would make parsing more reproducible as no more random values would be picked in case of multiple languages. But as we weren't sure about the effects of picking the default language for all Literal elemens, we only chose the attributes we needed.

In case we use it for all elements, we would drop the whole use_default_lang parameter.

@metaodi
Copy link
Member

metaodi commented Dec 19, 2018

@amercader I think it would make sense to have this as the default for all fields, since the language attribute is used to determine this behavior. So I would drop the prefer_default_language parameter.

We currently use a similar mechanism in #124 to get some values as multilingual dict. But I think it's a bit different there, as we actually change the returned value (dict instead of string).

@seitenbau-govdata
Copy link
Member Author

The language handling is used for all Literal values now.

Regarding the fallback behavior, assume two nodes exist in the graph:

  • one without lang attribute
  • one with lang=en
  1. In case no default language is configured, always the lang=en one is picked now.
  2. If a language differing from "en" is configured as default, a random node (more precisely, the first one found) is returned like with the old logic.

What do you think about that behavior @amercader @metaodi ?

@metaodi
Copy link
Member

metaodi commented Dec 19, 2018

@seitenbau-govdata I think this is fine.

Why does this only apply to Literal values? I understand that most values will be Literal's anyway, but if another type has a language attribute, the same "rule" should apply. Or am I missing something? If not, I would drop the isinstance condition.

@seitenbau-govdata
Copy link
Member Author

@metaodi The reason is that the class Literal is the only class in rdflib at the moment which contains a property language. There are some additional checks (e.g. language value validation) and the mapping from the XML attribute langto the property language in the class Literal. So we just added the check for the type Literal.

@seitenbau-govdata
Copy link
Member Author

@amercader and @metaodi Happy New Year! Do you have any comments about our latest comment? We would like to update ckanext-dcat in our installation to the latest version. And these are the last changes that should be included in the new version. So we are able to remove the patched files and close the issue in our repo GovDataOfficial/ckanext-dcatde#6. 😉

@metaodi
Copy link
Member

metaodi commented Jan 8, 2019

@seitenbau-govdata I still think it would be more pythonic to not use isinstance, but rather catch a possible AttributeError. But it's really just a minor thing, so I will just merge this now and maybe refactor it later.

@metaodi metaodi merged commit 8f37873 into ckan:master Jan 8, 2019
@seitenbau-govdata
Copy link
Member Author

@metaodi Thanks for merging it. Now it would be perfect if we could have a new version from the master. Maybe you could release a new version?

@metaodi
Copy link
Member

metaodi commented Jan 10, 2019

@seitenbau-govdata I just released 0.0.9, have fun! 😄

@seitenbau-govdata
Copy link
Member Author

@metaodi Thanks a lot!

@seitenbau-govdata seitenbau-govdata deleted the prefer-default-language-values branch January 10, 2019 10:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants