Multilingual data and style structures #332

Open
bwiernik opened this issue Jul 30, 2020 · 6 comments

@bwiernik
Member

We started to discuss multilingual data structures here: #327

In terms of data structures, my inclination is that multilingual variants should be stored at the field level. So, any field might be an object with value, language, and translated elements. The translated element would be an array whose entries each hold value and language elements. Subordinate elements without a language would inherit the language from their parent. That would have 3 benefits:

  1. It would permit simple indication that a field is a different language than the item (e.g., an English article published in a German journal).
  2. It would jibe with https://juris-m.readthedocs.io/en/latest/dev-sync-simplification.html
  3. It would provide a consistent structure for providing translations of one or more fields for an item.

Originally posted by @bwiernik in #327 (comment)
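
To make the inheritance rule concrete, here is a minimal Python sketch of how a processor might resolve the language of each variant. The key names value, language, and translated come from the proposal above; the function name list_variants and the sample item are hypothetical, and nothing here is part of any existing CSL tooling.

```python
from typing import Optional, Union


def list_variants(field: Union[str, dict], parent_language: Optional[str] = None):
    """Return (language, value) pairs for a field and its translations.

    Assumption: a node without its own "language" inherits the language
    of its parent (the item for a field, the field for a translation).
    """
    if isinstance(field, str):
        # A flat string carries no language of its own.
        return [(parent_language, field)]
    language = field.get("language", parent_language)
    variants = [(language, field["value"])]
    for alternate in field.get("translated", []):
        variants.append((alternate.get("language", language), alternate["value"]))
    return variants


# Hypothetical item: an English article published in a German journal.
item = {
    "language": "de",
    "title": {
        "value": "An English article",
        "language": "en",
        "translated": [{"value": "Ein englischer Artikel", "language": "de"}],
    },
    "container-title": {"value": "Zeitschrift für Beispiele"},
}

title_variants = list_variants(item["title"], item["language"])
journal_variants = list_variants(item["container-title"], item["language"])
```

Here the title keeps its own language (en), while the container title has none and so falls back to the item's language (de).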

@denismaier
Member

That looks like a sound approach.
Ideally, we would still allow a flat string approach as a simpler alternative if no multilingual data input is needed. So you could do:

publisher: Oxford University Press

Or:

publisher: 
  value: whatever
  language: ar
  language-alternate:
    value: Transliteration of the publisher's Arabic name
    language: ar-alc97
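
One way a consumer could accept both forms is to normalize every field into the object shape before doing anything else. The following is only a sketch: normalize_field is a hypothetical helper, and it assumes language-alternate may be given either as a single mapping (as in the example above) or as a list of mappings.

```python
def normalize_field(raw):
    """Turn a flat string or a multilingual mapping into one uniform shape."""
    if isinstance(raw, str):
        # Flat string: no language information and no alternates.
        return {"value": raw, "language": None, "language-alternate": []}
    alternates = raw.get("language-alternate", [])
    if isinstance(alternates, dict):
        # Assumption: a single alternate may be written without a list.
        alternates = [alternates]
    return {
        "value": raw["value"],
        "language": raw.get("language"),
        "language-alternate": [normalize_field(alt) for alt in alternates],
    }


simple = normalize_field("Oxford University Press")
rich = normalize_field(
    {
        "value": "whatever",
        "language": "ar",
        "language-alternate": {
            "value": "Transliteration of the publisher's Arabic name",
            "language": "ar-alc97",
        },
    }
)
```

After normalization, code that renders or compares fields only has to handle one shape, whichever form the input used.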

@denismaier
Member

Another question will then be how these language alternates will be accessible in styles. A simple approach could be something like testing for a language attribute, like <if variable="title" language-alternate="de">. (See #327 (comment))

The drawback of this is that it will make style coding more complicated than necessary. In the medium to long run (i.e. after 1.1) we should therefore consider adding (optional, modularized) features to simplify this. I could imagine three potential solutions:

  1. New attributes on cs:style, cs:bibliography and cs:citation

  2. A new element cs:multilingual next to cs:citation and cs:bibliography.

    <multilingual>
      <titles>
        <main/>
        <alternate language="en" prefix="[ " suffix="]"/>
      </titles>
    </multilingual>
  3. This new cs:multilingual element could even work a bit like locales. You'd have special multilingual configuration files that could be used together with regular styles.
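
As a rough illustration of what the cs:multilingual titles configuration in option 2 might produce, the Python sketch below renders the main title followed by the English alternate wrapped in the configured prefix and suffix. Everything here is an assumption made for illustration: render_title is a hypothetical function, the field shape follows the language-alternate example earlier in this thread, and joining the two parts with a single space is a guess rather than proposed behavior.

```python
def render_title(field, alternate_language="en", prefix="[ ", suffix="]"):
    """Render the main title, then one alternate in the requested language."""
    if isinstance(field, str):
        # Flat string: nothing multilingual to add.
        return field
    parts = [field["value"]]
    alternates = field.get("language-alternate", [])
    if isinstance(alternates, dict):
        alternates = [alternates]
    for alternate in alternates:
        if alternate.get("language") == alternate_language:
            parts.append(prefix + alternate["value"] + suffix)
            break  # use the first matching alternate only
    # Assumption: main and alternate are separated by one space.
    return " ".join(parts)


# Hypothetical title field with an English alternate.
title = {
    "value": "Die Verwandlung",
    "language": "de",
    "language-alternate": [{"value": "The Metamorphosis", "language": "en"}],
}

rendered = render_title(title)
rendered_without_match = render_title(title, alternate_language="fr")
```

With the prefix and suffix from the example, the first call yields "Die Verwandlung [ The Metamorphosis]"; when no alternate matches the requested language, only the main title is returned.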

@bwiernik
Member Author

> Ideally, we would still allow a flat string approach as a simpler alternative if no multilingual data input is needed.

Yes, I think we would have the version declared on the CSL JSON explicitly denote that it is multilingual. That way, fields that are normally flat strings can be either flat strings or objects. That avoids the contortions @fbennett needed to go through to make CSLm JSON type-compatible with CSL JSON.

If CSL-ML JSON needs to be converted to vanilla CSL JSON, it's a simple transformation: string-type variables have their value extracted and the multilingual elements are dropped.
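
A minimal Python sketch of that down-conversion, under some assumptions: to_vanilla is a hypothetical name, multilingual fields are taken to be objects with a value key, and any object without that key (for example a date with date-parts) is passed through unchanged.

```python
def to_vanilla(item):
    """Flatten multilingual string fields to plain strings."""
    flat = {}
    for key, field in item.items():
        if isinstance(field, dict) and "value" in field:
            # Keep only the main value; language and translations are dropped.
            flat[key] = field["value"]
        else:
            # Flat strings, dates, name lists, etc. are left untouched.
            flat[key] = field
    return flat


# Hypothetical multilingual item.
multilingual_item = {
    "type": "book",
    "title": {
        "value": "Die Verwandlung",
        "language": "de",
        "translated": [{"value": "The Metamorphosis", "language": "en"}],
    },
    "publisher": "Kurt Wolff",
    "issued": {"date-parts": [[1915]]},
}

vanilla_item = to_vanilla(multilingual_item)
```

The conversion is lossy by design: the translations cannot be recovered from the vanilla output.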

@bwiernik
Member Author

bwiernik commented Jul 31, 2020

There are a few "levels of involvement" here:

"Simple" handling

  1. Rendering of individual translated fields.
  2. Rendering of transliterations instead of original-script fields style-wide.
    (These are what APA wants, for example). These we could consider adopting into vanilla CSL.

"Moderate" handling

  1. Consistent rendering of both/all of original script, transliterated, and translated fields style-wide.
    This could be handled using something like cs:multilingual above (which I think looks similar to the CSLm cs:alternative). I would suggest here that it always goes to the locale/writing system of the bibliography environment it's rendered in. Supporting multiple locales is a further step (below).

"Complex" handling

  1. Separate bibliography layouts by locale.
    À la CSLm.

@bdarcus bdarcus pinned this issue Aug 10, 2020
@denismaier
Member

denismaier commented Sep 8, 2020

Just a quick note: biblatex is adding multiscript support: https://raw.githubusercontent.com/plk/biblatex/multiscript/doc/latex/biblatex/biblatex.tex

More information under \subsection{Multiscript Support}

(Don't know how exactly that will work. Need to digest that first...)

@denismaier
Member

I just ran into this: https://www.ctan.org/pkg/biblatex-ms

Apparently, the biblatex folks have recently published the multiscript variant of biblatex. As of now, both versions seem to exist in parallel. The multiscript version is said to be slower and still a bit experimental, but it should eventually replace the current version.

Should be worthwhile looking into this...
