Home
SebastianNordhoff edited this page Oct 17, 2015 · 1 revision
This page documents the tabular lexical data. It lives here for want of a better place.
The refugee phrasebook data are currently stored in Google spreadsheets. Everybody can read them. Selected users get write access to provide information about a particular language.
The overarching principle is: help people with basic communication. Prefer ease of use over grammatical correctness.
The following principles should be heeded when filling in data or creating new spreadsheets.
- Every cell contains exactly one expression in exactly one language. Do not put several languages in one cell. Do not provide alternatives. If there are two equally good terms, choose one based on your best judgement, even if the choice might sometimes be ungrammatical.
- Every row contains exactly one meaning in several languages. Use several rows for lists of items, like numbers, days of the week or months.
- Every column contains exactly one language in exactly one script or rendering. Do not conflate similar languages. Do not conflate orthographic and phonetic renderings.
- The first column contains the ID of the row. The following principles apply:
  1. The ID is semantic.
  2. The ID is written in ALLCAPS.
  3. The ID employs telegraphic style.
  4. The ID ideally relates to a standardized vocabulary. Currently, ICD-10 is used for medical conditions and the Foundational Model of Anatomy is used for body parts.
- The first full row contains a human-readable language name.
- The second full row contains an IETF language tag. This normally consists of an ISO 639-3 language tag and an optional ISO 15924 script tag (`Latn`, `Cyrl`, and `Arab` are frequent). Use `Qaai` as a script tag for IPA and `Qaas` as a script tag for SAMPA. Technically speaking, SAMPA is not a separate script, but for our use case it can be treated as one. The same is true for IPA, even if one could argue that IPA is a script different from Latin script.
- The ID of the row and the ID of the column together identify a cell. For example, `GIVE_OWN_NAME:tir-Latn` identifies how to give one's name in Tigrinya, rendered in Latin script: "semey ___ ey-yu."
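
The layout rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's tooling: the sheet is assumed to be exported as a list of rows, the first cell of the two header rows is assumed to be empty, the English cell is invented for the example, and the names `SHEET`, `is_valid_row_id`, `is_valid_tag`, and `lookup` are hypothetical. The ID pattern (capital letters and digits joined by underscores) is a guess based on `GIVE_OWN_NAME` and may be too strict for IDs taken from ICD-10 or other vocabularies.

```python
import re

# Assumed export of one sheet: row 0 = language names, row 1 = language tags,
# remaining rows = row ID followed by one expression per language column.
SHEET = [
    ["", "English", "Tigrinya"],
    ["", "eng-Latn", "tir-Latn"],
    # The English cell is made up for illustration; the Tigrinya cell is from this page.
    ["GIVE_OWN_NAME", "My name is ___.", "semey ___ ey-yu."],
]

# Assumption: IDs are ALLCAPS words (letters or digits) joined by underscores.
ID_PATTERN = re.compile(r"[A-Z0-9]+(_[A-Z0-9]+)*")
# Three-letter ISO 639-3 code plus an optional four-letter ISO 15924 script code.
TAG_PATTERN = re.compile(r"[a-z]{3}(-[A-Z][a-z]{3})?")


def is_valid_row_id(row_id):
    """Return True if the row ID follows the assumed ALLCAPS pattern."""
    return ID_PATTERN.fullmatch(row_id) is not None


def is_valid_tag(tag):
    """Return True if the tag looks like 'tir' or 'tir-Latn'."""
    return TAG_PATTERN.fullmatch(tag) is not None


def lookup(sheet, address):
    """Return the cell addressed as 'ROW_ID:language-tag'."""
    row_id, tag = address.split(":", 1)
    # Skip column 0, which holds row IDs; raises ValueError for an unknown tag.
    column = sheet[1].index(tag, 1)
    for row in sheet[2:]:
        if row[0] == row_id:
            return row[column]
    raise KeyError(address)


print(lookup(SHEET, "GIVE_OWN_NAME:tir-Latn"))
```

Because every cell holds one expression and every column holds one language in one rendering, a lookup like this needs no further parsing of the cell content.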