Arabic index configuration
+Bulgarian index configuration
+Catalan index configuration
+Czech index configuration
+Danish index configuration
+German index configuration
+Greek index configuration
+English index configuration
+Spanish index configuration
+Estonian index configuration
+http://en.wikipedia.org/wiki/Estonian_alphabet
+Persian index configuration
+Finnish index configuration
+French index configuration
+Demonstrates how to configure an internationalized index.
+Copyright (C) 2003, 2004 Innodata Isogen
+This sample may be used to create derivative text databases without restriction.
+For Public Use
+Correct Ukrainian (uk) index configuration
+Correct the position of "Y" and the comment for Latvian.
+Add or modify xx-sort-rules.txt so that Latin characters sort before non-Latin characters in DITA-OT 2.x:
+- Arabic (ar)
+- Bulgarian (bg)
+- Greek (el)
+- Persian (fa)
+- Hebrew (he)
+- Hindi (hi)
+- Korean (ko): Also changed collator from Java to ICU.
+- Kazakh (kk)
+- Khmer (km)
+- Kannada (kn)
+- Myanmar (my)
+- Russian (ru)
+- Sinhala (si)
+- Telugu (te)
+- Thai (th)
+Add Myanmar index configuration (my)
+Add Hindi index configuration (hi)
+Add Telugu index configuration (te)
+Add Tamil index configuration (ta)
+Add Kannada index configuration (kn)
+Add Lao index configuration (lo)
+Add Persian index configuration (fa)
+Fix Thai (th) bug: group_members/lastmember was not defined.
+Fix Estonian index configuration. (et)
+- Add "et-sort-rules.txt" to collation spec. Set primary difference between "v" and "w".
+This is probably a mistake in ICU's collation definition.
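A primary difference means two letters are distinct at the first comparison level, so every word starting with one letter sorts before every word starting with the other. The following is a minimal sketch of that effect, assuming java.text.RuleBasedCollator rather than ICU. Both rule strings are hypothetical and are not the contents of et-sort-rules.txt.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;

public class PrimaryDifferenceSketch {
    // Hypothetical rule strings for illustration; NOT the contents of et-sort-rules.txt.
    // ';' declares a secondary difference, '<' declares a primary difference.
    static final String SECONDARY_RULES = "< a < b < v ; w";
    static final String PRIMARY_RULES = "< a < b < v < w";

    /** Returns -1, 0, or 1 for left compared with right under the given rules. */
    static int compare(String rules, String left, String right) {
        try {
            return Integer.signum(new RuleBasedCollator(rules).compare(left, right));
        } catch (ParseException e) {
            throw new IllegalArgumentException("Invalid collation rules: " + rules, e);
        }
    }

    public static void main(String[] args) {
        // Secondary difference only: "v" and "w" tie at the primary level,
        // so the second letter decides and "wa" sorts before "vb".
        System.out.println(compare(SECONDARY_RULES, "wa", "vb"));
        // Primary difference: every "v" word sorts before every "w" word.
        System.out.println(compare(PRIMARY_RULES, "wa", "vb"));
    }
}
```

With only a secondary difference, "v" and "w" terms interleave inside one index group; the primary difference keeps them apart.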
+Fix Dutch index configuration. (nl)
+- Add "Ä" to "A" entry.
+Fix Romanian index configuration. (ro)
+- Add U+0218, U+0219 to "Ş" entry.
+- Add U+021A, U+021B to "Ţ" entry.
+Refine Latvian index configuration. (lv)
+- Set primary difference between the following characters by introducing lv-sort-rules.txt.
+"A" and "Ā", "E" and "Ē", "I" and "Ī", "U" and "Ū"
+Refine Lithuanian index configuration. (lt)
+- Set primary difference between the following characters by introducing lt-sort-rules.txt.
+"A" and "Ą", "E" and "Ę" and "Ė", "I" and "Į" and "Y", "U" and "Ų" and "Ū"
+Refine Croatian index configuration. (hr)
+- Add language code "hr-HR".
+- Add "Dž" to "DŽ" entry.
+- Add "Lj" to "LJ" entry.
+- Add "Nj" to "NJ" entry.
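Adding title-case digraphs such as "Lj" to a group matters because a term is assigned to the group whose member matches its beginning, and a digraph member must win over the single-letter member. The sketch below illustrates that idea with a longest-prefix lookup. The group table and the groupFor helper are hypothetical; this is not how i18n_support.jar is known to work, and the table is not copied from hr.xml.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class IndexGroupSketch {
    // Hypothetical group table: label -> members. Not copied from hr.xml.
    static final Map<String, List<String>> GROUPS = new LinkedHashMap<>();
    static {
        GROUPS.put("D", Arrays.asList("D", "d"));
        // U+017D and U+017E are Z with caron, upper and lower case.
        GROUPS.put("D\u017D", Arrays.asList("D\u017D", "D\u017E", "d\u017E"));
        GROUPS.put("L", Arrays.asList("L", "l"));
        GROUPS.put("LJ", Arrays.asList("LJ", "Lj", "lj"));
    }

    /** Returns the label of the group whose longest member prefixes the term, or null. */
    static String groupFor(String term) {
        String bestLabel = null;
        int bestLength = 0;
        for (Map.Entry<String, List<String>> group : GROUPS.entrySet()) {
            for (String member : group.getValue()) {
                if (member.length() > bestLength && term.startsWith(member)) {
                    bestLabel = group.getKey();
                    bestLength = member.length();
                }
            }
        }
        return bestLabel;
    }

    public static void main(String[] args) {
        System.out.println(groupFor("Ljubav")); // matches the "Lj" member
        System.out.println(groupFor("lampa"));  // matches only the "l" member
    }
}
```

Without the "Lj" member, a title-case term such as "Ljubav" would fall back to the "L" group.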
+Refine Estonian index configuration. (et)
+- Change the alphabet order based on http://en.wikipedia.org/wiki/Estonian_alphabet
+Refine Icelandic index configuration. (is)
+- Add language code "is-IS".
+Refine Latvian index configuration. (lv)
+- Rename la.xml to lv.xml.
+- Add language code "lv-LV"
+Modify Lithuanian index configuration. (lt)
+- Add language code "lt-LT"
+- Add index group "Ą","Ę","Ė","Į","Ų","Ū".
+Refine Slovenian index configuration. (sl)
+- Add language code "sl-SI"
+Add Tagalog index configuration for testing. (tl)
+- Tagalog is used in the Philippines.
+Add Sinhala index configuration for testing. (si)
+- Sinhala is used in Sri Lanka.
+Add Khmer index configuration for testing. (km)
+- Khmer is used in Cambodia.
+Add Swahili index configuration for testing. (sw)
+Add Kazakh index configuration for testing. (kk)
+Change Thai last character from ໝ U+0EDD to ฮ U+0E2E.
+U+0EDD is invalid for Thai because it belongs to the Lao block.
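A mistake like this can be caught by checking which Unicode block a code point belongs to. The following is a small sketch using the standard Character.UnicodeBlock API; the class and helper names are made up for illustration.

```java
public class ScriptBlockCheck {
    /** True if the code point lies in the Thai block (U+0E00..U+0E7F). */
    static boolean isThai(int codePoint) {
        return Character.UnicodeBlock.of(codePoint) == Character.UnicodeBlock.THAI;
    }

    /** True if the code point lies in the Lao block (U+0E80..U+0EFF). */
    static boolean isLao(int codePoint) {
        return Character.UnicodeBlock.of(codePoint) == Character.UnicodeBlock.LAO;
    }

    public static void main(String[] args) {
        System.out.println(isThai(0x0E2E)); // the corrected last character
        System.out.println(isLao(0x0EDD));  // the character that was removed
        System.out.println(isThai(0x0EDD));
    }
}
```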
+- Add "ms" entry.
+- Change to use entity reference for each language.
+Modify "id" entry.
+- Remove (comment out) java_collation_spec and uncertain entry members.
+Add "vi-VN"
+Modify "bg","ru" entry.
+- Remove the English key portion from the key definition. It is now supported automatically by the new i18n_support.jar.
+Modify "es" entry.
+- Remove "ĵ,Ĵ" from "J" entry.
+Modify "fi" entry.
+- Merge "V" and "W" into one entry "V,W".
+Modify "zh-TW" entry.
+- Adopt "劃" instead of "畫" for the stroke count label (画数).
+Modify "ru" entry. Remove user-specific entries from the definition.
+- Remove "ķ","Ķ" from "K" group.
+- Remove "ò","Ò" from "O" group.
+- Remove "ß" from "S" group.
+- Remove "ѓ", "Ѓ" from "Г" group
+- Remove "ў", "Ў" from "У" group.
+Modify "sk" entry.
+- Make "Ď" group. Remove "Ď" entry from "D" group.
+- Add "DZ", "DŽ" group.
+- Add "cH", "CH" to "CH" group.
+- Make "Ň" group. Remove "Ň" entry from "N" group.
+- Make "Ŕ" group. Remove "Ŕ" entry from "R" group.
+- Make "Ť" group. Remove "Ť" entry from "T" group.
+- Make "Ž" group. Remove "Ž" entry from "Z" group.
+- Add sk-sort-rules.txt because neither ICU nor the Java Collator defines "á","Á","dz","ľ","Ľ","ĺ","Ĺ","ň","Ň","ó","Ó","ŕ","Ŕ" entries for Slovak.
+Modify "cs" entry
+- Add "cH" to "CH" group.
+Modify "ko" entry.
+- Use full-width characters for Hangul Jamo.
+Change botb_index_rules.dtd to allow multiple <national_language> elements to hold the dialects or [lang]-[country] style language code.
+- Add "bg-BG","cs-CZ","da-DK","de-DE","de-CH","nl-NL","en-US","en-GB","es-ES","es-419","es-US","fi-FI","fr-FR","fr-BE","fr-CA","fr-CH","he-IL","hu-HU","id-ID","it-IT","ja-JP","ko-KR","no-NO","pl-PL","pt-PT","pt-BR","ro-RO","ru-RU","sk-SK","sv-SE","th-TH","tr-TR"
+Modify "zh-CN", "zh-TW" entry.
+- Delete "I" group. (No pinyin reading begins with "I".)
+- Change each group's first code according to the change of zh-CN-sort-rules.txt, zh-TW-sort-rules.txt
+Modify "ja" entry.
+- Make "ん" entry. Delete "ん" entry from "を" group.
+- Modify ja-sort-rules.txt. Remove complex sorting rules. These rules are not needed unless strict sorting is required.
+Modify "el" entry.
+- Add "ϊ" U+03CA to "Ι" (U+0399) group.
+- Add "µ" U+00B5 to "Μ" (U+039C) group.
+Added ru-sort-rules.txt to "ru" entry because "Е","Ё","И","Й" are grouped incorrectly when sorting with ICU4j_3_4_4.jar.
+Update "ko" entry.
+- Add group_label to each term group. Update all group_key values.
+- Add U+110F group_label
+- Remove ko-sort-rules.txt.
+Comment out "zh-CN" entry's sort_english_before element to keep compatibility with previous version.
+Added term group for English in "zh-CN" entry and added comment.
+Add "he" entry.
+Add "ar" entry.
+Correct ko entry.
+Modify "botb_index_rules/ko-sort-rules.txt" to "ko-sort-rules.txt".
+Correct zh-CN, zh-TW, ja entry.
+Comment out the replace_rules element so that the space character sorts before letters.
+Replace corresponding zh-CN-sort-rules.txt, zh-TW-sort-rules.txt, ja-sort-rules.txt.
+Correct 'ru' entry.
+Add missing 'O'(U+004F) to char_or_seq.
+Correct 'es' entry.
+Add 'ª'(U+00AA), 'º'(U+00BA).
+Correct 'ca' entry.
+Remove 'á', 'Á', 'ď', 'Ď', 'ĺ', 'Ĺ', 'ł', 'Ł'. Add 'ç', 'Ç'.
+Restored 'tr' entry's groupkey 'Q', 'W', 'X' for foreign words.
+Modify 'ja' entry's index label like "ア行"→"あ".
+Delete "ン行" entry.
+Add 'ja' entry.
+Modify 'fr' entry to add U+0153, U+0152, U+00FC, U+00DC.
+Modify 'fi' entry to work around an ICU sorting bug. ("W" entries are inserted between "V" entries.) Added 'fi-sort-rules.txt'.
+Modify 'ca', 'cs', 'da', 'de', 'en', 'es', 'fi', 'fr', 'hu', 'it', 'nl', 'no', 'pl', 'pt', 'ru', 'sv' entries.
+Add 'cs', 'ca' entries. Delete 'CH' entry from 'es'.
+Introduce collation_spec/use_java_collator and replace_rules element.
+- The use_java_collator element selects java.text.Collator or RuleBasedCollator. The default is off, which selects the ICU collator. Most Latin-script languages currently use the ICU collator.
+- The replace_rules element replaces the default collator rules with the content of the specified rule text file (java_collation_spec/include_collation_spec). The default is off, which applies the default collator rules plus the specified rules.
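The two replace_rules modes can be sketched with java.text.RuleBasedCollator, which corresponds to the case where use_java_collator is on. This is a minimal illustration, not the actual i18n_support.jar code: the build helper and its boolean flag are hypothetical stand-ins for the configuration element, and the rule strings are examples rather than the contents of any xx-sort-rules.txt file.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Locale;

public class ReplaceRulesSketch {
    /**
     * Builds a collator from a rule text. The replaceRules flag is a hypothetical
     * stand-in for the replace_rules element: true uses only the given rules,
     * false appends them to the locale's default rules.
     */
    static RuleBasedCollator build(Locale locale, String ruleText, boolean replaceRules) {
        try {
            if (replaceRules) {
                return new RuleBasedCollator(ruleText);
            }
            Collator base = Collator.getInstance(locale);
            if (!(base instanceof RuleBasedCollator)) {
                throw new IllegalStateException("Default collator exposes no rules");
            }
            return new RuleBasedCollator(((RuleBasedCollator) base).getRules() + ruleText);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Invalid collation rules: " + ruleText, e);
        }
    }

    public static void main(String[] args) {
        // Appended: "ch" becomes one unit after "c", so "cz" sorts before "ch".
        RuleBasedCollator appended = build(Locale.ENGLISH, "& C < ch, cH, Ch, CH", false);
        System.out.println(appended.compare("cz", "ch") < 0);

        // Replaced: only the letters named in the rule text have a tailored order.
        RuleBasedCollator replaced = build(Locale.ENGLISH, "< a < b < c < ch < d", true);
        System.out.println(replaced.compare("ch", "d") < 0);
    }
}
```

Appending keeps the default ordering for every character the rule file does not mention, which is why it suits small tailorings; replacing gives full control but requires the rule file to cover everything that needs a defined order.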
+Modify 'zh-CN', 'zh-TW' entries.
+Created initial sample index rules file.
+Hebrew index configuration
+Hindi index configuration
+Croatian index configuration
+http://en.wikipedia.org/wiki/Croatian_language
+HR: Croatia
+Hungarian index configuration
+Indonesian index configuration
+Icelandic index configuration
+http://en.wikipedia.org/wiki/Icelandic_alphabet
+"C","Q","W" are prepared for loan words only.
+The letter "Z" was used until 1973.
+Italian index configuration
+Japanese index configuration
+Kazakh index configuration
+Khmer index configuration
+Kannada index configuration
+Index configuration for Korean
+Index configuration for Lao
+Lithuanian index configuration
+http://en.wikipedia.org/wiki/Lithuanian_alphabet
+q,w,x are used in loan words.
+Latvian index configuration
+Bahasa Malaysia index configuration
+Myanmar index configuration
+Dutch index configuration
+Norwegian index configuration
+Polish index configuration
+Portuguese index configuration
+Romanian index configuration
+Russian index configuration
+Sinhala index configuration
+Slovak index configuration
+Slovenian index configuration
+http://en.wikipedia.org/wiki/Slovene_alphabet
+Swedish index configuration
+Swahili index configuration
+"Q","X" are used for loan words only.
+There are two digraphs for native sounds, ch and sh; c is not used apart from unassimilated English loans and occasionally as a substitute for k in advertisements.
+http://en.wikipedia.org/wiki/Swahili_language#Orthography
+Tamil index configuration
+Telugu index configuration
+Index configuration for Thai
+Tagalog index configuration
+(Tagalog is used in the Philippines)
+tl-sort-rules.txt is adopted because ICU does not support Tagalog collation for language code "tl".
+Turkish index configuration
+Ukrainian index configuration
+Vietnamese index configuration
+Index configuration for Simplified Chinese (zh-CN)
+Simplified Chinese is sorted by Pinyin transliteration. This definition was created using the data from the Unicode unihan.txt database, using the first pronunciation listed for each character.
+Index configuration for Traditional Chinese
+Traditional Chinese is grouped and sorted by stroke count, then by radical order within a single stroke count. Configuring this correctly requires access to a Traditional Chinese dictionary.
+