-
Notifications
You must be signed in to change notification settings - Fork 26
/
hyphenate.xfst
59 lines (54 loc) · 2.57 KB
/
hyphenate.xfst
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
! hyphenate.xfst
!
! Released under the terms of the MIT License
! Copyright (c) 2011-2015 Çağrı Çöltekin <[email protected]>
!
! Permission is hereby granted, free of charge, to any person obtaining a
! copy of this software and associated documentation files (the "Software"),
! to deal in the Software without restriction, including without limitation
! the rights to use, copy, modify, merge, publish, distribute, sublicense,
! and/or sell copies of the Software, and to permit persons to whom the
! Software is furnished to do so, subject to the following conditions:
!
! The above copyright notice and this permission notice shall be included in
! all copies or substantial portions of the Software.
!
! THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
! IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
! FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
! AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
! LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
! FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
! DEALINGS IN THE SOFTWARE.
!
!
! This automaton hyphenates Turkish words. In Turkish,
! hyphenation is always done on syllable boundaries, and the
! rule is regular. The automaton below is based on a "textbook
! case" replace operation. The only complication comes from
! the fact that there can only be a single vowel in a syllable
! (it happens only in few foreign-origin words). We first make
! sure that all vowels are split, and then insert a hyphen
! after a C*VC* sequence, if precedes CV. Apostrophe is
! ignored.
!
! The automaton seem to work fine (not tested extensively, but
! used for a few practical cases). Only known (potential)
! problem is related to how words like 'elektrik' is
! hyphenated. Following the general pattern the automate will
! hyphenate it as e-lekt-rik, although one may consider the
! syllable boundaries to be e-lek-trik.
!
!
! Vowels
define V e | i | ö | ü | E | İ | Ö | Ü | î | Î |
a | ı | o | u | A | I | O | U | â | Â | û | Û ;
! Consonants
define C b | d | c | g | v | z | j | f | ğ | l | m | n | r | w | y |
B | D | C | G | V | Z | J | F | Ğ | L | M | N | R | W | Y |
p | t | ç | k | f | s | ş | h | P | T | Ç | K | F | S | Ş | H ;
define Apos %'|%’;
define hyphenate V -> ... %- || _ (Apos) V
.o. [C|Apos]* V [C|Apos]* @> ... %- || _ (Apos) C (Apos) V;
read regex hyphenate.i;
save stack hyphenate.fst