initial commit
phenylshima committed Jun 2, 2023
0 parents commit 71c1612
Showing 7 changed files with 2,685,474 additions and 0 deletions.
129 changes: 129 additions & 0 deletions COPYING
@@ -0,0 +1,129 @@
Copyright (c) 2009, Nara Institute of Science and Technology, Japan.

All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:

Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
Neither the name of the Nara Institute of Science and Technology
(NAIST) nor the names of its contributors may be used to endorse or
promote products derived from this software without specific prior
written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Copyright (c) 2011-2017, The UniDic Consortium
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:

* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the
distribution.

* Neither the name of the UniDic Consortium nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

/* ----------------------------------------------------------------- */
/* The Japanese TTS System "Open JTalk" */
/* developed by HTS Working Group */
/* http://open-jtalk.sourceforge.net/ */
/* ----------------------------------------------------------------- */
/* */
/* Copyright (c) 2008-2016 Nagoya Institute of Technology */
/* Department of Computer Science */
/* */
/* All rights reserved. */
/* */
/* Redistribution and use in source and binary forms, with or */
/* without modification, are permitted provided that the following */
/* conditions are met: */
/* */
/* - Redistributions of source code must retain the above copyright */
/* notice, this list of conditions and the following disclaimer. */
/* - Redistributions in binary form must reproduce the above */
/* copyright notice, this list of conditions and the following */
/* disclaimer in the documentation and/or other materials provided */
/* with the distribution. */
/* - Neither the name of the HTS working group nor the names of its */
/* contributors may be used to endorse or promote products derived */
/* from this software without specific prior written permission. */
/* */
/* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND */
/* CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, */
/* INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF */
/* MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE */
/* DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS */
/* BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, */
/* EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED */
/* TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, */
/* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON */
/* ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, */
/* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY */
/* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE */
/* POSSIBILITY OF SUCH DAMAGE. */
/* ----------------------------------------------------------------- */

BSD 3-Clause License

Copyright (c) 2022, femshima

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.

3. Neither the name of the copyright holder nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
146 changes: 146 additions & 0 deletions char.def
@@ -0,0 +1,146 @@
#
# Japanese character category map
#
#

###################################################################################
#
# CHARACTER CATEGORY DEFINITION
#
# CATEGORY_NAME INVOKE GROUP LENGTH
#
# - CATEGORY_NAME: Name of the category. A DEFAULT category must be defined.
# - INVOKE: 1/0: always invoke unknown word processing, even when the word can be found in the lexicon
# - GROUP: 1/0: make a new word by grouping characters of the same category
# - LENGTH: n: new words of length 1 to n are added
#
DEFAULT 0 1 0 # DEFAULT is a mandatory category!
SPACE 0 1 0
KANJI 0 0 2
SYMBOL 1 1 0
NUMERIC 1 1 0
ALPHA 1 1 0
HIRAGANA 0 1 2
KATAKANA 0 1 2
KANJINUMERIC 1 1 0
GREEK 1 1 0
CYRILLIC 1 1 0

###################################################################################
#
# CODE(UCS2) TO CATEGORY MAPPING
#

# SPACE
0x0020 SPACE # DO NOT REMOVE THIS LINE, 0x0020 is reserved for SPACE
0x000D SPACE
0x0009 SPACE
0x000B SPACE
0x000A SPACE

# ASCII
0x0021..0x002F SYMBOL
0x0030..0x0039 NUMERIC
0x003A..0x0040 SYMBOL
0x0041..0x005A ALPHA
0x005B..0x0060 SYMBOL
0x0061..0x007A ALPHA
0x007B..0x007E SYMBOL

# Latin
0x00A1..0x00BF SYMBOL # Latin 1
0x00C0..0x00FF ALPHA # Latin 1
0x0100..0x017F ALPHA # Latin Extended A
0x0180..0x0236 ALPHA # Latin Extended B
0x1E00..0x1EF9 ALPHA # Latin Extended Additional

# CYRILLIC
0x0400..0x04F9 CYRILLIC
0x0500..0x050F CYRILLIC # Cyrillic supplementary

# GREEK
0x0374..0x03FB GREEK # Greek and Coptic

# HIRAGANA
0x3041..0x309F HIRAGANA

# KATAKANA
0x30A1..0x30FF KATAKANA
0x31F0..0x31FF KATAKANA # Small KU .. Small RO
# 0x30FC KATAKANA HIRAGANA # ー
0x30FC KATAKANA

# Half KATAKANA
0xFF66..0xFF9D KATAKANA
0xFF9E..0xFF9F KATAKANA

# KANJI
0x2E80..0x2EF3 KANJI # CJK Radicals Supplement
0x2F00..0x2FD5 KANJI
0x3005 KANJI
0x3007 KANJI
0x3400..0x4DB5 KANJI # CJK Unified Ideographs Extension
0x4E00..0x9FA5 KANJI
0xF900..0xFA2D KANJI
0xFA30..0xFA6A KANJI

# KANJI-NUMERIC (一 二 三 四 五 六 七 八 九 十 百 千 万 億 兆)
0x4E00 KANJINUMERIC KANJI
0x4E8C KANJINUMERIC KANJI
0x4E09 KANJINUMERIC KANJI
0x56DB KANJINUMERIC KANJI
0x4E94 KANJINUMERIC KANJI
0x516D KANJINUMERIC KANJI
0x4E03 KANJINUMERIC KANJI
0x516B KANJINUMERIC KANJI
0x4E5D KANJINUMERIC KANJI
0x5341 KANJINUMERIC KANJI
0x767E KANJINUMERIC KANJI
0x5343 KANJINUMERIC KANJI
0x4E07 KANJINUMERIC KANJI
0x5104 KANJINUMERIC KANJI
0x5146 KANJINUMERIC KANJI

# ZENKAKU
0xFF10..0xFF19 NUMERIC
0xFF21..0xFF3A ALPHA
0xFF41..0xFF5A ALPHA
0xFF01..0xFF0F SYMBOL
0xFF1A..0xFF1F SYMBOL
0xFF3B..0xFF40 SYMBOL
0xFF5B..0xFF65 SYMBOL
0xFFE0..0xFFEF SYMBOL # Halfwidth and Fullwidth Forms

# OTHER SYMBOLS
0x2000..0x206F SYMBOL # General Punctuation
0x2070..0x209F NUMERIC # Superscripts and Subscripts
0x20A0..0x20CF SYMBOL # Currency Symbols
0x20D0..0x20FF SYMBOL # Combining Diacritical Marks for Symbols
0x2100..0x214F SYMBOL # Letterlike Symbols
0x2150..0x218F NUMERIC # Number forms
0x2100..0x214B SYMBOL # Letterlike Symbols
0x2190..0x21FF SYMBOL # Arrows
0x2200..0x22FF SYMBOL # Mathematical Operators
0x2300..0x23FF SYMBOL # Miscellaneous Technical
0x2460..0x24FF SYMBOL # Enclosed Alphanumerics
0x2501..0x257F SYMBOL # Box Drawing
0x2580..0x259F SYMBOL # Block Elements
0x25A0..0x25FF SYMBOL # Geometric Shapes
0x2600..0x26FE SYMBOL # Miscellaneous Symbols
0x2700..0x27BF SYMBOL # Dingbats
0x27F0..0x27FF SYMBOL # Supplemental Arrows A
0x27C0..0x27EF SYMBOL # Miscellaneous Mathematical Symbols-A
0x2800..0x28FF SYMBOL # Braille Patterns
0x2900..0x297F SYMBOL # Supplemental Arrows B
0x2B00..0x2BFF SYMBOL # Miscellaneous Symbols and Arrows
0x2A00..0x2AFF SYMBOL # Supplemental Mathematical Operators
0x3300..0x33FF SYMBOL
0x3200..0x32FE SYMBOL # Enclosed CJK Letters and Months
0x3000..0x303F SYMBOL # CJK Symbol and Punctuation
0xFE30..0xFE4F SYMBOL # CJK Compatibility Forms
0xFE50..0xFE6B SYMBOL # Small Form Variants

# added 2006/3/13
0x3007 SYMBOL KANJINUMERIC

# END OF TABLE
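
The char.def format above (category definitions followed by code point mappings) can be read with a few lines of Python. This is a minimal sketch, not the parser MeCab itself uses. The names `parse_char_def`, `lookup`, and `SAMPLE` are hypothetical, and two behaviours are assumptions: that a later mapping line overrides an earlier one for the same code point, and that the first category on a mapping line is the primary one while the rest are compatible categories.

```python
# Hypothetical excerpt of char.def, used only for illustration.
SAMPLE = """\
DEFAULT 0 1 0 # DEFAULT is a mandatory category!
KANJI 0 0 2
SYMBOL 1 1 0
NUMERIC 1 1 0
KANJINUMERIC 1 1 0
0x0030..0x0039 NUMERIC
0x4E00..0x9FA5 KANJI
0x4E00 KANJINUMERIC KANJI
0x3007 SYMBOL KANJINUMERIC
"""


def parse_char_def(text):
    """Return (categories, ranges) parsed from char.def-style text."""
    categories = {}  # name -> (invoke, group, length)
    ranges = []      # (low, high, [primary, *compatible]) in file order
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = line.split()
        if fields[0].startswith("0x"):
            # Either a single code point or a LOW..HIGH range.
            low, _, high = fields[0].partition("..")
            lo = int(low, 16)
            hi = int(high, 16) if high else lo
            ranges.append((lo, hi, fields[1:]))
        else:
            name, invoke, group, length = fields[:4]
            categories[name] = (int(invoke), int(group), int(length))
    return categories, ranges


def lookup(ranges, ch):
    """Return the category list for one character.

    Assumption: the last matching line in file order wins;
    unmapped characters fall back to DEFAULT.
    """
    cp = ord(ch)
    result = ["DEFAULT"]
    for lo, hi, names in ranges:
        if lo <= cp <= hi:
            result = names
    return result


categories, ranges = parse_char_def(SAMPLE)
print(categories["KANJI"])        # INVOKE, GROUP, LENGTH for KANJI
print(lookup(ranges, "\u4e00"))   # categories for U+4E00
```

Under these assumptions, U+4E00 resolves to KANJINUMERIC with KANJI as a compatible category, because the single code point line comes after the broad KANJI range.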
115 changes: 115 additions & 0 deletions feature.def
@@ -0,0 +1,115 @@
#
# Unigram definitions start here
#
# %F[0..N] unigram context
# %F?: if the field is undefined, this template is not applied

# POS Unigram
UNIGRAM U1:%F[0]
UNIGRAM U2:%F[0],%F?[1]
UNIGRAM U3:%F[0],%F[1],%F?[2]
UNIGRAM U4:%F[0],%F[1],%F[2],%F?[3]

# Word-POS
UNIGRAM W0:%F[6]
UNIGRAM W1:%F[0]/%F[6]
UNIGRAM W2:%F[0],%F?[1]/%F[6]
UNIGRAM W3:%F[0],%F[1],%F?[2]/%F[6]
UNIGRAM W4:%F[0],%F[1],%F[2],%F?[3]/%F[6]

# Word-Read-POS
UNIGRAM R0:%F[7]
UNIGRAM R1:%F[6],%F[7]
UNIGRAM R2:%F[0],%F[6],%F[7]
UNIGRAM R3:%F[0],%F?[1],%F[6],%F[7]
UNIGRAM R4:%F[0],%F[1],%F?[2],%F[6],%F[7]
UNIGRAM R5:%F[0],%F[1],%F[2],%F?[3],%F[6],%F[7]

# char type
UNIGRAM T0:%t
UNIGRAM T1:%F[0]/%t
UNIGRAM T2:%F[0],%F?[1]/%t
UNIGRAM T3:%F[0],%F[1],%F?[2]/%t
UNIGRAM T4:%F[0],%F[1],%F[2],%F?[3]/%t

#
# Bigram definitions start here
#
# %L[0..N] left context
# %R[0..N] right context
#
# %R?: if the field is undefined, this template is not applied

# Part of speech
BIGRAM B00:%L[0]/%R[0]
BIGRAM B01:%L[0],%L?[1]/%R[0]
BIGRAM B02:%L[0]/%R[0],%R?[1]
BIGRAM B03:%L[0]/%R[0],%R[1],%R?[2]
BIGRAM B04:%L[0],%L?[1]/%R[0],%R[1],%R?[2]
BIGRAM B05:%L[0]/%R[0],%R[1],%R[2],%R?[3]
BIGRAM B06:%L[0],%L?[1]/%R[0],%R[1],%R[2],%R?[3]
BIGRAM B07:%L[0],%L[1],%L?[2]/%R[0]
BIGRAM B08:%L[0],%L[1],%L?[2]/%R[0],%R?[1]
BIGRAM B09:%L[0],%L[1],%L[2],%L?[3]/%R[0]
BIGRAM B10:%L[0],%L[1],%L[2],%L?[3]/%R[0],%R?[1]
BIGRAM B11:%L[0],%L[1],%L?[2]/%R[0],%R[1],%R?[2]
BIGRAM B12:%L[0],%L[1],%L?[2]/%R[0],%R[1],%R[2],%R?[3]
BIGRAM B13:%L[0],%L[1],%L[2],%L?[3]/%R[0],%R[1],%R?[2]
BIGRAM B14:%L[0],%L[1],%L[2],%L?[3]/%R[0],%R[1],%R[2],%R?[3]

# Conjugation
BIGRAM B20:%L[0],%L?[4]/%R[0]
BIGRAM B21:%L[0],%L?[5]/%R[0]
BIGRAM B22:%L[0],%L?[4],%L?[5]/%R[0]

BIGRAM B23:%L[0]/%R[0],%R?[4]
BIGRAM B24:%L[0]/%R[0],%R?[5]
BIGRAM B25:%L[0]/%R[0],%R?[4],%R?[5]

BIGRAM B26:%L[0],%L?[4]/%R[0],%R?[4]
BIGRAM B27:%L[0],%L?[4]/%R[0],%R?[5]
BIGRAM B28:%L[0],%L?[5]/%R[0],%R?[4]
BIGRAM B29:%L[0],%L?[5]/%R[0],%R?[5]

BIGRAM B30:%L[0],%L?[4],%L?[5]/%R[0],%R?[4]
BIGRAM B31:%L[0],%L?[4],%L?[5]/%R[0],%R?[5]

BIGRAM B32:%L[0],%L?[4]/%R[0],%R?[4],%R?[5]
BIGRAM B33:%L[0],%L?[5]/%R[0],%R?[4],%R?[5]

BIGRAM B34:%L[0],%L?[4],%L?[5]/%R[0],%R?[4],%R?[5]

# POS leaf category
BIGRAM B40:%L[0],%L[1],%L[2],%L?[4]/%R[0],%R[1],%R[2]
BIGRAM B41:%L[0],%L[1],%L[2],%L?[5]/%R[0],%R[1],%R[2]
BIGRAM B42:%L[0],%L[1],%L[2],%L?[4],%L?[5]/%R[0],%R[1],%R[2]

BIGRAM B43:%L[0],%L[1],%L[2]/%R[0],%R[1],%R[2],%R?[4]
BIGRAM B44:%L[0],%L[1],%L[2]/%R[0],%R[1],%R[2],%R?[5]
BIGRAM B45:%L[0],%L[1],%L[2]/%R[0],%R[1],%R[2],%R?[4],%R?[5]

BIGRAM B46:%L[0],%L[1],%L[2],%L?[4]/%R[0],%R[1],%R[2],%R?[4]
BIGRAM B47:%L[0],%L[1],%L[2],%L?[4]/%R[0],%R[1],%R[2],%R?[5]
BIGRAM B48:%L[0],%L[1],%L[2],%L?[5]/%R[0],%R[1],%R[2],%R?[4]
BIGRAM B49:%L[0],%L[1],%L[2],%L?[5]/%R[0],%R[1],%R[2],%R?[5]

BIGRAM B50:%L[0],%L[1],%L[2],%L?[4],%L?[5]/%R[0],%R[1],%R[2],%R?[4]
BIGRAM B51:%L[0],%L[1],%L[2],%L?[4],%L?[5]/%R[0],%R[1],%R[2],%R?[5]

BIGRAM B52:%L[0],%L[1],%L[2],%L?[4]/%R[0],%R[1],%R[2],%R?[4],%R?[5]
BIGRAM B53:%L[0],%L[1],%L[2],%L?[5]/%R[0],%R[1],%R[2],%R?[4],%R?[5]

BIGRAM B54:%L[0],%L[1],%L[2],%L?[4],%L?[5]/%R[0],%R[1],%R[2],%R?[4],%R?[5]

# Lexicalization
BIGRAM B61:%L[0],%L[1],%L[2],%L[3],%L[4],%L[5],%L?[6]/%R[0],%R[1],%R[2],%R[3]
BIGRAM B61:%L[0],%L[1],%L[2],%L[3],%L[4],%L[5],%L?[6]/%R[0],%R[1],%R[2],%R[3],%R[4]
BIGRAM B62:%L[0],%L[1],%L[2],%L[3],%L[4],%L[5],%L?[6]/%R[0],%R[1],%R[2],%R[3],%R[5]
BIGRAM B63:%L[0],%L[1],%L[2],%L[3],%L[4],%L[5],%L?[6]/%R[0],%R[1],%R[2],%R[3],%R[4],%R[5]
BIGRAM B64:%L[0],%L[1],%L[2],%L[3]/%R[0],%R[1],%R[2],%R[3],%R[4],%R[5],%R?[6]
BIGRAM B65:%L[0],%L[1],%L[2],%L[3],%L[4]/%R[0],%R[1],%R[2],%R[3],%R[4],%R[5],%R?[6]
BIGRAM B66:%L[0],%L[1],%L[2],%L[3],%L[5]/%R[0],%R[1],%R[2],%R[3],%R[4],%R[5],%R?[6]
BIGRAM B67:%L[0],%L[1],%L[2],%L[3],%L[4],%L[5]/%R[0],%R[1],%R[2],%R[3],%R[4],%R[5],%R?[6]
BIGRAM B68:%L[0],%L[1],%L[2],%L[3],%L[4],%L[5],%L?[6]/%R[0],%R[1],%R[2],%R[3],%R[4],%R[5],%R?[6]

BIGRAM B70:%L?[6]/%R?[6]
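
The templates in feature.def expand `%F[i]`, `%L[i]`, and `%R[i]` into the i-th feature field of the current, left, and right context, and skip the whole template when a `?`-marked field is undefined. The sketch below illustrates that rule in Python. It is not MeCab's implementation: the function `expand` and the sample feature values are hypothetical, and treating a missing, empty, or `*` field as "undefined" is an assumption. `%t` is substituted with a caller-supplied character type value.

```python
import re

# Matches %F[i], %F?[i], %L[i], %L?[i], %R[i], %R?[i].
MACRO = re.compile(r"%([FLR])(\??)\[(\d+)\]")


def expand(template, contexts, char_type=None):
    """Expand one template string; return None if the template is skipped.

    contexts maps "F", "L", "R" to lists of feature fields.
    Assumption: a field is undefined when it is missing, empty, or "*".
    """
    skip = False

    def substitute(match):
        nonlocal skip
        kind = match.group(1)
        optional = match.group(2) == "?"
        index = int(match.group(3))
        fields = contexts.get(kind, [])
        value = fields[index] if index < len(fields) else ""
        if optional and value in ("", "*"):
            skip = True  # a %X? field is undefined: drop the template
        return value

    expanded = MACRO.sub(substitute, template)
    if char_type is not None:
        expanded = expanded.replace("%t", str(char_type))
    return None if skip else expanded


# Hypothetical feature fields; real entries come from the lexicon.
word = ["noun", "common", "general", "*", "*", "*", "cat", "neko"]
left = ["noun", "common"]
right = ["particle", "case"]

print(expand("U3:%F[0],%F[1],%F?[2]", {"F": word}))
print(expand("U4:%F[0],%F[1],%F[2],%F?[3]", {"F": word}))
print(expand("T1:%F[0]/%t", {"F": word}, char_type=2))
print(expand("B01:%L[0],%L?[1]/%R[0]", {"L": left, "R": right}))
```

With these sample fields, U4 is skipped because field 3 is `*`, while U3 and B01 expand normally. Keeping the skip decision separate from the substitution makes it easy to change what counts as undefined.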