[mlt] Updates Maltese phonelist (#517)
* [mlt] Updates Maltese phonelist.

Due either to bugs or changes in the upstream data, I noticed there was
a very high rate of filtration on Maltese. It seems that [u] was not
included, nor was one of the affricates.

There is still some filtration for "archaic" pronunciations of
[ɣ] for <għ>, which is working as intended (WAI).

* Changelog

* Adds Python 3.12 support

* project classifiers
* tests on CircleCI

* Revert "Adds Python 3.12 support"

This reverts commit e72bc3d.
kylebgorman authored Feb 21, 2024
1 parent 3a7d452 commit b58c4c3
Showing 9 changed files with 5,485 additions and 84 deletions.
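
The filtration described in the commit message can be illustrated with a minimal sketch. It assumes, for illustration only, that each pronunciation is a string of space-separated segments and that an entry is kept only when every segment appears in the phonelist. The name `filter_entries` and the toy entries are hypothetical; they are not taken from the repository's code or data.

```python
from typing import List, Set, Tuple

Entry = Tuple[str, str]  # (word, space-separated pronunciation)


def filter_entries(entries: List[Entry], phones: Set[str]) -> List[Entry]:
    """Keeps entries whose segments all occur in the phonelist."""
    return [
        (word, pron)
        for word, pron in entries
        if all(segment in phones for segment in pron.split())
    ]


# Hypothetical toy data, not real Maltese entries.
entries = [("w1", "p a t"), ("w2", "p u t"), ("w3", "t u p a")]
old_phones = {"p", "t", "a"}     # [u] is missing from the list
new_phones = old_phones | {"u"}  # [u] has been added

kept_before = filter_entries(entries, old_phones)
kept_after = filter_entries(entries, new_phones)
print(len(kept_before), len(kept_after))  # 1 3
```

Under these assumptions, a single missing vowel is enough to discard every entry that contains it, which would be consistent with the high filtration rate the commit message reports.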
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -23,6 +23,8 @@ Unreleased

#### Added

+- Updated Maltese (`mlt`) phonelist. (\#517)
+- Fixed path bug in `generate_summary.py`. (\#517)
- Fixed CLI arg bug in `list_phones.py`. (\#516)
- Big scrape for 2024. (\#514)
- Big scrape for 2023. (\#512)
12 changes: 6 additions & 6 deletions data/phones/README.md
@@ -1,7 +1,7 @@
See the [HOWTO](HOWTO.md) for the steps to generate phone lists.
| Link | ISO 639-3 Code | ISO 639 Language Name | Wiktionary Language Name | Narrow/broad | # of phones |
| :---- | :----: | :----: | :----: | :----: | :----: |
-| [phone](phones/ady_narrow.phones) | ady | Adygei; Adyghe | Adyghe | Narrow | 67 |
+| [phone](phones/ady_narrow.phones) | ady | Adyghe | Adyghe | Narrow | 67 |
| [phone](phones/afr_broad.phones) | afr | Afrikaans | Afrikaans | Broad | 61 |
| [phone](phones/aze_narrow.phones) | aze | Azerbaijani | Azerbaijani | Narrow | 54 |
| [phone](phones/bul_broad.phones) | bul | Bulgarian | Bulgarian | Broad | 52 |
@@ -25,16 +25,16 @@ See the [HOWTO](HOWTO.md) for the steps to generate phone lists.
| [phone](phones/kor_narrow.phones) | kor | Korean | Korean | Narrow | 61 |
| [phone](phones/lat_clas_broad.phones) | lat | Latin | Latin (Classical) | Broad | 36 |
| [phone](phones/lav_narrow.phones) | lav | Latvian | Latvian | Narrow | 89 |
-| [phone](phones/mlt_broad.phones) | mlt | Maltese | Maltese | Broad | 59 |
+| [phone](phones/mlt_broad.phones) | mlt | Maltese | Maltese | Broad | 61 |
| [phone](phones/mya_broad.phones) | mya | Burmese | Burmese | Broad | 70 |
-| [phone](phones/nld_broad.phones) | nld | Dutch; Flemish | Dutch | Broad | 50 |
+| [phone](phones/nld_broad.phones) | nld | Dutch | Dutch | Broad | 50 |
| [phone](phones/nob_broad.phones) | nob | Norwegian Bokmål | Norwegian Bokmål | Broad | 54 |
| [phone](phones/por_bz_broad.phones) | por | Portuguese | Portuguese (Brazil) | Broad | 55 |
| [phone](phones/por_po_broad.phones) | por | Portuguese | Portuguese (Portugal) | Broad | 48 |
-| [phone](phones/ron_narrow.phones) | ron | Romanian; Moldavian; Moldovan | Romanian | Narrow | 51 |
+| [phone](phones/ron_narrow.phones) | ron | Romanian | Romanian | Narrow | 51 |
| [phone](phones/slv_broad.phones) | slv | Slovenian | Slovene | Broad | 48 |
-| [phone](phones/spa_ca_broad.phones) | spa | Spanish; Castilian | Spanish (Castilian) | Broad | 29 |
-| [phone](phones/spa_la_broad.phones) | spa | Spanish; Castilian | Spanish (Latin America) | Broad | 27 |
+| [phone](phones/spa_ca_broad.phones) | spa | Spanish | Spanish (Castilian) | Broad | 29 |
+| [phone](phones/spa_la_broad.phones) | spa | Spanish | Spanish (Latin America) | Broad | 27 |
| [phone](phones/tur_narrow.phones) | tur | Turkish | Turkish | Narrow | 51 |
| [phone](phones/vie_hanoi_narrow.phones) | vie | Vietnamese | Vietnamese (Hà Nội) | Narrow | 54 |
| [phone](phones/vie_hcmc_narrow.phones) | vie | Vietnamese | Vietnamese (Hồ Chí Minh City) | Narrow | 50 |
16 changes: 9 additions & 7 deletions data/phones/lib/generate_summary.py
@@ -8,11 +8,13 @@

from typing import Any, Dict

-from data.scrape.lib.codes import (
-    LANGUAGES_PATH,
-    PHONES_SUMMARY_PATH,
-    PHONES_README_PATH,
-    PHONES_DIRECTORY,
+LIB_DIRECTORY = os.path.dirname(os.path.realpath(__file__))
+PHONES_DIRECTORY = os.path.normpath(os.path.join(LIB_DIRECTORY, os.pardir))
+PHONES_README_PATH = os.path.join(PHONES_DIRECTORY, "README.md")
+PHONES_SUMMARY_PATH = os.path.join(PHONES_DIRECTORY, "summary.tsv")
+PHONES_PHONES_DIRECTORY = os.path.join(PHONES_DIRECTORY, "phones")
+LANGUAGES_PATH = os.path.normpath(
+    os.path.join(PHONES_DIRECTORY, os.pardir, "scrape/lib/languages.json")
)
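
As a rough illustration of how the new constants in this hunk resolve, the sketch below repeats the same `os.path` calls but takes the script location as a parameter instead of reading `__file__`. The wrapper `resolve_paths` and the example repository root are hypothetical; only the path expressions mirror the lines in this diff.

```python
import os
from typing import Dict


def resolve_paths(script_path: str) -> Dict[str, str]:
    """Derives the phones paths from the location of the script."""
    lib_directory = os.path.dirname(os.path.realpath(script_path))
    phones_directory = os.path.normpath(os.path.join(lib_directory, os.pardir))
    return {
        "PHONES_DIRECTORY": phones_directory,
        "PHONES_README_PATH": os.path.join(phones_directory, "README.md"),
        "PHONES_SUMMARY_PATH": os.path.join(phones_directory, "summary.tsv"),
        "PHONES_PHONES_DIRECTORY": os.path.join(phones_directory, "phones"),
        "LANGUAGES_PATH": os.path.normpath(
            os.path.join(
                phones_directory, os.pardir, "scrape/lib/languages.json"
            )
        ),
    }


# Hypothetical checkout location: the current working directory.
root = os.path.realpath(os.getcwd())
script = os.path.join(root, "data", "phones", "lib", "generate_summary.py")
for name, path in resolve_paths(script).items():
    print(name, os.path.relpath(path, root))
```

Because every path is anchored to the script's own location, the result does not depend on the directory the script is launched from.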


@@ -37,9 +39,9 @@ def main() -> None:
languages = json.load(source)
readme_list = []
phones_summaries = []
-for file_path in os.listdir(PHONES_DIRECTORY):
+for file_path in os.listdir(PHONES_PHONES_DIRECTORY):
with open(
-f"{PHONES_DIRECTORY}/{file_path}", "r", encoding="utf-8"
+f"{PHONES_PHONES_DIRECTORY}/{file_path}", "r", encoding="utf-8"
) as phone_list:
# We exclude blank lines and comments.
num_of_entries = sum(
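
The hunk is cut off at `sum(`, so the actual expression is not visible here. As a hedged sketch of the technique named in the comment (excluding blank lines and comments), one possible counting function is shown below. The name `count_phones` and the sample lines are hypothetical, and the assumption that comments start with `#` is based only on the section headers visible in `mlt_broad.phones`.

```python
from typing import Iterable


def count_phones(lines: Iterable[str]) -> int:
    """Counts lines that are neither blank nor `#` comments."""
    return sum(
        1
        for line in lines
        if line.strip() and not line.lstrip().startswith("#")
    )


# Hypothetical excerpt in the style of a .phones file.
sample = ["# CONSONANTS", "k", "ɡ", "", "# VOWELS", "a", "u"]
print(count_phones(sample))  # 4
```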
4 changes: 3 additions & 1 deletion data/phones/phones/mlt_broad.phones
@@ -29,6 +29,7 @@ zː
l
t͡ʃ
d͡z
d͡ʒ
ʃ
ʃː
@@ -37,6 +38,7 @@ jː
k
ɡ
ɣ
ħ
ʔ
# VOWELS
@@ -52,8 +54,8 @@ iː
ɐː
a
u
ʊ
ɔ
ɔː
# PHARYNGEALIZED VOWELS
12 changes: 6 additions & 6 deletions data/phones/summary.tsv
@@ -1,4 +1,4 @@
-ady_narrow.phones ady Adygei; Adyghe Adyghe Narrow 67
+ady_narrow.phones ady Adyghe Adyghe Narrow 67
afr_broad.phones afr Afrikaans Afrikaans Broad 61
aze_narrow.phones aze Azerbaijani Azerbaijani Narrow 54
bul_broad.phones bul Bulgarian Bulgarian Broad 52
@@ -22,16 +22,16 @@ khm_broad.phones khm Khmer Khmer Broad 73
kor_narrow.phones kor Korean Korean Narrow 61
lat_clas_broad.phones lat Latin Latin (Classical) Broad 36
lav_narrow.phones lav Latvian Latvian Narrow 89
-mlt_broad.phones mlt Maltese Maltese Broad 59
+mlt_broad.phones mlt Maltese Maltese Broad 61
mya_broad.phones mya Burmese Burmese Broad 70
-nld_broad.phones nld Dutch; Flemish Dutch Broad 50
+nld_broad.phones nld Dutch Dutch Broad 50
nob_broad.phones nob Norwegian Bokmål Norwegian Bokmål Broad 54
por_bz_broad.phones por Portuguese Portuguese (Brazil) Broad 55
por_po_broad.phones por Portuguese Portuguese (Portugal) Broad 48
-ron_narrow.phones ron Romanian; Moldavian; Moldovan Romanian Narrow 51
+ron_narrow.phones ron Romanian Romanian Narrow 51
slv_broad.phones slv Slovenian Slovene Broad 48
-spa_ca_broad.phones spa Spanish; Castilian Spanish (Castilian) Broad 29
-spa_la_broad.phones spa Spanish; Castilian Spanish (Latin America) Broad 27
+spa_ca_broad.phones spa Spanish Spanish (Castilian) Broad 29
+spa_la_broad.phones spa Spanish Spanish (Latin America) Broad 27
tur_narrow.phones tur Turkish Turkish Narrow 51
vie_hanoi_narrow.phones vie Vietnamese Vietnamese (Hà Nội) Narrow 54
vie_hcmc_narrow.phones vie Vietnamese Vietnamese (Hồ Chí Minh City) Narrow 50
6 changes: 3 additions & 3 deletions data/scrape/README.md
@@ -5,7 +5,7 @@
* Broad transcription files: 19
* Narrow transcription files: 20
* Scripts: 44
-* Pronunciations: 3,529,423
+* Pronunciations: 3,530,261


| Link | ISO 639-3 Code | ISO 639 Language Name | Wiktionary Language Name | Script | Dialect | Filtered | Narrow/Broad | Case-folding | # of entries |
@@ -281,8 +281,8 @@
| [TSV](tsv/mic_latn_narrow.tsv) | mic | Mi'kmaq | Mi'kmaq | Latin | | False | Narrow | True | 193 |
| [TSV](tsv/mkd_cyrl_narrow.tsv) | mkd | Macedonian | Macedonian | Cyrillic | | False | Narrow | True | 61,366 |
| [TSV](tsv/mlg_latn_broad.tsv) | mlg | Malagasy | Malagasy | Latin | | False | Broad | True | 186 |
-| [TSV](tsv/mlt_latn_broad.tsv) | mlt | Maltese | Maltese | Latin | | False | Broad | True | 17,515 |
-| [TSV](tsv/mlt_latn_broad_filtered.tsv) | mlt | Maltese | Maltese | Latin | | True | Broad | True | 13,716 |
+| [TSV](tsv/mlt_latn_broad.tsv) | mlt | Maltese | Maltese | Latin | | False | Broad | True | 18,353 |
+| [TSV](tsv/mlt_latn_broad_filtered.tsv) | mlt | Maltese | Maltese | Latin | | True | Broad | True | 18,273 |
| [TSV](tsv/mnc_mong_narrow.tsv) | mnc | Manchu | Manchu | Mongolian | | False | Narrow | False | 1,451 |
| [TSV](tsv/mnw_mymr_broad.tsv) | mnw | Mon | Mon | Myanmar | | False | Broad | False | 1,156 |
| [TSV](tsv/mon_cyrl_broad.tsv) | mon | Mongolian | Mongolian | Cyrillic | | False | Broad | True | 3,479 |
4 changes: 2 additions & 2 deletions data/scrape/summary.tsv
@@ -269,8 +269,8 @@ mic_latn_broad.tsv mic Mi'kmaq Mi'kmaq Latin False Broad True 195
mic_latn_narrow.tsv mic Mi'kmaq Mi'kmaq Latin False Narrow True 193
mkd_cyrl_narrow.tsv mkd Macedonian Macedonian Cyrillic False Narrow True 61366
mlg_latn_broad.tsv mlg Malagasy Malagasy Latin False Broad True 186
-mlt_latn_broad.tsv mlt Maltese Maltese Latin False Broad True 17515
-mlt_latn_broad_filtered.tsv mlt Maltese Maltese Latin True Broad True 13716
+mlt_latn_broad.tsv mlt Maltese Maltese Latin False Broad True 18353
+mlt_latn_broad_filtered.tsv mlt Maltese Maltese Latin True Broad True 18273
mnc_mong_narrow.tsv mnc Manchu Manchu Mongolian False Narrow False 1451
mnw_mymr_broad.tsv mnw Mon Mon Myanmar False Broad False 1156
mon_cyrl_broad.tsv mon Mongolian Mongolian Cyrillic False Broad True 3479
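
The Maltese entry counts in the two summaries give a quick way to quantify the change in filtration. The sketch below computes the share of entries removed by filtering before and after this commit, using only the four counts that appear in the diff; the helper name `filtration_rate` is hypothetical.

```python
def filtration_rate(unfiltered: int, filtered: int) -> float:
    """Returns the fraction of entries removed by phonelist filtering."""
    return (unfiltered - filtered) / unfiltered


# Counts taken from the mlt_latn_broad rows in this diff.
before = filtration_rate(17_515, 13_716)
after = filtration_rate(18_353, 18_273)

print(f"before: {before:.1%}")  # before: 21.7%
print(f"after:  {after:.1%}")   # after:  0.4%
```

On these numbers, the updated phonelist filters out 80 entries instead of 3,799.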