This tool performs a fine-grained error analysis of a grapheme-to-phoneme (G2P) model. Given the output a G2P model produces on a test set, it reports a performance matrix.
In this analysis, each test record produced by the G2P model is evaluated against two criteria:

* whether the hypothesized pronunciation matches the expected pronunciation in the corpus (Pron Match);
* whether the hypothesized pronunciation adheres to the grammatical rules of the language, as encoded in the covering grammar (CG Match).

The output of the tool is a two-dimensional matrix, as shown below:
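The two checks can be sketched in plain Python. This is an illustrative stand-in, not the tool's implementation: the tool depends on pynini, whereas this sketch replaces finite-state composition with a simple reachability search. The names `cg_accepts` and `classify`, the toy grapheme/phone pairs, and the assumption that pronunciations are plain strings without separators are all hypothetical.

```python
def cg_accepts(graphemes, pron, pairs):
    """Return True if `graphemes` can be segmented so that each segment maps
    to the matching piece of `pron` via some (grapheme, phone) pair.

    Assumption: `pron` is a plain string with no separators between phones.
    """
    reachable = {(0, 0)}  # (position in graphemes, position in pron)
    stack = [(0, 0)]
    while stack:
        i, j = stack.pop()
        for g, p in pairs:
            if graphemes.startswith(g, i) and pron.startswith(p, j):
                nxt = (i + len(g), j + len(p))
                if nxt not in reachable:
                    reachable.add(nxt)
                    stack.append(nxt)
    return (len(graphemes), len(pron)) in reachable


def classify(orth, expected, hyp, pairs):
    """Return (pron_match, cg_match) for one test record."""
    return hyp == expected, cg_accepts(orth, hyp, pairs)


# Toy covering grammar (hypothetical, for illustration only).
toy_pairs = [("a", "a"), ("sh", "ʃ"), ("s", "s"), ("h", "h")]
print(classify("asha", "aʃa", "aʃa", toy_pairs))
print(classify("asha", "aʃa", "aka", toy_pairs))
```

A record can land in any of the four cells: for example, a hypothesis that differs from the expected pronunciation may still be licensed by the covering grammar.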
|                | CG Match | CG Not Match |
| -------------- | -------- | ------------ |
| Pron Match     | 79.34    | 05.01        |
| Pron Not Match | 13.15    | 02.50        |
Each number is the percentage of test records that fall in the intersection of the corresponding row and column categories.
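As a rough sketch of how such percentages could be tallied, assume each test record has already been reduced to a pair of booleans `(pron_match, cg_match)`. The function name `matrix_percentages` and the sample records are hypothetical and use only the standard library; the tool itself may compute and print the matrix differently (for example, with prettytable).

```python
from collections import Counter


def matrix_percentages(records):
    """Map each (pron_match, cg_match) cell to its percentage of all records."""
    counts = Counter(records)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no test records given")
    cells = [(True, True), (True, False), (False, True), (False, False)]
    return {cell: 100.0 * counts[cell] / total for cell in cells}


# Hypothetical records: three fully correct, one wrong but CG-conformant.
sample = [(True, True)] * 3 + [(False, True)]
for (pron, cg), pct in matrix_percentages(sample).items():
    row = "Pron Match" if pron else "Pron Not Match"
    col = "CG Match" if cg else "CG Not Match"
    print(f"{row:15} {col:13} {pct:6.2f}")
```

Because every record falls in exactly one cell, the four percentages sum to 100.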
Install the dependencies, pynini and prettytable:
conda install -c conda-forge pynini
pip install prettytable
To run this tool, the following data items are required:
* Covering grammar: each line contains a grapheme and its corresponding pronunciation, separated by a tab. Refer to the adyghe_cg.tsv file inside the data folder for more detail.
* Test output: each line contains three attributes separated by tabs, in this order: orthography, expected pronunciation, and hypothesized pronunciation. Refer to the test.tsv file inside the data folder for more detail.
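The two file layouts described above could be read as follows. This is a minimal sketch using the standard `csv` module; the function names `read_covering_grammar` and `read_test_output` and the sample lines are hypothetical, and the tool's own loader may differ.

```python
import csv


def _read_tsv(lines, n_columns):
    """Yield tuples of exactly `n_columns` tab-separated fields, skipping blank lines."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for line_no, row in enumerate(reader, start=1):
        if not row:
            continue
        if len(row) != n_columns:
            raise ValueError(
                f"line {line_no}: expected {n_columns} fields, got {len(row)}"
            )
        yield tuple(row)


def read_covering_grammar(lines):
    """Return a list of (grapheme, pronunciation) pairs."""
    return list(_read_tsv(lines, 2))


def read_test_output(lines):
    """Return a list of (orthography, expected, hypothesized) triples."""
    return list(_read_tsv(lines, 3))


# Hypothetical in-memory lines; with real files, pass an open file object,
# e.g. open(path, encoding="utf-8", newline="").
print(read_covering_grammar(["a\ta\n", "sh\tʃ\n"]))
print(read_test_output(["asha\ta ʃ a\ta ʃ a\n"]))
```

Checking the field count up front catches malformed lines early, before they surface as confusing mismatches in the analysis.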
Clone the repository, then run the tool from its root directory:
cd g2pErrorAnalysis
python --cg_path data/adyghe_cg.tsv --test_path data/test.tsv
Kyle Gorman. 2016. Pynini: A Python library for weighted finite-state grammar compilation. In Proceedings of the SIGFSM Workshop on Statistical NLP and Weighted Automata, pages 75–80, Berlin. Association for Computational Linguistics.
Lucas F.E. Ashby, Travis M. Bartley, Simon Clematide, Luca Del Signore, Cameron Gibson, Kyle Gorman, Yeonju Lee-Sikka, Peter Makarov, Aidan Malanoski, Sean Miller, Omar Ortiz, Reuben Raff, Arundhati Sengupta, Bora Seo, Yulia Spektor, and Winnie Yan. 2021. Results of the Second SIGMORPHON Shared Task on Multilingual Grapheme-to-Phoneme Conversion. In Proceedings of the 18th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, pages 115–125, Online. Association for Computational Linguistics.