
This tool can be used to perform a fine-grained error analysis of a G2P model. It reports a performance matrix for the test output generated by a G2P model.


Othergreengrasses/g2pErrorAnalysis


Error analysis tool for grapheme-to-phoneme (G2P) conversion

This tool performs a fine-grained error analysis of a G2P model. It reports a performance matrix for the test output generated by the model.

Each test record generated by a G2P model is evaluated against two criteria:

* whether the hypothesized pronunciation matches the expected pronunciation from the corpus (Pron Match);
* whether the hypothesized pronunciation adheres to the rules of the language as encoded in the covering grammar (CG Match).
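
The two checks can be illustrated with a short pure-Python sketch. It is not the tool's own code (the tool depends on Pynini). It assumes that pronunciations are space-separated phone strings and that the covering grammar is a list of (grapheme, pronunciation) pairs, and it treats a hypothesis as licensed when the orthography can be split into covering-grammar graphemes whose pronunciations concatenate to exactly that hypothesis. The names `cg_licenses` and `classify`, and the toy grammar in the usage note, are hypothetical.

```python
from typing import List, Tuple

# Hypothetical covering-grammar format: (grapheme, pronunciation) pairs,
# where a pronunciation is a space-separated phone string ("" = silent).
CoveringGrammar = List[Tuple[str, str]]


def cg_licenses(orthography: str, pronunciation: str, cg: CoveringGrammar) -> bool:
    """Return True if `pronunciation` can be produced by splitting `orthography`
    into covering-grammar graphemes and concatenating their pronunciations."""
    phones = pronunciation.split()
    entries = [(g, p.split()) for g, p in cg if g]  # skip empty graphemes
    # Depth-first search over (characters consumed, phones produced) states.
    frontier = [(0, 0)]
    seen = {(0, 0)}
    while frontier:
        i, j = frontier.pop()
        if i == len(orthography) and j == len(phones):
            return True
        for grapheme, grapheme_phones in entries:
            if not orthography.startswith(grapheme, i):
                continue
            k = j + len(grapheme_phones)
            if phones[j:k] != grapheme_phones:
                continue
            state = (i + len(grapheme), k)
            if state not in seen:
                seen.add(state)
                frontier.append(state)
    return False


def classify(orthography: str, expected: str, hypothesis: str,
             cg: CoveringGrammar) -> Tuple[bool, bool]:
    """Return (pron_match, cg_match) for one test record."""
    pron_match = expected.split() == hypothesis.split()
    return pron_match, cg_licenses(orthography, hypothesis, cg)
```

For example, with the made-up grammar `cg = [("a", "a"), ("sh", "ʃ"), ("s", "s"), ("h", ""), ("h", "h")]`, `classify("ash", "a ʃ", "a s", cg)` returns `(False, True)`: the hypothesis is wrong, but the grammar can still produce it by reading `s` and `h` separately. Tracking visited (position, phone count) states keeps the search polynomial even when graphemes overlap, as `s`, `h`, and `sh` do here.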

The output of the tool is a two-dimensional matrix such as the following:

|                | CG Match | CG Not Match |
|----------------|----------|--------------|
| Pron Match     | 79.34    | 05.01        |
| Pron Not Match | 13.15    | 02.50        |

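Each number is the percentage of test records that fall into the intersection of the corresponding row and column categories, so the four cells sum to 100. For example, 79.34% of the records match the expected pronunciation and also match the covering grammar.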
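
Once each record has been classified, the matrix is simply the share of records in each of the four (Pron, CG) combinations. A minimal sketch, assuming one `(pron_match, cg_match)` pair of booleans per record; `error_matrix` and `format_matrix` are hypothetical names, and plain string formatting stands in for prettytable, which the tool lists as a dependency.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

ROWS = {True: "Pron Match", False: "Pron Not Match"}
COLS = {True: "CG Match", False: "CG Not Match"}


def error_matrix(records: Iterable[Tuple[bool, bool]]) -> Dict[Tuple[str, str], float]:
    """Map each (row, column) label pair to the percentage of records in that cell.

    Each record is a (pron_match, cg_match) pair of booleans."""
    counts = Counter(records)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no test records")
    # Missing combinations count as 0 because Counter returns 0 for absent keys.
    return {
        (ROWS[p], COLS[c]): 100.0 * counts[(p, c)] / total
        for p in (True, False)
        for c in (True, False)
    }


def format_matrix(matrix: Dict[Tuple[str, str], float]) -> str:
    """Render the matrix as a fixed-width text table with two decimals."""
    lines = [f"{'':15}| {COLS[True]:>10} | {COLS[False]:>12} |"]
    for p in (True, False):
        row = ROWS[p]
        lines.append(f"{row:15}| {matrix[(row, COLS[True])]:10.2f} | "
                     f"{matrix[(row, COLS[False])]:12.2f} |")
    return "\n".join(lines)
```

For instance, three records that match on both criteria plus one that matches only the covering grammar give 75.00 and 25.00 in the `CG Match` column and 0.00 in the `CG Not Match` column.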

Prerequisites

Library

Install pynini and prettytable:

conda install -c conda-forge pynini
pip install prettytable

Data

The tool requires the following two input files:

* Covering grammar: each line contains a grapheme and its corresponding pronunciation, separated by a tab. See the adyghe_cg.tsv file in the data folder for an example.

* Test output: each line contains three tab-separated fields, in this order: orthography, expected pronunciation, and hypothesized pronunciation. See the test.tsv file in the data folder for an example.
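
Both input files can be read with Python's `csv` module. The sketch below assumes UTF-8 files without a header row and with exactly the tab-separated fields described above; `parse_covering_grammar`, `parse_test_output`, and `G2PRecord` are hypothetical names, not part of erroranalysis.py. `csv.QUOTE_NONE` matters here because transcriptions may contain apostrophes or quotation marks that the default dialect would treat as quoting.

```python
import csv
from typing import Iterable, List, NamedTuple, Tuple


class G2PRecord(NamedTuple):
    orthography: str
    expected: str    # expected (reference) pronunciation
    hypothesis: str  # pronunciation hypothesized by the G2P model


def _tsv_rows(lines: Iterable[str]) -> Iterable[List[str]]:
    # QUOTE_NONE keeps apostrophes and quote marks in transcriptions literal.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return (row for row in reader if row)  # skip blank lines


def parse_covering_grammar(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse grapheme<TAB>pronunciation lines into (grapheme, pronunciation) pairs."""
    pairs = []
    for row in _tsv_rows(lines):
        if len(row) != 2:
            raise ValueError(f"expected 2 tab-separated fields, got {row!r}")
        pairs.append((row[0], row[1]))
    return pairs


def parse_test_output(lines: Iterable[str]) -> List[G2PRecord]:
    """Parse orthography<TAB>expected<TAB>hypothesis lines into records."""
    records = []
    for row in _tsv_rows(lines):
        if len(row) != 3:
            raise ValueError(f"expected 3 tab-separated fields, got {row!r}")
        records.append(G2PRecord(*row))
    return records
```

Both functions accept any iterable of lines, so they work on in-memory strings as well as on files. To read from disk, pass an open file, e.g. `with open("data/test.tsv", encoding="utf-8", newline="") as f: records = parse_test_output(f)`.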

Suggested workflow

Clone the repository, then run:

cd g2pErrorAnalysis
python erroranalysis.py --cg_path data/adyghe_cg.tsv --test_path data/test.tsv

Reference

Gorman, Kyle. 2016. Pynini: A Python library for weighted finite-state grammar compilation. In Proceedings of the SIGFSM Workshop on Statistical NLP and Weighted Automata, pages 75–80, Berlin. Association for Computational Linguistics.

Ashby, Lucas F.E., Travis M. Bartley, Simon Clematide, Luca Del Signore, Cameron Gibson, Kyle Gorman, Yeonju Lee-Sikka, Peter Makarov, Aidan Malanoski, Sean Miller, Omar Ortiz, Reuben Raff, Arundhati Sengupta, Bora Seo, Yulia Spektor, and Winnie Yan. 2021. Results of the Second SIGMORPHON Shared Task on Multilingual Grapheme-to-Phoneme Conversion. In Proceedings of the 18th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, pages 115–125, Online. Association for Computational Linguistics.
