MFA Runner for Beginners

A simple tool for easily using the Montreal Forced Aligner.

Description

As the speech research community grows rapidly, text-wav forced alignment has become necessary for research fields such as Text-to-Speech, Voice Conversion and other speech-related areas. One simple and widely used approach is to use the Montreal Forced Aligner (MFA) [McAuliffe17] as the text-wav forced aligner. Despite this need, some beginners in speech research may find it hard to train MFA on their own custom datasets. For them, this repository provides the operations and procedures needed to run MFA with little effort.

How to use

To run this program, please follow the procedure below.

  • Install Anaconda and Python 3.9.
  • Install MFA and download the ESD dataset.
  • Install the prerequisite modules with pip: pip install -r requirements.txt
  • Edit config.py so that it points to your dataset.
  • Run the formatter: python main.py
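The formatting step essentially arranges the data into the corpus layout that MFA reads: one subdirectory per speaker, where each .wav file sits next to a .lab transcript with the same base name. A minimal sketch of writing one utterance in that layout follows. The helper name `write_mfa_utterance` and all paths are hypothetical; this is not the code in this repository's main.py or config.py.

```python
import shutil
from pathlib import Path


def write_mfa_utterance(corpus_dir: Path, speaker: str, utt_id: str,
                        text: str, wav_path: Path) -> Path:
    """Copy one utterance into an MFA-style corpus (hypothetical helper).

    Resulting layout: corpus_dir/<speaker>/<utt_id>.wav and
    corpus_dir/<speaker>/<utt_id>.lab, where the .lab file holds the
    plain-text transcript.
    """
    speaker_dir = corpus_dir / speaker
    speaker_dir.mkdir(parents=True, exist_ok=True)
    # Put the audio next to its transcript so MFA can pair them by base name.
    shutil.copyfile(wav_path, speaker_dir / f"{utt_id}.wav")
    lab_path = speaker_dir / f"{utt_id}.lab"
    lab_path.write_text(text.strip() + "\n", encoding="utf-8")
    return lab_path
```

The resulting directory can then be given to MFA as its corpus directory (for example `mfa align CORPUS_DIR DICTIONARY_PATH ACOUSTIC_MODEL_PATH OUTPUT_DIR`). Check the MFA documentation for the exact arguments of your installed version.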

Alignments

As a result of this tutorial, I have uploaded the text-wav alignments extracted using MFA.

Visualization of Extracted Alignments

Please refer to visualise_alignment.ipynb.
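The notebook is not reproduced here. Any visualization of MFA output starts by reading the aligned intervals of one tier ("words" or "phones", the tier names MFA typically writes) from a TextGrid file. The sketch below does this with the standard library only. It assumes the long (verbose) TextGrid text format. `read_tier` is a hypothetical helper, not part of this repository. In practice, a dedicated library such as praatio or textgrid is more robust.

```python
import re
from typing import List, Tuple

Interval = Tuple[float, float, str]

# Matches one interval entry of a long-format (verbose) TextGrid.
# Inside quoted text, a literal quote is written as two quotes ("").
_INTERVAL_RE = re.compile(
    r'intervals \[\d+\]:\s*'
    r'xmin = ([-\d.eE+]+)\s*'
    r'xmax = ([-\d.eE+]+)\s*'
    r'text = "((?:[^"]|"")*)"'
)


def read_tier(textgrid_text: str, tier_name: str) -> List[Interval]:
    """Return (start, end, label) tuples for one IntervalTier.

    Assumes the long TextGrid text format; short-format files are not handled.
    """
    # Each tier starts with 'item [k]:'; the header line 'item []:' has no
    # index, so it does not split, and element 0 (the file header) is dropped.
    for block in re.split(r'item \[\d+\]:', textgrid_text)[1:]:
        name = re.search(r'name = "(.*)"', block)
        if name and name.group(1) == tier_name:
            return [
                (float(xmin), float(xmax), text.replace('""', '"'))
                for xmin, xmax, text in _INTERVAL_RE.findall(block)
            ]
    raise KeyError(f"tier {tier_name!r} not found")
```

The resulting (start, end, label) tuples can then be drawn as labelled spans over a waveform or spectrogram.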

Supported Dataset

Experimental Notes

  • Currently, only ESD is supported.
  • Different emotions of a single speaker are treated independently, i.e., utterances with the emotion 'Angry' and utterances with the emotion 'Sad' from the same speaker are treated as different speakers. This is a simple "remedy" to reduce the complexity of the style (emotion) distribution.
  • Please note that the extracted alignments may not be accurate.
  • For the ESD dataset, only the English speakers are used.
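The speaker-per-emotion remedy and the English-only filter can be sketched as a single grouping pass over the ESD transcript lines. This sketch makes three assumptions. First, that each transcript line has the form `utt_id<TAB>text<TAB>emotion`. Second, that the utterance ID starts with the 4-digit speaker ID. Third, that the English speakers are 0011 to 0020. The function name `group_by_pseudo_speaker` is hypothetical and does not come from this repository.

```python
from typing import Dict, Iterable, List, Tuple

# Assumption: ESD's English speakers have IDs 0011-0020.
ENGLISH_SPEAKERS = {f"{i:04d}" for i in range(11, 21)}


def group_by_pseudo_speaker(
    transcript_lines: Iterable[str],
) -> Dict[str, List[Tuple[str, str]]]:
    """Group ESD utterances by a '<speaker>_<emotion>' pseudo-speaker key.

    Each line is assumed to look like 'utt_id<TAB>text<TAB>emotion',
    with utt_id starting with the 4-digit speaker ID (e.g. '0011_000351').
    Returns {pseudo_speaker: [(utt_id, text), ...]}.
    """
    groups: Dict[str, List[Tuple[str, str]]] = {}
    for line in transcript_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        utt_id, text, emotion = (f.strip() for f in fields)
        speaker = utt_id.split("_")[0]
        if speaker not in ENGLISH_SPEAKERS:
            continue  # keep only the English speakers
        # The same speaker with a different emotion becomes a different "speaker".
        groups.setdefault(f"{speaker}_{emotion}", []).append((utt_id, text))
    return groups
```

Each pseudo-speaker key can then serve as the speaker subdirectory name in the MFA corpus. MFA groups utterances by speaker directory, so it treats every speaker-emotion pair as its own speaker.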

Contacts

Please email [email protected]. Any suggestions or questions are appreciated. I hope this repository is helpful.