GMM-HMM and TDNN Training Procedure Repository

In Kaldi, all projects live inside the egs directory under the Kaldi root directory. An example is shown in the figure below.

[Figure: Kaldi egs folder structure. Ref: https://www.eleanorchodroff.com/tutorial/kaldi/familiarization.html]

The explanation here is mainly derived from the references given below. First let's go through the wsj folder structure, and then create an example directory of our own.

Data Preparation

WSJ directory structure

Inside the wsj directory we find the s5 directory (the "s" directories correspond to different versions of the recipe), where the actual files reside.

  • The utils, steps and local directories contain the scripts needed for further processing.
  • The exp directory contains all the model parameters, whether for a GMM or a TDNN model. The trained acoustic models live here.
  • The conf directory contains config files that set parameters such as the sampling frequency of the audio, beam and lattice-beam widths, etc. (an example mfcc.conf is sketched after this list).
  • The data directory contains all the input data needed for training, validation and testing. As input for ASR we need audio, transcripts, words and their phonetic representations, non-silence phones, silence phones, etc. These files have to be created by the user.
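As an illustration, a minimal MFCC config of the kind kept under conf might look like the following; the exact options and values are assumptions and should be matched to your audio:

    # conf/mfcc.conf -- example feature-extraction settings (values illustrative)
    --use-energy=false        # use the cepstra without the raw energy term
    --sample-frequency=16000  # must match the sampling rate of the wav files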

Inside the data directory we have the train, lang and dict directories.

  • In the train sub-directory, four files fundamentally need to be created: wav.scp, text, utt2spk and spk2utt. Further details on these files can be found here and here (their line formats are sketched after this list).
  • In the dict sub-directory (referred to here as local/lang) we need the files that are described in detail here and here.
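As a sketch, these are simple line-oriented text files. All utterance IDs, speaker IDs, paths and words below are made up for illustration:

    # data/train/wav.scp : <utterance-id> <path-to-audio>
    spk1_utt1 /path/to/audio/spk1_utt1.wav

    # data/train/text : <utterance-id> <transcript>
    spk1_utt1 HELLO WORLD

    # data/train/utt2spk : <utterance-id> <speaker-id>
    spk1_utt1 spk1

    # data/train/spk2utt : <speaker-id> <utterance-ids>
    # (can be generated with utils/utt2spk_to_spk2utt.pl)
    spk1 spk1_utt1 spk1_utt2

    # dict/lexicon.txt : <word> <phone-sequence>, including the OOV entry
    HELLO HH AH L OW
    WORLD W ER L D
    <UNK> SPN

Kaldi expects these files to be sorted; utils/fix_data_dir.sh data/train can repair ordering and consistency problems.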

Creating a custom directory

[Figure: custom folder structure]
As the utils and steps directories are common to many projects, we can simply create symbolic links, as shown here.
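For example, assuming the new project directory sits next to wsj inside egs (say, egs/myproject/s5), the links could be created as follows:

    # run from inside egs/myproject/s5
    ln -s ../../wsj/s5/utils utils
    ln -s ../../wsj/s5/steps steps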

After creating the train (and correspondingly validation and test) and dictionary directories, we create L.fst. For that we need an OOV entry, which is used for any word that is not present in the lexicon; the OOV symbol must itself be present in the lexicon as a word. Follow the commands here to create the lang directory, in which L.fst is created. This will be used later. L.fst is nothing but the pronunciation model for the corpus. After we create a language model in the following steps, a G.fst file will be created in this location.
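A typical way to build the lang directory is Kaldi's utils/prepare_lang.sh; the directory names and the "<UNK>" OOV symbol below are assumptions and must match your dict setup:

    # prepare_lang.sh <dict-dir> <oov-word> <tmp-dir> <lang-dir>
    utils/prepare_lang.sh data/local/dict "<UNK>" data/local/lang_tmp data/lang
    # data/lang/L.fst now holds the pronunciation model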

After this we proceed to compute features from the audios. The config files are set as shown here. The config.ini file contains a variable (mfcc_conf) that holds the path to the MFCC config file.
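Under the hood, feature extraction in Kaldi is usually done with the standard scripts below; the job count and directory names are assumptions:

    # extract MFCCs and per-speaker CMVN stats for the train set
    steps/make_mfcc.sh --nj 4 --mfcc-config conf/mfcc.conf data/train exp/make_mfcc/train mfcc
    steps/compute_cmvn_stats.sh data/train exp/make_mfcc/train mfcc
    utils/validate_data_dir.sh data/train   # sanity-check the data directory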

At this stage the train folder, dictionary folder and pronunciation model have been prepared. In a hybrid ASR system, an HCLG graph is obtained from four components: the acoustic model (H), the context transducer (C), the pronunciation model (L) and the language model (G). All these components are obtained individually and then composed into a decoding graph. The pronunciation model (L) has already been obtained; we now look at acoustic model (H) training.
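Once an acoustic model has been trained, the composition into a decoding graph is handled by utils/mkgraph.sh; the model directory exp/mono below is an assumption, and the lang directory passed in must already contain G.fst:

    # compose H, C, L and G into exp/mono/graph/HCLG.fst
    utils/mkgraph.sh data/lang exp/mono exp/mono/graph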

GMM-HMM Training

We start with monophone training; the steps are mentioned here. Next we move to triphone training, using the alignments obtained from monophone training. There are three levels of triphone training, and the parameters for each are given in the config.ini file. After each level we use the decode script to get the decoded output of the trained triphone system.
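In standard Kaldi recipes this sequence looks roughly like the following; the job counts and the 2000/10000 leaf and Gaussian counts are illustrative values, not necessarily the ones in this repo's config.ini:

    # monophone training, then alignment for the first triphone pass
    steps/train_mono.sh --nj 4 --cmd run.pl data/train data/lang exp/mono
    steps/align_si.sh --nj 4 --cmd run.pl data/train data/lang exp/mono exp/mono_ali

    # first triphone level (delta features); later levels typically use
    # steps/train_lda_mllt.sh and steps/train_sat.sh on re-generated alignments
    steps/train_deltas.sh --cmd run.pl 2000 10000 data/train data/lang exp/mono_ali exp/tri1

    # build the tri1 graph (as shown earlier) and decode a test set
    utils/mkgraph.sh data/lang exp/tri1 exp/tri1/graph
    steps/decode.sh --nj 4 exp/tri1/graph data/test exp/tri1/decode_test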

TDNN Training

After GMM-HMM training we proceed to TDNN training. A brief overview is given here. Training parameters can be changed under stage 16, where train.py is called. At this point the baseline training is finished; decode any test set with this trained model.
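For orientation, a call to Kaldi's chain-model train.py (the script typically invoked around stage 16 of TDNN recipes) looks roughly like the excerpt below; every directory name and hyperparameter value is an assumption, not this repo's actual configuration:

    # excerpt of a typical steps/nnet3/chain/train.py invocation (values illustrative)
    steps/nnet3/chain/train.py --stage -10 \
      --trainer.num-epochs 4 \
      --trainer.optimization.num-jobs-initial 2 \
      --trainer.optimization.num-jobs-final 4 \
      --trainer.optimization.initial-effective-lrate 0.001 \
      --trainer.optimization.final-effective-lrate 0.0001 \
      --feat-dir data/train_hires \
      --tree-dir exp/chain/tree \
      --lat-dir exp/chain/lats \
      --dir exp/chain/tdnn1a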

References

  • Eleanor Chodroff, Kaldi Tutorial: https://www.eleanorchodroff.com/tutorial/kaldi/familiarization.html
