filtering longer outputs for CTC #2255
SuperKogito asked this question in Q&A (unanswered, 0 replies)
Issue description:
I am trying to train a German model using the coqui-ai/STT Docker image; however, I keep running into this error:
Attempted solution
I am aware that this error is common and has been discussed on the forum before, but none of the previously suggested solutions worked for me. In my opinion, #683 sums up those previous solutions.
I already wrote a small script to find outliers, but even after filtering with several different thresholds, I still get the same issue.
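Roughly, the filtering logic looks like the sketch below (simplified, not my exact script; the 20 ms feature window step and using the character count as the label length are assumptions on my side and may not exactly match what the network sees):

```python
# Simplified sketch of a length-based outlier filter for a Coqui STT CSV.
# Assumptions: the CSV uses the usual wav_filename,wav_filesize,transcript
# columns, features use a 20 ms window step (--feature_win_step default),
# and the CTC label length is approximated by the transcript character count.
import csv
import wave

FEATURE_WIN_STEP_MS = 20  # assumption: matches --feature_win_step


def num_feature_frames(wav_path: str) -> int:
    """Rough number of time steps the CTC loss will see for this clip."""
    with wave.open(wav_path, "rb") as w:
        duration_ms = 1000.0 * w.getnframes() / w.getframerate()
    return int(duration_ms // FEATURE_WIN_STEP_MS)


def filter_csv(src: str, dst: str, margin: int = 0) -> None:
    """Drop rows whose transcript is longer than the estimated frame count."""
    with open(src, newline="", encoding="utf-8") as fin, \
         open(dst, "w", newline="", encoding="utf-8") as fout:
        reader = csv.DictReader(fin)
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            frames = num_feature_frames(row["wav_filename"])
            if len(row["transcript"]) + margin <= frames:
                writer.writerow(row)


# Example usage (hypothetical file names):
# filter_csv("all_files.csv", "all_files_no_outliers.csv", margin=5)
```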
Here is a small visualization of the data before (left) and after (right) outlier removal:
Environment (please complete the following information):
nohup sudo docker run -v /trainingdata:/trainingdata --gpus '"device=0,1"' --ipc=host ghcr.io/coqui-ai/stt-train python -m coqui_stt_training.train --show_progressbar false --train_cudnn true --audio_sample_rate 16000 --auto_input_dataset "/trainingdata/DeepSpeech/16khz-deepspeech-cleaner/languages/de/training/samples_from_one_to_twenty_seconds/all_files_no_outliers.csv" --scorer_path "/trainingdata/DeepSpeech/16khz-deepspeech-cleaner/languages/de/training/samples_from_one_to_twenty_seconds/kenlm_de.scorer" --test_batch_size 1 --train_batch_size 1 --dev_batch_size 1 --export_batch_size 24 --epochs 2 --augment "tempo[factor=1~0.1]" --augment "pitch[pitch=1~0.1]" --augment "overlay[p=0.1,source=/trainingdata/DeepSpeech/16khz-deepspeech-cleaner/languages/de/training/samples_from_one_to_twenty_seconds/fsdnoise.csv,layers=1,snr=12~4]" --augment "reverb[p=0.1,decay=0.7~0.15,delay=10~8]" --augment "volume[p=0.1,dbfs=-10~10]" --augment "warp[p=0.1,nt=4,nf=1,wt=0.5:1.0,wf=0.1:0.2]" --augment "frequency_mask[p=0.1,n=1:3,size=1:5]" --augment "time_mask[p=0.1,domain=signal,n=3:10~2,size=50:100~40]" --augment "dropout[p=0.1,rate=0.05]" --augment "add[p=0.1,domain=signal,stddev=0~0.5]" --augment "multiply[p=0.1,domain=features,stddev=0~0.5]" --learning_rate 0.001 --dropout_rate 0.15 --export_dir "/trainingdata/DeepSpeech/16khz-deepspeech-cleaner/languages/de/training/samples_from_one_to_twenty_seconds/model_export/coqui_training" --drop_source_layers 1 --export_language "de-Latn-DE" --export_license "Apache-2.0" --export_model_name "DeepSpeech-German" --export_model_version "0.0.1" --export_author_id "Yoummday" --checkpoint_dir "/trainingdata/DeepSpeech/16khz-deepspeech-cleaner/languages/de/training/samples_from_one_to_twenty_seconds/coqui_training_checkpoints" --summary_dir "/trainingdata/DeepSpeech/16khz-deepspeech-cleaner/languages/de/training/samples_from_one_to_twenty_seconds/model_summary/coqui_training" --reduce_lr_on_plateau true > /trainingdata/no_outliers_coqui_nohup.out
About my Datasets:
My Questions:
(Setting ignore_longer_outputs_than_inputs=True in the CTC call is not an option as far as I know.)
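For reference, this is the TensorFlow 1.x parameter I mean; a toy illustration with made-up shapes (nothing from Coqui STT's actual training graph), just to show where the flag lives:

```python
# Toy illustration of tf.compat.v1.nn.ctc_loss and its
# ignore_longer_outputs_than_inputs parameter. All shapes/values are made up.
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

time_steps, batch_size, num_classes = 5, 1, 4   # last class is the CTC blank
logits = tf.constant(np.random.randn(time_steps, batch_size, num_classes),
                     dtype=tf.float32)          # [max_time, batch, num_classes]

# One label sequence of length 3 for the single batch item.
labels = tf.SparseTensor(indices=[[0, 0], [0, 1], [0, 2]],
                         values=[1, 2, 1],
                         dense_shape=[batch_size, 3])

# With ignore_longer_outputs_than_inputs=True, samples whose label sequence
# cannot fit into the available time steps are skipped (zero loss/gradient)
# instead of raising an error.
loss = tf.nn.ctc_loss(labels, logits, sequence_length=[time_steps],
                      ignore_longer_outputs_than_inputs=True)

with tf.Session() as sess:
    print(sess.run(loss))
```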