filtering longer outputs for CTC #2255
SuperKogito asked this question in Q&A (unanswered, 0 replies)
Issue description:
I am trying to train a German model using the coqui-ai/STT Docker image; however, I keep running into this error:
Attempted solution
I am aware that this error is common and has been discussed on the forum before, but none of the previously suggested solutions worked for me. In my opinion, #683 sums up those previous solutions.
I already wrote a small script to find outliers, but even after filtering with several different thresholds, I still get the same issue.
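Roughly, the filtering logic looks like the sketch below (simplified, not my exact script; the 20 ms feature window step and using the character count as the label length are assumptions on my side and may not exactly match what the network sees):

```python
# Simplified sketch of a length-based outlier filter for a Coqui STT CSV.
# Assumptions: the CSV uses the usual wav_filename,wav_filesize,transcript
# columns, features use a 20 ms window step (--feature_win_step default),
# and the CTC label length is approximated by the transcript character count.
import csv
import wave

FEATURE_WIN_STEP_MS = 20  # assumption: matches --feature_win_step


def num_feature_frames(wav_path: str) -> int:
    """Rough number of time steps the CTC loss will see for this clip."""
    with wave.open(wav_path, "rb") as w:
        duration_ms = 1000.0 * w.getnframes() / w.getframerate()
    return int(duration_ms // FEATURE_WIN_STEP_MS)


def filter_csv(src: str, dst: str, margin: int = 0) -> None:
    """Drop rows whose transcript is longer than the estimated frame count."""
    with open(src, newline="", encoding="utf-8") as fin, \
         open(dst, "w", newline="", encoding="utf-8") as fout:
        reader = csv.DictReader(fin)
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            frames = num_feature_frames(row["wav_filename"])
            if len(row["transcript"]) + margin <= frames:
                writer.writerow(row)


# Example usage (hypothetical file names):
# filter_csv("all_files.csv", "all_files_no_outliers.csv", margin=5)
```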
Here is a small visualization of the data before (left) and after (right) outlier removal:
Environment (please complete the following information):
nohup sudo docker run -v /trainingdata:/trainingdata --gpus '"device=0,1"' --ipc=host ghcr.io/coqui-ai/stt-train python -m coqui_stt_training.train --show_progressbar false --train_cudnn true --audio_sample_rate 16000 --auto_input_dataset "/trainingdata/DeepSpeech/16khz-deepspeech-cleaner/languages/de/training/samples_from_one_to_twenty_seconds/all_files_no_outliers.csv" --scorer_path "/trainingdata/DeepSpeech/16khz-deepspeech-cleaner/languages/de/training/samples_from_one_to_twenty_seconds/kenlm_de.scorer" --test_batch_size 1 --train_batch_size 1 --dev_batch_size 1 --export_batch_size 24 --epochs 2 --augment "tempo[factor=1~0.1]" --augment "pitch[pitch=1~0.1]" --augment "overlay[p=0.1,source=/trainingdata/DeepSpeech/16khz-deepspeech-cleaner/languages/de/training/samples_from_one_to_twenty_seconds/fsdnoise.csv,layers=1,snr=12~4]" --augment "reverb[p=0.1,decay=0.7~0.15,delay=10~8]" --augment "volume[p=0.1,dbfs=-10~10]" --augment "warp[p=0.1,nt=4,nf=1,wt=0.5:1.0,wf=0.1:0.2]" --augment "frequency_mask[p=0.1,n=1:3,size=1:5]" --augment "time_mask[p=0.1,domain=signal,n=3:10~2,size=50:100~40]" --augment "dropout[p=0.1,rate=0.05]" --augment "add[p=0.1,domain=signal,stddev=0~0.5]" --augment "multiply[p=0.1,domain=features,stddev=0~0.5]" --learning_rate 0.001 --dropout_rate 0.15 --export_dir "/trainingdata/DeepSpeech/16khz-deepspeech-cleaner/languages/de/training/samples_from_one_to_twenty_seconds/model_export/coqui_training" --drop_source_layers 1 --export_language "de-Latn-DE" --export_license "Apache-2.0" --export_model_name "DeepSpeech-German" --export_model_version "0.0.1" --export_author_id "Yoummday" --checkpoint_dir "/trainingdata/DeepSpeech/16khz-deepspeech-cleaner/languages/de/training/samples_from_one_to_twenty_seconds/coqui_training_checkpoints" --summary_dir "/trainingdata/DeepSpeech/16khz-deepspeech-cleaner/languages/de/training/samples_from_one_to_twenty_seconds/model_summary/coqui_training" --reduce_lr_on_plateau true > /trainingdata/no_outliers_coqui_nohup.out
About my Datasets:
My Questions:
(Setting ignore_longer_outputs_than_inputs=True in the CTC call is not an option as far as I know.)
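For reference, this is the TensorFlow 1.x parameter I mean; a toy illustration with made-up shapes (nothing from Coqui STT's actual training graph), just to show where the flag lives:

```python
# Toy illustration of tf.compat.v1.nn.ctc_loss and its
# ignore_longer_outputs_than_inputs parameter. All shapes/values are made up.
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

time_steps, batch_size, num_classes = 5, 1, 4   # last class is the CTC blank
logits = tf.constant(np.random.randn(time_steps, batch_size, num_classes),
                     dtype=tf.float32)          # [max_time, batch, num_classes]

# One label sequence of length 3 for the single batch item.
labels = tf.SparseTensor(indices=[[0, 0], [0, 1], [0, 2]],
                         values=[1, 2, 1],
                         dense_shape=[batch_size, 3])

# With ignore_longer_outputs_than_inputs=True, samples whose label sequence
# cannot fit into the available time steps are skipped (zero loss/gradient)
# instead of raising an error.
loss = tf.nn.ctc_loss(labels, logits, sequence_length=[time_steps],
                      ignore_longer_outputs_than_inputs=True)

with tf.Session() as sess:
    print(sess.run(loss))
```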