Adjust default parameters to make even small samples do something #628

joanise · 2025-01-27T14:08:04Z

Bug description

For regression testing, the test case with 15 minutes of data yielded 150 utterances from LJ, which caused training to fail with this error:

2025-01-23 12:13:40.264 | ERROR    | everyvoice.utils:filter_dataset_based_on_target_text_representation_level:96 - Sorry you do not have enough characters data in your current validation filelist to run the model with a batch size of 16.

This appears to be because the validation set has just 15 samples, fewer than the batch size of 16.

We should adjust the wizard and training defaults so that if the data has <160 utterances, things are set up so training can proceed anyway.
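
To make the arithmetic concrete, here is a minimal sketch of one possible adjustment: clamp the batch size to the number of validation samples. This is not EveryVoice's actual code. The function names are hypothetical, and the 10% validation split is an assumption inferred from the numbers above (150 utterances giving 15 validation samples, and the <160 threshold matching a batch size of 16).

```python
# Hypothetical sketch, not EveryVoice's implementation.
# Assumption: 10% of the utterances are held out for validation.

def validation_size(n_utterances: int, val_percent: int = 10) -> int:
    """Number of validation samples, using integer arithmetic to avoid float rounding."""
    return n_utterances * val_percent // 100


def effective_batch_size(n_val_samples: int, requested: int = 16) -> int:
    """Clamp the requested batch size so one full validation batch can always be formed."""
    if n_val_samples < 1:
        raise ValueError("validation set is empty; cannot train")
    return min(requested, n_val_samples)


n_val = validation_size(150)               # 15 samples, one short of a batch of 16
batch_size = effective_batch_size(n_val)   # falls back to 15 instead of failing
print(n_val, batch_size)
```

With 160 or more utterances the requested batch size of 16 would be kept unchanged under this assumption. Filtering that drops further validation samples would need the same clamp applied after the filter runs.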

How to reproduce the bug

1. Create a dataset with 150 samples
2. Run the wizard
3. everyvoice preprocess config/everyvoice-text-to-spec.yaml
4. everyvoice train text-to-spec config/everyvoice-text-to-spec.yaml
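
The first step can be sketched as follows, assuming the dataset is described by a filelist with one utterance per line. The helper name and the file names in the usage note are hypothetical; adapt them to the actual layout of your data.

```python
from pathlib import Path


def subset_filelist(src: Path, dst: Path, n_samples: int = 150) -> int:
    """Copy the first n_samples non-empty lines of src to dst; return how many were written."""
    lines = [
        line
        for line in src.read_text(encoding="utf-8").splitlines()
        if line.strip()
    ]
    kept = lines[:n_samples]
    dst.write_text("".join(line + "\n" for line in kept), encoding="utf-8")
    return len(kept)
```

For example, `subset_filelist(Path("metadata.csv"), Path("metadata-150.csv"))` would write a 150-utterance filelist to point the wizard at.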

Alternatively, from the branch for #616, run go.sh and inspect the logs in regress-lj-150/.

Error messages and logs

2025-01-23 12:13:40.264 | ERROR    | everyvoice.utils:filter_dataset_based_on_target_text_representation_level:96 - Sorry you do not have enough characters data in your current validation filelist to run the model with a batch size of 16.

joanise added the "bug" label (Something isn't working) on Jan 27, 2025