feat: write a regression test suite that can be sbatch submitted #616

joanise · 2024-12-20T20:56:41Z

Starting to address #611

PR Goal?

Start to write a regression test suite that we can sbatch submit to the cluster with GPUs and whatever memory we need.

For now, I'm using a tiny dataset, but it's easy to adjust to something larger.

Feedback sought?

While this is early, you can already try to give it a spin: edit the two user config variables at the top if needed, and run

sbatch EveryVoice/everyvoice/tests/regression-test.sh

Priority?

beta I guess

Tests added?

nothing but

Confidence?

medium, this still needs more work

Version change?

no

Related PRs?

no

TODO

use coverage analysis
add at least minimal checks to make sure the output files are as expected
run the demo and use playwright to click in it
add other variants -- there are lots of paths we can take through the EV pipeline!

semanticdiff-com · 2024-12-20T20:56:44Z

Review changes with

Changed Files

File	Status
everyvoice/tests/test_cli.py	47% smaller
.github/workflows/test.yml	Unsupported file format
README.md	Unsupported file format
everyvoice/tests/regression/.gitignore	Unsupported file format
everyvoice/tests/regression/README.md	Unsupported file format
everyvoice/tests/regression/combine-coverage.sh	Unsupported file format
everyvoice/tests/regression/go.sh	Unsupported file format
everyvoice/tests/regression/prep-datasets.sh	Unsupported file format
everyvoice/tests/regression/regression-test.sh	Unsupported file format
everyvoice/tests/regression/wizard-resume-lj	Unsupported file format
everyvoice/tests/regression/wizard-resume-si	Unsupported file format
everyvoice/tests/regression/wizard-resume-xh	Unsupported file format
pyproject.toml	Unsupported file format

github-actions · 2024-12-20T21:01:00Z

CLI load time: 0:00.26
Pull Request HEAD: 0f10fe797cdf6dbef45f346b2b331ed45fdf76a4
Imports that take more than 0.1 s:
import time: self [us] | cumulative | imported package
import time:      1027 |     104547 |     typer.main
import time:       299 |     123533 |   typer
import time:      9764 |     204642 | everyvoice.cli

codecov · 2024-12-20T21:13:27Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 76.34%. Comparing base (9a210df) to head (0f10fe7).
Report is 10 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #616   +/-   ##
=======================================
  Coverage   76.34%   76.34%           
=======================================
  Files          47       47           
  Lines        3483     3483           
  Branches      479      479           
=======================================
  Hits         2659     2659           
  Misses        721      721           
  Partials      103      103

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

marctessier · 2025-01-09T19:05:51Z

This test is hardcoded for GPSC7. ( GPSC7 was busy so I modified for V100 , GPSC5) .

I ran the test

I did get this message in my error_log , where I did not need... ( I can ignore this below..)

cat EV-regress.e5091562
/var/lib/slurm-llnl/slurmd/job5091562/slurm_script: line 22: /home/tes001/start_ev.sh: No such file or directory

BUT I am getting this message:

Your resume-from file is likely out of sync. Aborting.

Can we add this to the script where the error and output log will be written where the script was ran using the jobname "EV-regress" .

#SBATCH --output=./%x.o%j
#SBATCH --error=./%x.e%j

(EveryVoice_pr616) [U20-GPSC5]:$ cat EV-regress.o5091580
╭──────────────────────────────────────────────────────────────────────────────╮
│ Welcome to the EveryVoice Wizard. We will guide you through the process of   │
│ setting up the configuration for a new EveryVoice project.                   │
│                                                                              │
│ Navigation: as any point, you can hit Ctrl+C to: go back a step, view        │
│ progress, save progress, or exit the wizard.                                 │
│                                                                              │
│ From saved progress, you can resume at any time by running the same command  │
│ with the --resume-from option.                                               │
╰──────────────────────────────────────────────────────────────────────────────╯
Applying saved response 'regress' for Name Step
Great! Launching Configuration Wizard 🧙 for project named 'regress'.
Applying saved response 'EveryVoice Regressor' for Contact Name Step
Great! Nice to meet you, 'EveryVoice Regressor'.
Applying saved response '[email protected]' for Contact Email Step
Great! Your contact email '[email protected]' will be saved to your models.
Applying saved response '.' for Output Path Step
The Configuration Wizard 🧙 will put your files here: 'regress'
Applying saved response 'metadata.csv' for Filelist Step
Applying saved response 'Yes, I do have permission to use this data.' for 
Dataset Permission Step
Applying saved response 'psv' for Filelist Format Step
Error: next tour question is Filelist Text Representation Step but resume list 
has Filelist Has Header Line Step instead.
Your resume-from file is likely out of sync. Aborting.

joanise · 2025-01-09T19:11:56Z

Nice, thank you for this output, I'll take a look when I resume work on this.

joanise · 2025-01-09T22:57:34Z

@marctessier I have fixed the issues you noticed, and more:

I was assuming the metadata file did not have a header, but you added one in your copy of LJ. I added code to skip the first line so my assumptions are met.
I now ignore the activate script if it's not found
I've reduce the tqdm logging frequency to every >=5s with export TQDM_MININTERVAL=5. Makes batch job logs nicer to read. (Still looking for the equivalent with torch, to make it show the loss on screen less often.)
I've merge stderr and stdout into a file in the current directory as you suggested.
I now log each command that gets run before it starts, so it's easier to follow the logs afterwards.

Defining `export TQDM_MININTERVAL=5` (or the desired number of seconds) will reduce the frequency at which TQDM updates progress bars, which is nicer for logs that get saved to file.

Starting to address #611

joanise · 2025-01-24T22:59:06Z

Feedback from Samuel:

fetching bundle.js from unpkg fails if the cluster proxy is not defined; needs graceful fall back like readalongs CLI
make sure timing out trying to fetch from unpkg does mean a lot of time waiting for it
LJ-150 fails to train with everyvoice.utils:filter_dataset_based_on_target_text_representation_level:96 - Sorry you do not have enough characters data in your current validation filelist to run the model with a batch size of 16.
When training fails, exit right away, don't continue to do stuff that's guaranteed to fail
in the readme.md, document that you can use sbatch --partition=... --account=... ../../myscript.sh to override the partition and account that are hard-coded in the scripts

- max step is now 1000 for all cases, for fast regression - max epochs also 10, because for very small cases 1000 steps is pretty slow, possibly due to what happens between epochs? (lj-160 did 111 epochs...) - lj-150 is now lj-160, fixing an error with a validation set with fewer than 16 elements. - document how to override cluster parameters - exit early if training fails Still to do: handle failing to fetch bundle.js when there is not proxy set on the job nodes. But that has to be fixed in ReadAlong/Studio, not here.

roedoejet

This looks great @joanise ! I would just say there are a couple small changes that would make it easier to run on different clusters, see my comments. thanks!

roedoejet · 2025-01-28T17:27:21Z

everyvoice/tests/regression/go.sh

+
+#SBATCH --job-name=EV-r-main
+#SBATCH --partition=standard
+#SBATCH --account=nrc_ict


Maybe we can do what @SamuelLarkin does and provide the --account and --partition for all clusters.

# On GPSC5 #SBATCH --partition=standard #SBATCH --account=nrc_ict # On GPSC7 ##SBATCH --partition=gpu_a100 ##SBATCH --account=nrc_ict__gpu_a100 # On GPSC-C ##SBATCH --partition=gpu_v100 ##SBATCH --account=nrc_ict__gpu_v100 # On Trixie ##SBATCH --partition=TrixieMain,JobTesting ##SBATCH --account=dt-mtp

(I just pulled this from one of the sample scripts that @SamuelLarkin sent - I'm not actually sure if these are the partitions we need)

roedoejet · 2025-01-28T17:29:16Z

everyvoice/tests/regression/prep-datasets.sh

+    cp "$EVERYVOICE_REGRESS_ROOT"/wizard-resume-lj "$dir"/wizard-resume
+    cat <<'==EOF==' > "$dir"/test.txt
+This is a test.
+I am an anvil.


😆 ⚒️

roedoejet · 2025-01-28T17:30:29Z

everyvoice/tests/regression/regression-test.sh

+#!/bin/bash
+
+#SBATCH --job-name=EV-regress
+#SBATCH --partition=gpu_a100


same here with the partition/accounts for all clusters

roedoejet · 2025-01-28T17:31:16Z

everyvoice/tests/regression/regression-test.sh

+    --model '$VOCODER'"
+
+# Exercise DeepForceAligner
+# Meh, this appears to be broken...


What's broken the aligner or coverage or? do you remember what the error is?

When I tried to run that command, it failed, but I didn't investigate further yet. To reproduce, just take one of the LJ directories and run it, which is the thing to do eventually, since we do want to add dfa testing here.

roedoejet · 2025-01-28T17:32:04Z

everyvoice/tests/regression/regression-test.sh

+
+
+# 7: spin up the demo
+# everyvoice demo $FS2 $VOCODER &


this is a TODO I imagine? Maybe mark it with TODO inline?

TODO: has a special syntax highlighting with the Python extension, so I like to use that.

Right, that's a good point. Yes, it's a TODO, just like dfa.

roedoejet · 2025-01-28T17:32:21Z

everyvoice/tests/regression/regression-test.sh

+# everyvoice demo $FS2 $VOCODER &
+
+
+# 8: use playwright to synthesize something using the demo


Same here: TODO: 8: use playwright to synthesize something using the demo

SamuelLarkin · 2025-01-28T18:05:34Z

Not an issue but I seeing

Warning: saved progress file is for EveryVoice Wizard version '0.2.0a1', but
this is version '0.3.0'. Proceeding anyway, but be aware that the saved
responses may not be compatible.

when on the latest shouldn't version of EV, shouldn't all the versions be compatible?

joanise · 2025-01-29T15:15:54Z

when on the latest shouldn't version of EV, shouldn't all the versions be compatible?

Oh, that's a really hard target to try to impose. If we add a question to the wizard at some point, old resume files will all be broken. I didn't think of these files as artefacts we're supposed to maintain backwards compatibility for... That's why I issued the warning, though I suppose we could adjust the warnings to more specifically complain only if the version is older than the last time we made a breaking change in the wizard?

joanise · 2025-01-29T15:22:59Z

But actually, maybe now that we've published, we can add our own requirement to auto-convert from published versions, and this regression test suite gives us the scaffolding to test for that.
My proposal, then: I'll find the oldest version of EV from which the resume files are still compatible, and adjust the warning not to say anything from that point forward, and let's set the new requirement that breaking changes to the wizard cannot break this regression suite.
That's going to make life a lot harder should we choose to add or remove a wizard question, but as we're approaching release it's a discipline we need to have.
And I guess if we make a breaking change to the wizard, we could write a filter function that adapts existing resume files, based on their declared version, with defaults for the new questions that make things work like before said breaking change.

joanise requested review from SamuelLarkin, roedoejet and marctessier December 20, 2024 20:56

joanise marked this pull request as draft December 20, 2024 20:56

joanise force-pushed the dev.ej/regression branch from 8d7f6ab to 5f89017 Compare January 9, 2025 22:53

joanise force-pushed the dev.ej/regression branch 3 times, most recently from a06aa56 to a42061e Compare January 16, 2025 22:04

joanise added 2 commits January 23, 2025 11:26

build(deps): require tqdm>=4.66 to support TQDM_MININTERVAL=5 in jobs

cbebf70

Defining `export TQDM_MININTERVAL=5` (or the desired number of seconds) will reduce the frequency at which TQDM updates progress bars, which is nicer for logs that get saved to file.

feat: write a regression test suite that can be sbatch submitted

63ce1a2

Starting to address #611

joanise force-pushed the dev.ej/regression branch from f72019e to 63ce1a2 Compare January 23, 2025 17:17

joanise marked this pull request as ready for review January 23, 2025 17:21

SamuelLarkin added 3 commits January 24, 2025 16:35

feat: added missing dependency on coverage

cdc7cd2

docs: we no longer have requirements.dev.txt

194551f

feat: Samuel's PR suggestion

5651ada

joanise mentioned this pull request Jan 27, 2025

Adjust default parameters to make even small samples do something #628

Open

roedoejet approved these changes Jan 28, 2025

View reviewed changes

joanise added 2 commits January 30, 2025 16:58

fix(tests): clean up temporary file created by unit tests

e08bb59

docs: better TODO syntax in regression-test, and more comments

3c46c4e

joanise force-pushed the dev.ej/regression branch from eb59e09 to c934e07 Compare January 31, 2025 21:20

ci: fix sudden new CI error, with a missing apt-get update

0f10fe7

joanise force-pushed the dev.ej/regression branch from c934e07 to 0f10fe7 Compare January 31, 2025 21:23

joanise merged commit e13c533 into main Jan 31, 2025
6 checks passed

joanise deleted the dev.ej/regression branch January 31, 2025 21:30

joanise mentioned this pull request Jan 31, 2025

dfaligner train appears to be broken at the moment #623

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: write a regression test suite that can be sbatch submitted #616

feat: write a regression test suite that can be sbatch submitted #616

joanise commented Dec 20, 2024 •

edited

Loading

semanticdiff-com bot commented Dec 20, 2024 •

edited

Loading

github-actions bot commented Dec 20, 2024 •

edited

Loading

codecov bot commented Dec 20, 2024 •

edited

Loading

marctessier commented Jan 9, 2025 •

edited by joanise

Loading

joanise commented Jan 9, 2025

joanise commented Jan 9, 2025

joanise commented Jan 24, 2025 •

edited

Loading

roedoejet left a comment

roedoejet Jan 28, 2025

roedoejet Jan 28, 2025

roedoejet Jan 28, 2025

roedoejet Jan 28, 2025

roedoejet Jan 28, 2025

joanise Jan 29, 2025

roedoejet Jan 28, 2025

joanise Jan 29, 2025

roedoejet Jan 28, 2025

SamuelLarkin commented Jan 28, 2025

joanise commented Jan 29, 2025

joanise commented Jan 29, 2025

		# everyvoice demo $FS2 $VOCODER &


		# 8: use playwright to synthesize something using the demo

feat: write a regression test suite that can be sbatch submitted #616

feat: write a regression test suite that can be sbatch submitted #616

Conversation

joanise commented Dec 20, 2024 • edited Loading

PR Goal?

Feedback sought?

Priority?

Tests added?

Confidence?

Version change?

Related PRs?

TODO

semanticdiff-com bot commented Dec 20, 2024 • edited Loading

github-actions bot commented Dec 20, 2024 • edited Loading

codecov bot commented Dec 20, 2024 • edited Loading

Codecov Report

marctessier commented Jan 9, 2025 • edited by joanise Loading

joanise commented Jan 9, 2025

joanise commented Jan 9, 2025

joanise commented Jan 24, 2025 • edited Loading

roedoejet left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

SamuelLarkin commented Jan 28, 2025

joanise commented Jan 29, 2025

joanise commented Jan 29, 2025

joanise commented Dec 20, 2024 •

edited

Loading

semanticdiff-com bot commented Dec 20, 2024 •

edited

Loading

github-actions bot commented Dec 20, 2024 •

edited

Loading

codecov bot commented Dec 20, 2024 •

edited

Loading

marctessier commented Jan 9, 2025 •

edited by joanise

Loading

joanise commented Jan 24, 2025 •

edited

Loading