Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: write a regression test suite that can be sbatch submitted #616

Merged
merged 9 commits into from
Jan 31, 2025

Conversation

joanise
Copy link
Member

@joanise joanise commented Dec 20, 2024

Starting to address #611

PR Goal?

Start to write a regression test suite that we can sbatch submit to the cluster with GPUs and whatever memory we need.

For now, I'm using a tiny dataset, but it's easy to adjust to something larger.

Feedback sought?

While this is early, you can already try to give it a spin: edit the two user config variables at the top if needed, and run

sbatch EveryVoice/everyvoice/tests/regression-test.sh

Priority?

beta I guess

Tests added?

nothing but

Confidence?

medium, this still needs more work

Version change?

no

Related PRs?

no

TODO

  • use coverage analysis
  • add at least minimal checks to make sure the output files are as expected
  • run the demo and use playwright to click in it
  • add other variants -- there are lots of paths we can take through the EV pipeline!

Copy link

semanticdiff-com bot commented Dec 20, 2024

Review changes with  SemanticDiff

Changed Files
File Status
  everyvoice/tests/test_cli.py  47% smaller
  .github/workflows/test.yml Unsupported file format
  README.md Unsupported file format
  everyvoice/tests/regression/.gitignore Unsupported file format
  everyvoice/tests/regression/README.md Unsupported file format
  everyvoice/tests/regression/combine-coverage.sh Unsupported file format
  everyvoice/tests/regression/go.sh Unsupported file format
  everyvoice/tests/regression/prep-datasets.sh Unsupported file format
  everyvoice/tests/regression/regression-test.sh Unsupported file format
  everyvoice/tests/regression/wizard-resume-lj Unsupported file format
  everyvoice/tests/regression/wizard-resume-si Unsupported file format
  everyvoice/tests/regression/wizard-resume-xh Unsupported file format
  pyproject.toml Unsupported file format

@joanise joanise marked this pull request as draft December 20, 2024 20:56
Copy link
Contributor

github-actions bot commented Dec 20, 2024

CLI load time: 0:00.26
Pull Request HEAD: 0f10fe797cdf6dbef45f346b2b331ed45fdf76a4
Imports that take more than 0.1 s:
import time: self [us] | cumulative | imported package
import time:      1027 |     104547 |     typer.main
import time:       299 |     123533 |   typer
import time:      9764 |     204642 | everyvoice.cli

Copy link

codecov bot commented Dec 20, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 76.34%. Comparing base (9a210df) to head (0f10fe7).
Report is 10 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #616   +/-   ##
=======================================
  Coverage   76.34%   76.34%           
=======================================
  Files          47       47           
  Lines        3483     3483           
  Branches      479      479           
=======================================
  Hits         2659     2659           
  Misses        721      721           
  Partials      103      103           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@marctessier
Copy link
Collaborator

marctessier commented Jan 9, 2025

This test is hardcoded for GPSC7. ( GPSC7 was busy so I modified for V100 , GPSC5) .

I ran the test

I did get this message in my error_log , where I did not need... ( I can ignore this below..)

cat EV-regress.e5091562
/var/lib/slurm-llnl/slurmd/job5091562/slurm_script: line 22: /home/tes001/start_ev.sh: No such file or directory

BUT I am getting this message:

Your resume-from file is likely out of sync. Aborting.

Can we add this to the script where the error and output log will be written where the script was ran using the jobname "EV-regress" .

#SBATCH --output=./%x.o%j
#SBATCH --error=./%x.e%j
(EveryVoice_pr616) [U20-GPSC5]:$ cat EV-regress.o5091580
╭──────────────────────────────────────────────────────────────────────────────╮
│ Welcome to the EveryVoice Wizard. We will guide you through the process of   │
│ setting up the configuration for a new EveryVoice project.                   │
│                                                                              │
│ Navigation: as any point, you can hit Ctrl+C to: go back a step, view        │
│ progress, save progress, or exit the wizard.                                 │
│                                                                              │
│ From saved progress, you can resume at any time by running the same command  │
│ with the --resume-from option.                                               │
╰──────────────────────────────────────────────────────────────────────────────╯
Applying saved response 'regress' for Name Step
Great! Launching Configuration Wizard 🧙 for project named 'regress'.
Applying saved response 'EveryVoice Regressor' for Contact Name Step
Great! Nice to meet you, 'EveryVoice Regressor'.
Applying saved response '[email protected]' for Contact Email Step
Great! Your contact email '[email protected]' will be saved to your models.
Applying saved response '.' for Output Path Step
The Configuration Wizard 🧙 will put your files here: 'regress'
Applying saved response 'metadata.csv' for Filelist Step
Applying saved response 'Yes, I do have permission to use this data.' for 
Dataset Permission Step
Applying saved response 'psv' for Filelist Format Step
Error: next tour question is Filelist Text Representation Step but resume list 
has Filelist Has Header Line Step instead.
Your resume-from file is likely out of sync. Aborting.

@joanise
Copy link
Member Author

joanise commented Jan 9, 2025

Nice, thank you for this output, I'll take a look when I resume work on this.

@joanise joanise force-pushed the dev.ej/regression branch from 8d7f6ab to 5f89017 Compare January 9, 2025 22:53
@joanise
Copy link
Member Author

joanise commented Jan 9, 2025

@marctessier I have fixed the issues you noticed, and more:

  • I was assuming the metadata file did not have a header, but you added one in your copy of LJ. I added code to skip the first line so my assumptions are met.
  • I now ignore the activate script if it's not found
  • I've reduce the tqdm logging frequency to every >=5s with export TQDM_MININTERVAL=5. Makes batch job logs nicer to read. (Still looking for the equivalent with torch, to make it show the loss on screen less often.)
  • I've merge stderr and stdout into a file in the current directory as you suggested.
  • I now log each command that gets run before it starts, so it's easier to follow the logs afterwards.

@joanise joanise force-pushed the dev.ej/regression branch 3 times, most recently from a06aa56 to a42061e Compare January 16, 2025 22:04
Defining `export TQDM_MININTERVAL=5` (or the desired number of seconds) will
reduce the frequency at which TQDM updates progress bars, which is nicer for
logs that get saved to file.
@joanise joanise marked this pull request as ready for review January 23, 2025 17:21
@joanise
Copy link
Member Author

joanise commented Jan 24, 2025

Feedback from Samuel:

  • fetching bundle.js from unpkg fails if the cluster proxy is not defined; needs graceful fall back like readalongs CLI
  • make sure timing out trying to fetch from unpkg does mean a lot of time waiting for it
  • LJ-150 fails to train with everyvoice.utils:filter_dataset_based_on_target_text_representation_level:96 - Sorry you do not have enough characters data in your current validation filelist to run the model with a batch size of 16.
  • When training fails, exit right away, don't continue to do stuff that's guaranteed to fail
  • in the readme.md, document that you can use sbatch --partition=... --account=... ../../myscript.sh to override the partition and account that are hard-coded in the scripts

 - max step is now 1000 for all cases, for fast regression
 - max epochs also 10, because for very small cases 1000 steps is pretty slow,
   possibly due to what happens between epochs? (lj-160 did 111 epochs...)
 - lj-150 is now lj-160, fixing an error with a validation set with fewer than
   16 elements.
 - document how to override cluster parameters
 - exit early if training fails

Still to do: handle failing to fetch bundle.js when there is not proxy set on
the job nodes. But that has to be fixed in ReadAlong/Studio, not here.
Copy link
Member

@roedoejet roedoejet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great @joanise ! I would just say there are a couple small changes that would make it easier to run on different clusters, see my comments. thanks!


#SBATCH --job-name=EV-r-main
#SBATCH --partition=standard
#SBATCH --account=nrc_ict
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can do what @SamuelLarkin does and provide the --account and --partition for all clusters.

# On GPSC5
#SBATCH --partition=standard
#SBATCH --account=nrc_ict
# On GPSC7
##SBATCH --partition=gpu_a100
##SBATCH --account=nrc_ict__gpu_a100
# On GPSC-C
##SBATCH --partition=gpu_v100
##SBATCH --account=nrc_ict__gpu_v100
# On Trixie
##SBATCH --partition=TrixieMain,JobTesting
##SBATCH --account=dt-mtp

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I just pulled this from one of the sample scripts that @SamuelLarkin sent - I'm not actually sure if these are the partitions we need)

cp "$EVERYVOICE_REGRESS_ROOT"/wizard-resume-lj "$dir"/wizard-resume
cat <<'==EOF==' > "$dir"/test.txt
This is a test.
I am an anvil.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😆 ⚒️

#!/bin/bash

#SBATCH --job-name=EV-regress
#SBATCH --partition=gpu_a100
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here with the partition/accounts for all clusters

--model '$VOCODER'"

# Exercise DeepForceAligner
# Meh, this appears to be broken...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's broken the aligner or coverage or? do you remember what the error is?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I tried to run that command, it failed, but I didn't investigate further yet. To reproduce, just take one of the LJ directories and run it, which is the thing to do eventually, since we do want to add dfa testing here.



# 7: spin up the demo
# everyvoice demo $FS2 $VOCODER &
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a TODO I imagine? Maybe mark it with TODO inline?

TODO: has a special syntax highlighting with the Python extension, so I like to use that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, that's a good point. Yes, it's a TODO, just like dfa.

# everyvoice demo $FS2 $VOCODER &


# 8: use playwright to synthesize something using the demo
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here: TODO: 8: use playwright to synthesize something using the demo

@SamuelLarkin
Copy link
Collaborator

Not an issue but I seeing

Warning: saved progress file is for EveryVoice Wizard version '0.2.0a1', but
this is version '0.3.0'. Proceeding anyway, but be aware that the saved
responses may not be compatible.

when on the latest shouldn't version of EV, shouldn't all the versions be compatible?

@joanise
Copy link
Member Author

joanise commented Jan 29, 2025

when on the latest shouldn't version of EV, shouldn't all the versions be compatible?

Oh, that's a really hard target to try to impose. If we add a question to the wizard at some point, old resume files will all be broken. I didn't think of these files as artefacts we're supposed to maintain backwards compatibility for... That's why I issued the warning, though I suppose we could adjust the warnings to more specifically complain only if the version is older than the last time we made a breaking change in the wizard?

@joanise
Copy link
Member Author

joanise commented Jan 29, 2025

But actually, maybe now that we've published, we can add our own requirement to auto-convert from published versions, and this regression test suite gives us the scaffolding to test for that.
My proposal, then: I'll find the oldest version of EV from which the resume files are still compatible, and adjust the warning not to say anything from that point forward, and let's set the new requirement that breaking changes to the wizard cannot break this regression suite.
That's going to make life a lot harder should we choose to add or remove a wizard question, but as we're approaching release it's a discipline we need to have.
And I guess if we make a breaking change to the wizard, we could write a filter function that adapts existing resume files, based on their declared version, with defaults for the new questions that make things work like before said breaking change.

@joanise joanise merged commit e13c533 into main Jan 31, 2025
6 checks passed
@joanise joanise deleted the dev.ej/regression branch January 31, 2025 21:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants