-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: write a regression test suite that can be sbatch submitted #616
Conversation
Changed Files
|
|
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #616 +/- ##
=======================================
Coverage 76.34% 76.34%
=======================================
Files 47 47
Lines 3483 3483
Branches 479 479
=======================================
Hits 2659 2659
Misses 721 721
Partials 103 103 ☔ View full report in Codecov by Sentry. |
This test is hardcoded for GPSC7. ( GPSC7 was busy so I modified for V100 , GPSC5) . I ran the test I did get this message in my error_log , where I did not need... ( I can ignore this below..)
BUT I am getting this message:
Can we add this to the script where the error and output log will be written where the script was ran using the jobname "EV-regress" .
|
Nice, thank you for this output, I'll take a look when I resume work on this. |
8d7f6ab
to
5f89017
Compare
@marctessier I have fixed the issues you noticed, and more:
|
a06aa56
to
a42061e
Compare
Defining `export TQDM_MININTERVAL=5` (or the desired number of seconds) will reduce the frequency at which TQDM updates progress bars, which is nicer for logs that get saved to file.
Starting to address #611
f72019e
to
63ce1a2
Compare
Feedback from Samuel:
|
- max step is now 1000 for all cases, for fast regression - max epochs also 10, because for very small cases 1000 steps is pretty slow, possibly due to what happens between epochs? (lj-160 did 111 epochs...) - lj-150 is now lj-160, fixing an error with a validation set with fewer than 16 elements. - document how to override cluster parameters - exit early if training fails Still to do: handle failing to fetch bundle.js when there is not proxy set on the job nodes. But that has to be fixed in ReadAlong/Studio, not here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great @joanise ! I would just say there are a couple small changes that would make it easier to run on different clusters, see my comments. thanks!
|
||
#SBATCH --job-name=EV-r-main | ||
#SBATCH --partition=standard | ||
#SBATCH --account=nrc_ict |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can do what @SamuelLarkin does and provide the --account and --partition for all clusters.
# On GPSC5
#SBATCH --partition=standard
#SBATCH --account=nrc_ict
# On GPSC7
##SBATCH --partition=gpu_a100
##SBATCH --account=nrc_ict__gpu_a100
# On GPSC-C
##SBATCH --partition=gpu_v100
##SBATCH --account=nrc_ict__gpu_v100
# On Trixie
##SBATCH --partition=TrixieMain,JobTesting
##SBATCH --account=dt-mtp
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I just pulled this from one of the sample scripts that @SamuelLarkin sent - I'm not actually sure if these are the partitions we need)
cp "$EVERYVOICE_REGRESS_ROOT"/wizard-resume-lj "$dir"/wizard-resume | ||
cat <<'==EOF==' > "$dir"/test.txt | ||
This is a test. | ||
I am an anvil. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😆 ⚒️
#!/bin/bash | ||
|
||
#SBATCH --job-name=EV-regress | ||
#SBATCH --partition=gpu_a100 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here with the partition/accounts for all clusters
--model '$VOCODER'" | ||
|
||
# Exercise DeepForceAligner | ||
# Meh, this appears to be broken... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's broken the aligner or coverage or? do you remember what the error is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I tried to run that command, it failed, but I didn't investigate further yet. To reproduce, just take one of the LJ directories and run it, which is the thing to do eventually, since we do want to add dfa testing here.
|
||
|
||
# 7: spin up the demo | ||
# everyvoice demo $FS2 $VOCODER & |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a TODO I imagine? Maybe mark it with TODO inline?
TODO:
has a special syntax highlighting with the Python extension, so I like to use that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, that's a good point. Yes, it's a TODO, just like dfa.
# everyvoice demo $FS2 $VOCODER & | ||
|
||
|
||
# 8: use playwright to synthesize something using the demo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here: TODO: 8: use playwright to synthesize something using the demo
Not an issue but I seeing
when on the latest shouldn't version of EV, shouldn't all the versions be compatible? |
Oh, that's a really hard target to try to impose. If we add a question to the wizard at some point, old resume files will all be broken. I didn't think of these files as artefacts we're supposed to maintain backwards compatibility for... That's why I issued the warning, though I suppose we could adjust the warnings to more specifically complain only if the version is older than the last time we made a breaking change in the wizard? |
But actually, maybe now that we've published, we can add our own requirement to auto-convert from published versions, and this regression test suite gives us the scaffolding to test for that. |
eb59e09
to
c934e07
Compare
c934e07
to
0f10fe7
Compare
Starting to address #611
PR Goal?
Start to write a regression test suite that we can sbatch submit to the cluster with GPUs and whatever memory we need.
For now, I'm using a tiny dataset, but it's easy to adjust to something larger.
Feedback sought?
While this is early, you can already try to give it a spin: edit the two user config variables at the top if needed, and run
Priority?
beta I guess
Tests added?
nothing but
Confidence?
medium, this still needs more work
Version change?
no
Related PRs?
no
TODO