Test data pools #108
base: master
Conversation
…ning without pointing to sqlite file, and expand tests accordingly
… dump; also change relative import
@cehbrecht I fixed the failing test and now the checks are passing.
@alaniwi Thanks for the PR :) Maybe add a little README.md in this test folder with the above description? Is it ok to do a squash-merge?
Some tests may currently be failing because they have to download test data from CEDA, which is unavailable due to maintenance (see the "connection timed out" messages in the test results). Hopefully the tests should start to pass again some time this week -- they were working, and all I did was add a README, so there is no other reason why they should have broken now. But the Read the Docs check failed quickly (maybe the README.md is relevant here?) and I don't know how to fix that, so any advice would be appreciated, @cehbrecht. By the way, a squash merge sounds fine - once you are happy to go ahead with the merge.
@alaniwi how about adding a tag to the data-pool tests? There is already a tag "online" commonly used, but we could add an additional tag, like "data", "data-pool", etc. Using the tags we can filter the tests. Example in rook:
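A tag of this kind could be expressed as a pytest marker. The sketch below is illustrative only: the marker name `data_pools` and the test name are assumptions, not names from this repository, and the project may register or spell its markers differently.

```python
# Sketch of tagging a data-pool test with a custom pytest marker.
# The marker name "data_pools" is hypothetical; rook's existing common
# tag is "online", and the actual choice is up to the maintainers.
import pytest

@pytest.mark.data_pools
def test_subset_example():
    # A real data-pool test would download test data from CEDA and run
    # the subsetter here; this placeholder just illustrates the tagging.
    assert True
```

With such a marker in place, `pytest -m data_pools` would run only the tagged tests, and `pytest -m "not data_pools"` would skip them, so the data-pool checks can be excluded when the remote data source is unavailable.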
This pull request adds the following:

- A command called `data-pools-checks` (implemented in `run_data_pools_checks.py`) that runs the subsetter on a number of test cases in order to try out a variety of different types of datasets (including, for example, some on curvilinear grids). It randomly selects the bounds for the subsets, although there is a command-line option to set a random seed (for example `--seed 0`) to give repeatable results, and optionally this can be combined with a `--cache` option to cache the subsetter output (under `/tmp`) rather than rerunning the subsetter every time the script is run. Results from the checks (containing the name of the collection, the subsetting ranges used, and the test success value) are written initially into an sqlite database and then (periodically, and also on exit) moved into a compressed CSV file.
- A command `merge-test-logs` (implemented in `merge_csv.py`) that merges the output logs from the above tester (as obtained from different sites) into a single compressed CSV file. The `data-pools-checks` command takes an argument which is the site (e.g. `DKRZ`); this is written both into the contents of the output `.csv.gz` file (a column called "test location") and into its filename, so the merge command takes a number of these files and merges them into the specified output file, removing any duplicates.
- A file with some unit tests (`test_results_db.py`) to accompany the `ResultsDB` class (in `results_db.py`), which implements how test results are stored.