Test data pools #108

Open: wants to merge 13 commits into base branch master

alaniwi (Collaborator) commented Sep 22, 2023

This pull request adds the following:

  • A command called data-pools-checks (implemented in run_data_pools_checks.py) that runs the subsetter on a number of test cases covering a variety of dataset types (including, for example, some on curvilinear grids). The subset bounds are chosen at random, but a command line option sets a random seed (for example --seed 0) to give repeatable results. This can optionally be combined with a --cache option, which caches the subsetter output (under /tmp) instead of rerunning the subsetter every time the script is run. Results of the checks (the name of the collection, the subsetting ranges used, and the test success value) are first written to an SQLite database and then, periodically and also on exit, moved into a compressed CSV file.

  • A command called merge-test-logs (implemented in merge_csv.py) that merges the output logs from the above tester, as obtained from different sites, into a single compressed CSV file. The data-pools-checks command takes the site (e.g. DKRZ) as an argument, and this is written both into the contents of the output .csv.gz file (a column called "test location") and into its filename. The merge command takes a number of these files and merges them into the specified output file, removing any duplicates.

  • A file of unit tests (test_results_db.py) for the ResultsDB class (in results_db.py), which implements how test results are stored.
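A minimal sketch of how seeded random subset bounds might be drawn, assuming each dataset's extent is known as numeric (min, max) ranges per coordinate. The function name `random_subset_bounds` and its signature are hypothetical and are not taken from run_data_pools_checks.py; the real selection logic (especially on curvilinear grids) may differ.

```python
import random


def random_subset_bounds(ranges, seed=None):
    """Draw a random (lower, upper) pair inside each coordinate range.

    Hypothetical helper for illustration only.
    ranges: mapping of coordinate name -> (min, max) extent of the dataset.
    seed: plays the role of --seed; a fixed value makes the draw repeatable.
    """
    # A private generator, so the global random state is not affected.
    rng = random.Random(seed)
    bounds = {}
    # Iterate in a fixed order so a given seed always maps to the same draws.
    for name in sorted(ranges):
        lo, hi = ranges[name]
        lower, upper = sorted((rng.uniform(lo, hi), rng.uniform(lo, hi)))
        bounds[name] = (lower, upper)
    return bounds
```

With `seed=None`, `random.Random` seeds itself from system entropy, so each run picks new bounds; with a fixed seed the same bounds come back every time, which is also what makes caching the subsetter output across runs worthwhile.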
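A rough sketch of the "stage in SQLite, then move into a compressed CSV" pattern described above. The class name `ResultsStore`, its methods and the column names (other than "test location", which the description mentions) are hypothetical; this is not the API of the ResultsDB class in results_db.py.

```python
import csv
import gzip
import os
import sqlite3

# Hypothetical CSV columns; only "test location" is named in the description.
FIELDS = ("collection", "subset ranges", "success", "test location")


class ResultsStore:
    """Hypothetical sketch: stage result rows in SQLite, flush them to .csv.gz."""

    def __init__(self, db_path, csv_gz_path):
        self.csv_gz_path = csv_gz_path
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS results (collection TEXT, "
            "subset_ranges TEXT, success INTEGER, test_location TEXT)"
        )

    def add(self, row):
        # Using the connection as a context manager commits on success.
        with self.conn:
            self.conn.execute(
                "INSERT INTO results VALUES (?, ?, ?, ?)",
                [row[field] for field in FIELDS],
            )

    def flush(self):
        """Append staged rows to the compressed CSV, then delete them from SQLite."""
        rows = self.conn.execute("SELECT * FROM results ORDER BY rowid").fetchall()
        if not rows:
            return
        write_header = not os.path.exists(self.csv_gz_path)
        # Append mode adds a new gzip member; gzip.open reads
        # concatenated members back as one continuous stream.
        with gzip.open(self.csv_gz_path, "at", newline="") as f:
            writer = csv.writer(f)
            if write_header:
                writer.writerow(FIELDS)
            writer.writerows(rows)
        with self.conn:
            self.conn.execute("DELETE FROM results")

    def close(self):
        # Flush on exit so no staged results are lost.
        self.flush()
        self.conn.close()
```

Calling `flush()` periodically and `close()` on exit mirrors the behaviour described above; the SQLite staging table means a crash between flushes loses at most the uncommitted row in progress.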
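A minimal sketch of merging per-site .csv.gz logs while dropping duplicate rows, assuming all inputs share the same header. The function name `merge_logs` is hypothetical and is not taken from merge_csv.py; "duplicate" here means an exactly identical row.

```python
import csv
import gzip


def merge_logs(input_paths, output_path):
    """Hypothetical sketch: merge .csv.gz logs into one file, removing duplicate rows.

    Returns the number of data rows written (excluding the header).
    """
    header = None
    seen = set()
    merged = []
    for path in input_paths:
        with gzip.open(path, "rt", newline="") as f:
            reader = csv.reader(f)
            file_header = next(reader, None)
            if file_header is None:
                continue  # skip empty files
            if header is None:
                header = file_header
            elif file_header != header:
                raise ValueError(f"{path}: header {file_header} does not match {header}")
            for row in reader:
                key = tuple(row)
                # Keep the first occurrence and preserve input order.
                if key not in seen:
                    seen.add(key)
                    merged.append(row)
    with gzip.open(output_path, "wt", newline="") as f:
        writer = csv.writer(f)
        if header is not None:
            writer.writerow(header)
        writer.writerows(merged)
    return len(merged)
```

Because each row carries its "test location", identical results from different sites stay distinct; only rows repeated verbatim (for example, the same site's log passed twice) are dropped.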

alaniwi (Collaborator, Author) commented Oct 11, 2023

@cehbrecht I fixed the failing test and now the checks are passing.

cehbrecht self-requested a review on October 16, 2023 13:39

cehbrecht (Collaborator) commented

@alaniwi Thanks for the PR :) Maybe add a little README.md in this test folder with the above description? Is it OK to do a squash merge?

alaniwi (Collaborator, Author) commented Oct 25, 2023

Some tests may currently be failing because they have to download test data from CEDA, which is unavailable due to maintenance (see the "connection timed out" messages in the test results). Hopefully the tests will start to pass again some time this week. They were working before, and all I did was add a README, so there is no other reason why they should have broken now.

But the Read the Docs check failed quickly (maybe the README.md is relevant here?) and I don't know how to fix that, so any advice would be welcome, @cehbrecht.

By the way, a squash merge sounds fine, once you are happy to go ahead with the merge.

cehbrecht (Collaborator) commented

@alaniwi how about adding a tag to the data-pool tests?

There is already a commonly used tag, "online", but we could add an additional one, such as "data" or "data-pool".

Using the tags, we can filter the tests.

Example in rook:
https://github.com/roocs/rook/blob/7399fac2f54de3b2b454c219d55a41548905b4f2/tests/smoke/test_smoke_checks.py#L10
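A minimal sketch of tagging the data-pool tests with pytest markers, assuming the tags are pytest marks. The marker name `data_pools` is hypothetical (an underscore is used because `pytest.mark.data-pool` is not valid attribute syntax), and the test body is only a placeholder. The module-level `pytestmark` list applies the marks to every test in the module.

```python
import pytest

# Hypothetical marker name; the discussion only suggests "data" or "data-pool".
DATA_POOLS = pytest.mark.data_pools

# Apply the existing "online" tag plus the new one to every test in this module.
pytestmark = [pytest.mark.online, DATA_POOLS]


def test_data_pools_placeholder():
    # Placeholder: a real test would invoke the data-pools checks here.
    assert True
```

The new marker would also need registering under `markers` in the pytest configuration (e.g. `setup.cfg` or `pytest.ini`) so that `--strict-markers` accepts it. `pytest -m data_pools` then runs only these tests, and `pytest -m "not data_pools"` skips them.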
