Test data pools #108

alaniwi · 2023-09-22T14:04:22Z

This pull request adds the following:

A command called data-pools-checks (implemented in run_data_pools_checks.py) that runs the subsetter on a number of test cases, to try out a variety of dataset types (including, for example, some on curvilinear grids). The bounds for the subsets are selected randomly, but a command-line option sets the random seed (for example --seed 0) to give repeatable results. This can optionally be combined with a --cache option, which caches the subsetter output (under /tmp) rather than rerunning the subsetter every time the script is run. Results from the checks (the name of the collection, the subsetting ranges used, and the test success value) are written initially into an sqlite database and are then moved (periodically and also on exit) into a compressed CSV file.
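To illustrate why a seed gives repeatable subsets, here is a minimal sketch of seeded bound selection. It is not the code in run_data_pools_checks.py: the function names `pick_bounds` and `make_subset_request`, and the latitude/longitude ranges, are assumptions made for the example.

```python
import random


def pick_bounds(rng, lower, upper):
    """Return a random (start, end) pair with lower <= start <= end <= upper."""
    a = rng.uniform(lower, upper)
    b = rng.uniform(lower, upper)
    return (min(a, b), max(a, b))


def make_subset_request(seed=None):
    """Build a hypothetical subset request with randomly chosen bounds.

    With seed=None the generator is seeded from the system, so the bounds
    differ between runs; with a fixed seed the same bounds come back each time.
    """
    rng = random.Random(seed)
    return {
        "lat": pick_bounds(rng, -90.0, 90.0),   # assumed latitude range
        "lon": pick_bounds(rng, 0.0, 360.0),    # assumed longitude range
    }
```

Calling `make_subset_request(seed=0)` twice returns identical bounds, which is also what makes a cache keyed on the request usable across runs.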
A command merge-test-logs (implemented in merge_csv.py) that merges the output logs from the above tester (as obtained from different sites) into a single compressed CSV file. The data-pools-checks command takes the site (e.g. DKRZ) as an argument, and this is written both into the contents of the output .csv.gz file (in a column called "test location") and into its filename. The merge command therefore takes a number of these files and merges them into the specified output file, removing any duplicates.
A file of unit tests (test_results_db.py) to accompany the ResultsDB class (in results_db.py), which implements how test results are stored.
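The merge step described above can be sketched roughly as follows. This is an illustration using only the standard library, not the contents of merge_csv.py: the function name `merge_logs` is hypothetical, and the sketch assumes that every input file starts with the same header row and that a duplicate means a row identical in every column.

```python
import csv
import gzip


def merge_logs(input_paths, output_path):
    """Merge several .csv.gz logs into one, dropping exact duplicate rows.

    Returns the number of data rows written (excluding the header).
    """
    header = None
    seen = set()
    rows = []
    for path in input_paths:
        with gzip.open(path, "rt", newline="") as f:
            reader = csv.reader(f)
            file_header = next(reader, None)
            if file_header is None:
                continue  # empty file: nothing to merge
            if header is None:
                header = file_header
            elif file_header != header:
                raise ValueError(f"header mismatch in {path}")
            for row in reader:
                key = tuple(row)
                if key not in seen:  # keep the first occurrence only
                    seen.add(key)
                    rows.append(row)
    with gzip.open(output_path, "wt", newline="") as f:
        writer = csv.writer(f)
        if header is not None:
            writer.writerow(header)
        writer.writerows(rows)
    return len(rows)
```

Because the site name is stored in the "test location" column, rows from different sites stay distinct even when the collection and subsetting ranges match; only rows that agree in every column are collapsed.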

alaniwi · 2023-10-11T11:26:54Z

@cehbrecht I fixed the failing test and now the checks are passing.

cehbrecht · 2023-10-16T13:41:40Z

@alaniwi Thanks for the PR :) Maybe add a little README.md in this test folder with the above description? Is it ok to do a squash-merge?

alaniwi · 2023-10-25T14:10:29Z

Some tests may currently be failing because they have to download test data from CEDA, which is unavailable due to maintenance (see the "connection timed out" messages in the test results). Hopefully the tests should start to pass again some time this week: they were working, and all I did was add a README, so there is no other reason why they should have broken now.

But the readthedocs test failed quickly (maybe the README.md is relevant here?) and I don't know how to fix that, so any advice would be good, please, @cehbrecht.

By the way, a squash merge sounds fine - once you are happy to go ahead with the merge.

cehbrecht · 2023-10-26T16:19:19Z

@alaniwi how about adding a tag to the data-pool tests?

There is already a commonly used tag, "online", but we could add an additional one, such as "data" or "data-pool".

Using the tags we can filter the tests.

Example in rook:
https://github.com/roocs/rook/blob/7399fac2f54de3b2b454c219d55a41548905b4f2/tests/smoke/test_smoke_checks.py#L10
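A minimal sketch of the tagging idea, assuming pytest markers are what is meant by tags. The marker name `data_pool` is hypothetical (an underscore is used because a marker is accessed as a Python attribute, so "data-pool" would not work as written), and the test name and body are placeholders rather than one of the real checks.

```python
import pytest


@pytest.mark.online      # existing tag mentioned above
@pytest.mark.data_pool   # hypothetical additional tag for the data-pool tests
def test_data_pool_subset_placeholder():
    # Placeholder body; a real test would run a data-pool check here.
    assert True
```

With markers like these, `pytest -m "online and not data_pool"` would run the online tests while skipping the data-pool ones, and `pytest -m data_pool` would select only the latter. Registering the new marker under the `markers` option in the pytest configuration avoids unknown-marker warnings.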

alaniwi added 10 commits May 15, 2023 14:05

data pools tester

3b18ebb

quotes

ed253d9

tidy up a little (remove commented-out code)

beaabb9

add the means to write results to database

1c495d1

tweaks for results_db and add unit tests

dfb401e

field all NaNs check

9de8463

refactor as class and add caching for get_fullfield

7003d26

add cli entry point for data-pools-checks

907fcd8

improve functionality of ResultsDB class to include read/only and opening without pointing to sqlite file, and expand tests accordingly

30a1d15

add script to merge csv.gz files containing subset tester logs

52e32b1

alaniwi mentioned this pull request Sep 22, 2023

Plan for extensive unit testing of ESGF data roocs/rook#222

Open

alaniwi added 2 commits October 11, 2023 12:03

Merge remote-tracking branch 'origin/master' into test_data_pools_new

1ad0de0

fix test_results_db.py: change sql test data to use select instead of dump; also change relative import

bd67eef

cehbrecht self-requested a review October 16, 2023 13:39

cehbrecht approved these changes Oct 16, 2023

Create README.md

6a45ca2
