This page is intended to provide teams with all the information they need to submit forecasts. These instructions have been adapted from the COVID-19 Forecast Hub.
All forecasts should be submitted directly to the data-forecasts/ folder. Data in this directory should be added to the repository through a pull request so that automatic data validation checks are run.
These instructions describe the data format and the validation you can run before opening this pull request. They also describe the metadata that each model should provide.
See the data-surveillance/ folder for details on the reported WNV neuroinvasive case data.
Table of Contents
- Data formatting for submission
- Forecast file format
- Making a submission
- Forecast data validation
- Policy on late submissions
Data formatting for submission
The automatic checks in place for forecast files submitted to this repository validate both the filename and the file contents, so that each file can be used in the visualization and in ensemble forecasting.
Each subdirectory within the data-forecasts/ directory has the format team-model, where team is the name of your team and model is the name of your model. Both team and model should be fewer than 15 characters and must not include hyphens. The model name should be unique among all models in the project.
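The following is a minimal Python sketch of the subdirectory naming rule, which you can use to check names before creating the folder. It assumes that the stricter character rule stated later for forecast file names (letters, digits, and underscores only) also applies to subdirectory names. The helpers check_name and make_model_dir, and the names UofX and arbo_glm, are hypothetical. None of them are part of the hub's validation code.

```python
import re

# Assumption: the letters/digits/underscores rule for forecast file names
# also applies to subdirectory names. This also rules out hyphens and spaces.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_]+")
MAX_LEN = 15  # names must be *fewer* than 15 characters


def check_name(name: str) -> None:
    """Raise ValueError if a team or model name breaks the naming rules."""
    if len(name) >= MAX_LEN:
        raise ValueError(f"{name!r} must be fewer than {MAX_LEN} characters")
    if not NAME_PATTERN.fullmatch(name):
        raise ValueError(f"{name!r} may contain only letters, digits and underscores")


def make_model_dir(team: str, model: str) -> str:
    """Return the path data-forecasts/team-model (hypothetical helper)."""
    check_name(team)
    check_name(model)
    return f"data-forecasts/{team}-{model}"


print(make_model_dir("UofX", "arbo_glm"))
```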
Within each subdirectory, there should be a metadata file, a license file (optional), and a set of forecasts.
The metadata file name should have the format metadata-team-model.txt, and here is the structure of the metadata file.
By default, forecasts are released under a CC-BY 4.0 license. If you would like to release your forecasts under a different license, please specify a standard license in the license field of your metadata file. Alternatively, if you wish to use a license that is not in the list of standard licenses, you may include a LICENSE.txt file in your model directory.
Each forecast file within the subdirectory should be named using the format YYYY-MM-DD-team-model.csv, where YYYY is the 4-digit year, MM is the 2-digit month, DD is the 2-digit day, team is the name of your team, and model is the name of your model.
The date YYYY-MM-DD is the forecast_date. For this project, the forecast_date should always be the date on which the submission is due.
The team and model in the file name must match the team and model in the name of the directory that contains the file. Both team and model should be fewer than 15 characters and contain only letters, digits, and underscores, with no spaces or hyphens.
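Team and model names cannot contain hyphens. Because of this, a forecast file name can be split back into its date, team, and model without ambiguity. The sketch below is a hypothetical local check, not the hub's validator. It parses a file name and confirms that it matches its directory. The example names UofX and arbo_glm are made up.

```python
import re
from datetime import date

# YYYY-MM-DD-team-model.csv; team and model have no hyphens,
# so each hyphen-separated part can be recovered unambiguously.
FILENAME_PATTERN = re.compile(
    r"(\d{4}-\d{2}-\d{2})-([A-Za-z0-9_]+)-([A-Za-z0-9_]+)\.csv"
)


def parse_forecast_filename(filename: str, directory: str) -> tuple[date, str, str]:
    """Split a forecast file name into (forecast_date, team, model).

    `directory` is the team-model subdirectory the file sits in;
    the team and model in the file name must match it.
    """
    match = FILENAME_PATTERN.fullmatch(filename)
    if match is None:
        raise ValueError(f"{filename!r} does not match YYYY-MM-DD-team-model.csv")
    date_str, team, model = match.groups()
    forecast_date = date.fromisoformat(date_str)  # rejects impossible dates like 2022-02-30
    if f"{team}-{model}" != directory:
        raise ValueError(f"{filename!r} does not match directory {directory!r}")
    for name in (team, model):
        if len(name) >= 15:
            raise ValueError(f"{name!r} must be fewer than 15 characters")
    return forecast_date, team, model


print(parse_forecast_filename("2022-04-30-UofX-arbo_glm.csv", "UofX-arbo_glm"))
```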
Forecast file format
The file must be a comma-separated values (CSV) file with the following columns (in any order):
- forecast_date
- target
- target_end_date
- location
- type
- quantile
- value
No additional columns are allowed.
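Here is one way to check the header locally: a short sketch that reads it with Python's csv module and compares it with the required column set. Order is ignored, while missing, extra, or duplicated columns are rejected. The function name check_columns is hypothetical.

```python
import csv
import io

REQUIRED_COLUMNS = {
    "forecast_date", "target", "target_end_date",
    "location", "type", "quantile", "value",
}


def check_columns(header: list[str]) -> None:
    """Raise ValueError unless the header has exactly the required columns (any order)."""
    if len(header) != len(set(header)):
        raise ValueError("duplicate column names in header")
    missing = REQUIRED_COLUMNS - set(header)
    extra = set(header) - REQUIRED_COLUMNS
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)}, extra={sorted(extra)}")


# Column order does not matter, so a shuffled header passes.
sample = "location,quantile,value,type,target,target_end_date,forecast_date\n"
header = next(csv.reader(io.StringIO(sample)))
check_columns(header)
```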
Each row in the file is a single quantile forecast for a specific location. See the template for an example.
Values in the forecast_date column must be dates in the format YYYY-MM-DD. This is the date on which the forecasts were due to be submitted. The forecast_date should match, and is redundant with, the date in the filename; it is included here for internal completeness.
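A sketch of a forecast_date check follows. It assumes the column is read as raw strings (for example with Python's csv module), which means dates rewritten in another format are caught rather than silently parsed. The function name check_forecast_dates is hypothetical. The file-name date would come from the forecast file name.

```python
import re
from datetime import date

ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")  # strict YYYY-MM-DD, zero-padded


def check_forecast_dates(values: list[str], file_date: date) -> None:
    """Check that each forecast_date is YYYY-MM-DD and equals the file-name date."""
    for raw in values:
        if not ISO_DATE.fullmatch(raw):
            # e.g. "4/30/2022", as spreadsheet software may write on save
            raise ValueError(f"forecast_date {raw!r} is not in YYYY-MM-DD format")
        if date.fromisoformat(raw) != file_date:
            raise ValueError(f"forecast_date {raw} differs from file name date {file_date}")


check_forecast_dates(["2022-04-30", "2022-04-30"], date(2022, 4, 30))
```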
Values in the target column must be the following character string:
Annual WNV neuroinvasive disease cases
This target is the total number of West Nile virus (WNV) neuroinvasive disease cases (confirmed and probable, following the WNV neuroinvasive disease case definition) reported to ArboNET from each county in the contiguous United States in 2022.
Values in the target_end_date column should all be the following date: 2022-12-31. This is the end of the forecast period, the last day of 2022.
Values in the location column consist of the state and county names joined with a hyphen, “State-County”, for example “California-San Diego” or “Texas-Harris”. Do not include the word “County”, and keep the spaces between words within a county or state name. The easiest way to do this, and to ensure that all forecasted locations match the expected forecast locations, is to match the format in the location file.
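The sketch below builds a location string from separate state and county names and checks a set of locations against the expected ones. It assumes county names may arrive with a trailing " County", which is stripped. It also assumes you have already loaded the expected locations from the location file into a set; that file's exact layout is not described here. Both function names are hypothetical.

```python
def make_location(state: str, county: str) -> str:
    """Build a "State-County" string, dropping a trailing " County" if present."""
    county = county.strip()
    if county.endswith(" County"):
        county = county[: -len(" County")]
    return f"{state.strip()}-{county}"


def check_locations(locations: set[str], expected: set[str]) -> None:
    """Every forecasted location must appear in `expected` (built from the location file)."""
    unknown = locations - expected
    if unknown:
        raise ValueError(f"unknown locations: {sorted(unknown)}")


print(make_location("California", "San Diego County"))  # California-San Diego
```

Matching against the location file remains the authoritative check; string construction alone cannot catch spelling differences.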
Values in the type column should all be the following string: "quantile".
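Three columns take a single fixed value: target, target_end_date, and type. A minimal sketch of checking them on rows read with csv.DictReader follows. It assumes the validator compares these values as exact strings, so differences in case or surrounding whitespace are treated as errors. The function name is hypothetical.

```python
# Fixed values from the requirements above; compared as exact strings (assumption).
FIXED_VALUES = {
    "target": "Annual WNV neuroinvasive disease cases",
    "target_end_date": "2022-12-31",
    "type": "quantile",
}


def check_fixed_columns(rows: list[dict[str, str]]) -> None:
    """Raise ValueError on the first row whose target, target_end_date or type is off."""
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        for column, expected in FIXED_VALUES.items():
            if row[column] != expected:
                raise ValueError(
                    f"line {line_no}: {column} is {row[column]!r}, expected {expected!r}"
                )


# Rows as produced by csv.DictReader; only the checked columns are needed here.
good_row = {"target": "Annual WNV neuroinvasive disease cases",
            "target_end_date": "2022-12-31", "type": "quantile"}
check_fixed_columns([good_row])
```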
Values in the quantile column are in the format 0.###. This value indicates the quantile level for the value in this row.
Teams must provide the following 23 quantiles:
c(0.01, 0.025, seq(0.05, 0.95, by = 0.05), 0.975, 0.99)
## [1] 0.010 0.025 0.050 0.100 0.150 0.200 0.250 0.300 0.350 0.400 0.450 0.500
## [13] 0.550 0.600 0.650 0.700 0.750 0.800 0.850 0.900 0.950 0.975 0.990
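Teams working in Python can build the same 23 levels as the R expression above. Rounding to three decimals matches the 0.### format and removes floating-point noise (for example, 0.05 * 3 evaluates to 0.15000000000000002). The helper check_quantiles is a hypothetical sketch. It checks that one location's quantile column holds exactly these levels.

```python
# Python equivalent of c(0.01, 0.025, seq(0.05, 0.95, by = 0.05), 0.975, 0.99).
QUANTILES = [0.01, 0.025] + [round(0.05 * k, 3) for k in range(1, 20)] + [0.975, 0.99]
assert len(QUANTILES) == 23


def check_quantiles(values: list[str]) -> None:
    """A location's quantile column (raw CSV strings) must hold exactly the 23 levels."""
    got = sorted(round(float(v), 3) for v in values)
    if got != QUANTILES:
        raise ValueError("quantile levels do not match the 23 required levels")


print([f"{q:.3f}" for q in QUANTILES])
```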
Values in the value column are non-negative real numbers indicating the “quantile” prediction for this row. This is the inverse of the cumulative distribution function (the quantile function) of the forecast distribution for the row's target and location, evaluated at the row's quantile.
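A quantile function never decreases. Within one location, values should therefore not go down as the quantile level goes up. The sketch below checks this together with the non-negativity requirement on rows read with csv.DictReader. Whether the hub's validations enforce monotonicity is an assumption here. The function name is hypothetical.

```python
import math
from collections import defaultdict


def check_values(rows: list[dict[str, str]]) -> None:
    """Check that values are finite and non-negative, and never decrease as the
    quantile level increases within a location (a quantile function is non-decreasing)."""
    by_location = defaultdict(list)
    for row in rows:
        value = float(row["value"])
        if not math.isfinite(value) or value < 0:
            raise ValueError(f"{row['location']}: invalid value {row['value']!r}")
        by_location[row["location"]].append((float(row["quantile"]), value))
    for location, pairs in by_location.items():
        ordered = [v for _, v in sorted(pairs)]  # sort by quantile level
        if any(later < earlier for earlier, later in zip(ordered, ordered[1:])):
            raise ValueError(f"{location}: values decrease as the quantile increases")


rows = [
    {"location": "Texas-Harris", "quantile": "0.990", "value": "12.5"},
    {"location": "Texas-Harris", "quantile": "0.010", "value": "0"},
    {"location": "Texas-Harris", "quantile": "0.500", "value": "3"},
]
check_values(rows)
```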
Making a submission
To prepare for the initial submission, fork this repository and clone it to your computer or workstation. In your fork, create a subdirectory for your team in the data-forecasts/ folder following the subdirectory naming convention. This is where you will place all your forecasts, metadata, and optional license files.
Use a pull request to create your submission: open a pull request from your forked repository to the original repository. This proposes merging your changes into the main repository and runs the automatic data validation checks on file format and content. More information on making a pull request can be found here.
The initial submission should include the forecast dated April 30, 2022 and the metadata file describing the model. An optional license file can also be included. Note that validations will fail if the pull request changes any files other than these. Teams are encouraged to make their initial submission early to work out any problems with pull requests and validations. Submissions can be updated at any point before the submission deadline. If you submit more than a day before the first submission deadline (April 30, 2022), the automatic validations will flag the submission, but this is not a problem provided the rest of the checks pass.
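Before opening the pull request, you can confirm that it touches only your own files. One way is to list the changed paths, for example with git diff --name-only against the upstream branch (the remote and branch names depend on your setup), and pass them to a check like the hypothetical sketch below. It assumes that only forecast files, the metadata file, and LICENSE.txt inside a single data-forecasts/team-model/ directory are allowed. It is not the hub's validation code.

```python
import re
from pathlib import PurePosixPath


def check_pr_paths(changed: list[str]) -> str:
    """Return the team-model directory if every changed file is an allowed file
    directly inside one data-forecasts/team-model/ directory; else raise ValueError."""
    dirs = set()
    for path in changed:
        parts = PurePosixPath(path).parts
        if len(parts) != 3 or parts[0] != "data-forecasts":
            raise ValueError(f"{path}: not directly inside data-forecasts/team-model/")
        model_dir, name = parts[1], parts[2]
        # Allowed names (assumption): YYYY-MM-DD-team-model.csv, metadata file, LICENSE.txt
        is_forecast = re.fullmatch(r"\d{4}-\d{2}-\d{2}-" + re.escape(model_dir) + r"\.csv", name)
        if not (is_forecast or name in (f"metadata-{model_dir}.txt", "LICENSE.txt")):
            raise ValueError(f"{path}: unexpected file name")
        dirs.add(model_dir)
    if len(dirs) != 1:
        raise ValueError(f"expected one team-model directory, found {len(dirs)}")
    return dirs.pop()


print(check_pr_paths([
    "data-forecasts/UofX-arbo_glm/2022-04-30-UofX-arbo_glm.csv",
    "data-forecasts/UofX-arbo_glm/metadata-UofX-arbo_glm.txt",
]))
```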
While a pull request is open, you can add or modify files in it by pushing changes to your forked repository. This lets you address any problems found by the validation checks. The automatic checks rerun after each push, so you can confirm whether you resolved the problems listed.
Common reasons for a failed pull request include Excel changing the date format when saving the .csv file, and misspelled column headers or metadata keys.
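One way to avoid spreadsheet date conversion is to write the CSV from a script and not re-save it in Excel. Below is a minimal sketch using Python's csv module. It writes one row per location and quantile, with dates as plain text. The function name write_forecast and the example values are hypothetical, and the predictions are assumed to be ordered like the 23 quantile levels listed above.

```python
import csv
import io

# The 23 required quantile levels, written with three decimals (0.###).
QUANTILES = [0.01, 0.025] + [round(0.05 * k, 3) for k in range(1, 20)] + [0.975, 0.99]
COLUMNS = ["forecast_date", "target", "target_end_date",
           "location", "type", "quantile", "value"]


def write_forecast(out, forecast_date: str, predictions: dict[str, list[float]]) -> None:
    """Write one CSV row per (location, quantile) to the text stream `out`.

    `predictions` maps a location such as "Texas-Harris" to its 23 values,
    ordered like QUANTILES. Dates are written as plain YYYY-MM-DD text.
    """
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    for location, values in predictions.items():
        if len(values) != len(QUANTILES):
            raise ValueError(f"{location}: expected {len(QUANTILES)} values, got {len(values)}")
        for q, v in zip(QUANTILES, values):
            writer.writerow({
                "forecast_date": forecast_date,
                "target": "Annual WNV neuroinvasive disease cases",
                "target_end_date": "2022-12-31",
                "location": location,
                "type": "quantile",
                "quantile": f"{q:.3f}",
                "value": v,
            })


# Write to an in-memory buffer; for a real file, pass the handle from
# open("2022-04-30-UofX-arbo_glm.csv", "w", newline="") instead.
buffer = io.StringIO()
write_forecast(buffer, "2022-04-30", {"Texas-Harris": [float(i) for i in range(23)]})
print(buffer.getvalue().splitlines()[1])
```

Opening the file with newline="" is what the csv module documentation recommends, so that line endings are not translated twice.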
We will merge in open pull requests after each submission deadline.
Forecast submissions for the optional May, June, and July deadlines, as well as updated metadata, can also be made through pull requests. These submissions should use the respective submission deadline in their file names and be placed in the same team-model subdirectory as the prior submissions.
For additional submissions, describe any modifications to the model and/or data in the methods_long field of the metadata file.
Forecast data validation
To ensure proper data formatting, automatic validations are run on all pull requests to data-forecasts/.
When a pull request is submitted, the data are validated through GitHub Actions, which runs the tests in the validations repository. These tests are intended to check the requirements above. Please let us know if you run into issues with the tests.
Policy on late submissions
To ensure that forecasting is done in real time, all forecasts must be submitted to this repository by the listed deadlines. We do not accept late forecasts.