ExperimentDesign
The OpenMalaria simulator is designed for simulating individual scenarios. To simulate a study covering variations in several factors, we often design a fully factorial experiment.
Terminology:
- Sweep
- A sweep is one or more covarying factors, and the set of all arms of this sweep.
- Arm
- Each sweep must have one or (usually) more values for each of its factors. An arm is one of these combinations of values — a configuration assigning a value to each factor on the sweep.
- Experiment
- Largely synonymous with a study, an experiment is the complete description of all sweeps and arms used, together with the scenarios generated and results produced. Each scenario is generated by choosing one arm for each sweep in the experiment.
- Full factorial design
- A full factorial experiment is one where all possible scenarios are generated: all combinations of one arm per sweep are used.
- Seed
- Each time a scenario is simulated, the (pseudo-)random number generator must be started from some seed (generally a positive integer); using a different seed will affect the generation of each pseudo-random number and correspondingly may alter the outcome of stochastic decisions. Additionally, in saying that an experiment was run with 5 seeds, we mean that each scenario was run 5 times, each time with a different seed.
We are assembling a collection of many different models or parameterizations designed to take account of model uncertainties.
Since OpenMalaria simulations are stochastic, we usually use a number of random seeds (say 2-50) to provide estimates of the contributions of random variation to the results.
To facilitate creating a set of scenarios from an experiment description, we developed a tool called "experiment creator" (at the moment it's fairly minimalist). It combines scenarios to create a full factorial experiment design, taking a description of sweeps and arms from a set of files and folders, described below.
For instance, if you want to test two levels of IRS deterrency, two levels of ITN coverage (distributed in two mass campaigns), and 14 different models, a full factorial design needs 2*2*14=56 scenarios, and thus 56 unique .xml files, one for each possible combination of all factors. If you also want to run each of them with 10 different seeds, you need 560 .xml files. A manual way of creating these 56 files would be to create a base.xml file with the baseline values for each parameter, copy this file 55 times, give each copy a unique name, and adjust the values (and models) manually in each .xml file. With the experiment creator, this can be automated.
As described above, the experiment creator takes a description of sweeps and arms from a set of files and folders. The folders and files have to be structured as follows:
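The arithmetic above can be checked with a short Python sketch. It only illustrates what "full factorial" means; it is not how the experiment creator is implemented, and the sweep and arm names are hypothetical placeholders (only the counts come from the text).

```python
from itertools import product

# Hypothetical sweep and arm names; only the number of arms per sweep
# (2, 2 and 14) is taken from the example in the text.
sweeps = {
    "IRSdeterrency": ["low", "high"],
    "ITNcoverage": ["low", "high"],
    "models": [f"model{i:02d}" for i in range(1, 15)],
}
seeds = range(1, 11)  # 10 seeds

# A full factorial design uses every combination of one arm per sweep.
scenarios = list(product(*sweeps.values()))

# Each scenario is then simulated once per seed.
runs = list(product(scenarios, seeds))

print(len(scenarios))  # 56
print(len(runs))       # 560
```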
Figure 1 : Experiment Folder structure
The "description" sub-directory should contain:
- base.xml, a complete scenario describing all interventions, etc., with a parameter for each factor (preferably a reference parameter)
- scenario_VV.xsd, a schema description
- a sub-directory for each sweep
- a sub-directory for the models/parameterizations
The base.xml must have the same model parameters as one of the model parameterizations, so it may be sensible to use one of these as a template for the design of the base scenario. All scenario files must use the same schema version, and this schema file is required for validity checking. Each sweep should consist of a directory containing XML or TXT files; each file is considered an arm. A directory should contain either XML files or TXT files, but not both.
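Before running the experiment creator, it can help to check the layout rules above. The following is a minimal sketch, assuming the description directory is laid out as described; the function name `check_description` is hypothetical and is not part of the experiment creator.

```python
from pathlib import Path


def check_description(description: Path) -> dict[str, list[str]]:
    """Return {sweep name: sorted arm file names}.

    Raises ValueError if base.xml is missing or if a sweep directory
    mixes XML and TXT arms. This only checks the layout; it does not
    validate the scenarios against the schema.
    """
    if not (description / "base.xml").is_file():
        raise ValueError("description must contain base.xml")

    sweeps = {}
    for sub in sorted(p for p in description.iterdir() if p.is_dir()):
        # Every XML or TXT file in a sweep directory counts as one arm.
        arms = sorted(f for f in sub.iterdir()
                      if f.suffix.lower() in (".xml", ".txt"))
        kinds = {f.suffix.lower() for f in arms}
        if len(kinds) > 1:
            raise ValueError(f"sweep {sub.name!r} mixes XML and TXT arms")
        if arms:
            sweeps[sub.name] = [f.name for f in arms]
    return sweeps
```

Calling `check_description(Path("path/to/experiment/description"))` returns the sweeps and arms it found, which makes it easy to confirm that the expected number of scenarios will be generated.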
Each XML sweep should consist of a directory containing one XML file for each arm; each of these files should itself be a complete and valid OpenMalaria scenario. If, for example, an experiment has two sweeps, A and B, and each sweep has two arms, A1 and A2, and B1 and B2, and the parameters of A1 and B1 are used in the base.xml scenario, then the description should contain directories A and B. A should contain A1.xml and A2.xml files, where A1.xml is identical to base.xml (except that you may wish to give it a name by changing the name attribute of the scenario element), and A2.xml differs from A1.xml only in the changed parameters of this sweep. Similarly B should contain B1.xml and B2.xml, each of which would use the A1 parameters of the first sweep together with the B1 or B2 parameters for the second sweep. If you wish to specify which arm is the reference rather than let the experiment creator choose any of them, rename one arm (in each sweep) to reference.xml. None of the arms should be named base.xml (see below).
A simple example experiment description: example-28-5-day.7z
(This is a 7-zip archive. Right-click on "View raw file" and choose "save as ...".)
Sometimes only a single value of an XML parameter needs to vary in a sweep. In this case, a TXT sweep is more appropriate than an XML sweep, because it avoids the need to copy and paste the XML file for each arm. The value that varies is replaced in the base.xml file by a variable name, and the sweep directory contains TXT files. The variable name in base.xml must follow the naming convention @variable_name@ (e.g. @IIR@), and each variable must be unique. Each TXT file in the sweep represents an arm and contains the variable name and its value for that arm. In the file, the syntax for the declaration is @variable_name@:variable_value (e.g. @IIR@:7.2).
It is possible to include complex XML blocks in a TXT sweep, providing these do not contain carriage-returns. Nested substitutions using the @ substitutions are possible, but you may need to experiment with the collation order of folders.
For example, suppose propInfected must take different values in the XML. A sweep named infected is created with two arms.
base.xml:
....
<vector>
<anopheles mosquito="gambiae_ss" propInfected="@population_infect@" propInfectious="0.015">
<monthlyEir annualEIR="0.32">
<item>0.076</item>
...
In the infected sweep :
infected_low.txt:
@population_infect@:0.061
infected_high.txt:
@population_infect@:0.18
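The effect of a TXT arm can be pictured as a plain text substitution. The sketch below is an illustration only, assuming one `@variable_name@:variable_value` declaration per line; the helper names `parse_txt_arm` and `apply_arm` are hypothetical, and the experiment creator's actual implementation may differ.

```python
def parse_txt_arm(text: str) -> dict[str, str]:
    """Parse lines of the form '@variable_name@:variable_value'."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split at the '@:' that ends the variable name.
        name, sep, value = line.partition("@:")
        if not sep or not name.startswith("@"):
            raise ValueError(f"not a variable declaration: {line!r}")
        values[name + "@"] = value
    return values


def apply_arm(base_text: str, arm_text: str) -> str:
    """Replace each declared variable in the base text with its value."""
    result = base_text
    for variable, value in parse_txt_arm(arm_text).items():
        result = result.replace(variable, value)
    return result


base_line = ('<anopheles mosquito="gambiae_ss" '
             'propInfected="@population_infect@" propInfectious="0.015">')
print(apply_arm(base_line, "@population_infect@:0.061"))
# <anopheles mosquito="gambiae_ss" propInfected="0.061" propInfectious="0.015">
```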
The models/parameterizations are added in a similar way to sweeps. Download files for the appropriate time-step and schema version from the repository and extract (requires 7-zip). Rename the extracted folder "models" if you like and put it in the experiment description directory. Now find out which of these models most resembles your base.xml file (the parameters at the end must be identical), and rename this to base.xml (still within the models directory). This last step is important: it tells the experiment creator that it shouldn't carry over all the differences between these model arm XML files and your base.xml file in the directory above (which would remove all the interventions, etc., that you added into your base.xml file) but, instead, only carry over the differences between the model arm (models/R0000.xml, etc.) and the base model (the base.xml file within the models directory).
After the experiment creator has run successfully, the experiment's folder will contain a new folder, scenarios, and a new CSV file, scenarios.csv.
The scenarios folder contains the XML scenarios resulting from the full factorial generation, named wu_[experiment_name]_[scenario_id].xml. [experiment_name] is the name of the experiment you defined before running the experiment creator, and [scenario_id] is a unique ID generated automatically by the experiment creator, which gives each scenario file a unique name.
The scenarios.csv file describes the combination of sweeps for each scenario. The first column is the name of the XML scenario file, the last column is the seed number, and in between there is one column per sweep giving the name of the arm applied to the scenario. Example:
file,Sweep A,Sweep B,Sweep C,seed
wuExperiment_0.xml,A1,B1,C1,1
wuExperiment_1.xml,A1,B1,C2,1
wuExperiment_2.xml,A1,B2,C1,1
wuExperiment_3.xml,A1,B2,C2,1
wuExperiment_4.xml,A2,B1,C1,1
wuExperiment_5.xml,A2,B1,C2,1
....
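Because scenarios.csv is a plain CSV file with a header row, it can be read with standard tools when analysing results. The following is a minimal sketch using Python's csv module on the sample rows above, grouping scenario files by their arm in "Sweep A"; to read a real file, open PATH/scenarios.csv instead of the in-memory sample.

```python
import csv
import io
from collections import defaultdict

# The sample rows from the example above.
sample = """file,Sweep A,Sweep B,Sweep C,seed
wuExperiment_0.xml,A1,B1,C1,1
wuExperiment_1.xml,A1,B1,C2,1
wuExperiment_2.xml,A1,B2,C1,1
wuExperiment_3.xml,A1,B2,C2,1
wuExperiment_4.xml,A2,B1,C1,1
wuExperiment_5.xml,A2,B1,C2,1
"""

# DictReader uses the header row to name the columns.
rows = list(csv.DictReader(io.StringIO(sample)))

# Group scenario files by the arm used for "Sweep A".
by_arm = defaultdict(list)
for row in rows:
    by_arm[row["Sweep A"]].append(row["file"])

print(by_arm["A2"])  # ['wuExperiment_4.xml', 'wuExperiment_5.xml']
```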
Vecnet provided us a completely new experiment creator. You can download it here
The experiment creator (old version) can be used from OpenMalariaTools.
Alternatively, the experiment creator can be downloaded as a jar here: experiment_creator.jar
To run this, you need to install a java run-time environment (JRE) if you don't have it already — install from your package manager or from java.com. Then, open up a command-line, navigate to the saved jar file, and enter:
java -jar ExperimentCreator.jar
Running the experiment creator without any command-line arguments shows some usage instructions:
> java -jar ExperimentCreator.jar
Required arguments: --stddirs PATH
Usage:
CombineSweeps --stddirs PATH [options]
Options:
--stddirs PATH Assume a standard setup: input dir is PATH/description,
output dir is PATH/scenarios, and a list of what the
scenarios are is written to PATH/scenarios.csv
--seeds N Add a sweep of N random seeds
--unique-seeds B If B is true (default), give every scenario a unique seed
--patches Write out arms as patches instead of resulting
combined XML files. (Currently broken.)
--no-validation Turn off validation.
--write-list-only Stop after writing PATH/scenarios.csv without doing DB
updates or generating scenario files.
--read-list Read list in PATH/scenarios.csv and only generate scenario
files listed. Due to limited capability of program, a full list
for the current description should be generated and unwanted
lines deleted; other edits may cause problems.
Comparators of all included scenarios should be included.
Non-DB mode options:
--name NAME Name of experiment; for use when not in DB mode.
--sce-ID-start J Enumerate the output scenarios starting from J instead
of 0 (in DB mode numbers come from DB).
DB-mode options:
--db jdbc:mysql://SERVER:3306/DATABASE
Enable DB mode: read and update DATABASE at address SERVER.
--dbuser USER Log in as USER. Password will be read from command prompt.
--desc DESC Enter a description for database update.
PATH/description should contain one XML file named base.xml and a set of
sub-directories. Each sub-directory containing any XML files is
considered a sweep. Each XML file within each sweep's directory is
considered an arm. See comment at the top of CombineSweeps.java
for more information.
Usually I create a directory for the experiment and put the description of sweeps and arms in a sub-directory called "description"; then I can just run the experiment creator like:
java -jar ExperimentCreator.jar --stddirs path/to/experiment --seeds 10 --name exp-name
(Note: we prefer not to have underscores (_) in the experiment name, and it must not have spaces.)
You're now set to generate your experiment scenarios, using the experiment creator from a command line as above.
If you want to create test versions with a smaller population size, this is quite easy so long as none of your existing sweeps change the population size. Create a new sweep in your description directory (e.g. a sub-dir called "pop-size") and copy the experiment base.xml into there (but make sure you rename it). Then just change the population size within this arm: you now have a one-arm sweep which changes all generated scenario files.
Only one sweep may adjust an attribute or an element/element list relative to your base scenario. If this is not the case, you'll get an error message stating that some elements clash — you need to work out which parts exactly clash, and how to redesign your description to avoid this clash. Is changing the arms used in the base enough (note that changing the base means you'll probably have to change most of your other files too)? Or maybe you'll have to combine two sweeps into one, creating all combinations of their two arms manually.
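To work out which parts clash, it can help to list what each arm changes relative to the base and look for overlaps between sweeps. The sketch below is a simplified illustration, not the experiment creator's own check: it compares attributes only, assumes the arm has the same element structure as the base, and uses hypothetical element and attribute names.

```python
import xml.etree.ElementTree as ET


def changed_attributes(base_xml: str, arm_xml: str) -> set[tuple[str, str]]:
    """Return (element path, attribute) pairs that differ from the base.

    Simplification: both documents are assumed to have the same element
    structure, so children are compared pairwise in document order.
    """
    changes = set()

    def walk(base_el, arm_el, path):
        for attr in set(base_el.attrib) | set(arm_el.attrib):
            if base_el.get(attr) != arm_el.get(attr):
                changes.add((path, attr))
        for i, (b_child, a_child) in enumerate(zip(base_el, arm_el)):
            walk(b_child, a_child, f"{path}/{b_child.tag}[{i}]")

    base_root = ET.fromstring(base_xml)
    walk(base_root, ET.fromstring(arm_xml), base_root.tag)
    return changes


# Hypothetical, heavily simplified scenarios for illustration only.
base = '<scenario><demography popSize="1000"/><vector eir="0.32"/></scenario>'
pop_arm = '<scenario><demography popSize="100"/><vector eir="0.32"/></scenario>'
other_arm = '<scenario><demography popSize="500"/><vector eir="0.50"/></scenario>'

# Attributes changed by arms of two different sweeps: any overlap is a clash.
clash = changed_attributes(base, pop_arm) & changed_attributes(base, other_arm)
print(clash)  # {('scenario/demography[0]', 'popSize')}
```

In this example both arms change popSize, so the two sweeps would need to be redesigned or merged as described above.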