Parameter Data
Please see the example page for specific examples of data extraction of this information. If still in doubt after looking at the wiki and examples, please contact the parameter lead (see sign-up sheet) as a first port of call for questions.
Please note, we are extracting everything as presented in the paper, even if you think it's an error by the author(s). Please mark the paper down in quality assessment and make a note about this in the article form.
Some papers may have a huge level of parameter disaggregation (e.g. age-sex, location) and so we have established different rules to ease the extraction process. For non-location-related disaggregations, please remember the rule of three. If there are three or more disaggregations for a parameter, e.g. Rt values for three or more age groups, extract these as a range and specify that disaggregated data is available and what the parameter is disaggregated by.
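The rule of three for non-location disaggregations can be sketched in Python as follows. This is only an illustration: the function name `apply_rule_of_three` and the dictionary layout are hypothetical and do not correspond to fields in the extraction form.

```python
def apply_rule_of_three(values_by_group):
    """Summarise disaggregated estimates following the rule of three.

    values_by_group maps a group label (e.g. an age band) to the value
    reported in the paper. All names here are illustrative assumptions.
    """
    if len(values_by_group) >= 3:
        # Three or more groups: record only the range and flag that
        # disaggregated data is available.
        values = list(values_by_group.values())
        return {
            "range": (min(values), max(values)),
            "disaggregated_data_available": True,
        }
    # Up to two groups: record each value individually.
    return {"values": dict(values_by_group)}


# Example: Rt reported for three age groups is extracted as a range.
summary = apply_rule_of_three({"0-17": 1.2, "18-64": 1.8, "65+": 1.5})
```

Remember that the pathogen-specific location rules take precedence over this sketch when the disaggregation is geographic.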
Each pathogen has different rules on location, which we state here:
- Marburg:
- Ebola:
- Lassa:
- SARS:
- Zika:
We can get rid of the below sentence once this has been filled out: For Zika, Lassa, and SARS please extract disaggregated values if the disaggregation is by location as much as possible and do not apply the rule of 3 for geographic regions down to admin level 2 (sub-regions) of a country. However, please respect the rule of three for estimates by neighborhood for example.
We are only extracting parameters that are estimated from or fitted to actual data. For transmission models, if it is only a theoretical model and they have just chosen parameters from other studies/randomly, then please don’t extract these.
- Parameter type – this will give you a drop-down of all the parameters we are wanting to extract.
- Parameter(s) from figure only - we are not extracting data from figures. If a parameter is available in figure form only, tick this box. Parameter context and other extractable information may still be available.
- Other than easily accessible xls or csv files, if parameters are reported in a separate, programming language-specific database, e.g. only available in an RData file, do not extract parameter values and tick the "from figure only" box. (Note: update tick box text for future pathogens)
- Parameter value – the value stated in the paper whether this is in free text, a table or a figure caption. Note that we are not extracting anything from the figures themselves or performing any calculations. The exponent refers to the exponent in scientific notation, i.e. 10^x, where the default is 0 (10^0 = 1). Note that this also applies to the parameter range boxes and parameter uncertainty section below.
- Parameter range - The lower and upper values here correspond to the minimum and maximum values of the parameter across any dimension of disaggregation. For example, if the CFR is disaggregated by age and occupation, the lower value may be for a particular age group while the upper value might be for a particular occupation. Please note that the range also features in the paired parameter uncertainty dropdown list (see following section). This pertains to the range of the central parameter estimate, if available. Note that the exponent input above applies to the parameter range also. If there are 3 or more groups, then extract only the range. For up to 2 groups, extract individual values. Please refer to the paragraph at the top of this page for pathogen-specific rules on this threshold of 3.
- Parameter value reported as inverse - Tick this box if the inverse of a parameter of interest is reported instead of the parameter itself, e.g. if a (fitted) recovery rate is reported instead of the infectious period. Note that this also applies to the parameter range provided above.
- Unit – per week, percent, days, etc. Point estimate and percentage used to be separate in the previous iteration. Now if you wanted to extract 73% you would put 73 in the parameter value field and then choose percent as your unit. If the inverse of a parameter is provided (as described above), please select the units of the parameter itself, e.g. if a (fitted) recovery rate is provided instead of the infectious period, tick 'Parameter value provided as inverse' and select days (or reported time units) as the unit.
- Parameter value type – mean, median, standard deviation (see example 1 for clarity here). Multiple measures of central tendency (or variability - see following section) may be provided, especially when the entire distribution of a parameter is presented. To avoid extracting multiple measures of centrality and variability for the same parameter, and to avoid bias, we follow this procedure: if the standard deviation or variance is available, enter the mean as the parameter value type (together with the sd/variance) and the range if available; if the IQR is the only available variability metric, enter the median as the parameter value type (together with the IQR); if a confidence interval is available, enter the mean + CI; if a credible interval is available, enter the median + CrI. In both of the interval cases, also enter the range if available. If uncertainty is reported for all measures of central tendency, we prefer the mean. The exception is the Weibull distribution, for which we prefer shape/scale over mean/95% CI or median/95% CrI, because mean/CI can be obtained from shape/scale analytically, whereas shape/scale can only be obtained from mean/CI numerically.
- From supplement – tick this box if the values of the parameters are found in the supplementary material. This will make things a lot easier when we want to go back and find this information again if we know it isn’t in the full text.
- Single-type uncertainty is if only a standard deviation or coefficient of variation, for example, is reported rather than a range of values.
- Paired uncertainty is the option you will be using most of the time -- this includes confidence and credible intervals.
- Distribution type – use this when a study states that the uncertainty around your value x follows some distribution (see example 2)
- Note - If only the parameter range is available, e.g. Rt 1.5-2.3, then don’t extract uncertainty associated with individual estimates in that range. But if there is a central value, then extract uncertainty interval.
- Also - Please note that the exponent input above also applies to the parameter uncertainty section.
- Disaggregated data available – tick this box if you can find the parameter disaggregated by age groups, occupation etc. **Note that we will only be extracting the aggregated value here.**
- 'Method' disaggregation: please note that this includes not only the choice of model but also any sensitivity analyses that involve varying model parameters given a particular model choice.
- Sex – the sex composition of your study population. If you have 99 men and 1 woman you would still put both in this option.
- Sample size – number of participants/samples tested etc.
- Setting – how was the study conducted?
- Group – demographic i.e. who was sampled?
- Age min and age max – these must be number fields. If your sample is people over 18 you would put age min = 18 and leave age max blank. Please do not try and insert things like “18+” as this will make things much harder in post-processing.
- Country – where was the study undertaken?
- Location reported, e.g. Kerry Town Ebola Treatment Centre
- Start and end dates report the dates of the study – not the outbreak.
- Timing – when in the outbreak was this study undertaken? If it is a serological study before and after an outbreak, extract the seroprevalence separately for each serology survey (see example 3).
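The preference for Weibull shape/scale noted under 'Parameter value type' above rests on the fact that the mean and median follow directly from those two values. A minimal Python sketch, assuming the common shape/scale parameterisation (a paper may use a different one, so check before comparing). The function names are hypothetical, and this is for orientation only: no calculations are performed during extraction.

```python
import math


def weibull_mean(shape, scale):
    """Mean of a Weibull distribution: scale * Gamma(1 + 1/shape)."""
    return scale * math.gamma(1.0 + 1.0 / shape)


def weibull_median(shape, scale):
    """Median of a Weibull distribution: scale * (ln 2)^(1/shape)."""
    return scale * math.log(2.0) ** (1.0 / shape)


# With shape = 1 the Weibull reduces to an exponential distribution,
# so the mean equals the scale.
mean_delay = weibull_mean(1.0, 2.0)
```

Going in the other direction (recovering shape and scale from a reported mean and interval) has no closed form and requires a numerical solver, which is why shape/scale is the preferred extraction.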
This section extracts detail regarding parameters estimated from pathogen genetic sequences. If no parameters were derived from genetic sequences, then this section can be skipped even if sequencing was performed and reported.
- Substitution rate, evolutionary rate, and mutation rate are different ways of describing the speed at which genetic changes accumulate in a population. When selecting the parameter value type, choose the value type and units based on the wording used by the authors in the article. If there are multiple terms used for the same measure (eg, substitution rate is used in the text, evolutionary rate is used in the table), choose either the most frequently used term or default to substitution rate (if the units are substitutions per site per year). These values are often in the supplemental material. So if genetic sequences or phylogenetic analyses are presented, check the supplement. We are not extracting parameters associated with selection pressure or synonymous/nonsynonymous mutations, unless based on data or methodological limitations they have only been able to calculate substitution rate from nonsynonymous mutations (in that case specify this in the 'Gene' field, similar to in vitro experiments - see next bullet point). If substitution rates are calculated for subgroups (eg, 'clades,' 'strains,' 'branches', etc), report the global estimate and indicate disaggregated data is available in the Parameter Disaggregation section.
- As always, units are very important for these parameters. The most common unit is substitutions per site per year. If units are not clear or they do not match the available options in the drop-down menu, select 'unspecified.'
- Type the portion of the pathogen’s genome used to estimate any extracted parameters (eg, reproduction number, growth rate, substitution rate). This can be a gene, a gene segment, a codon position, or a more generic description (eg, ‘whole genome’ or ‘intergenic positions’). If parameter values are independently estimated for different portions of the genome, please enter each on a separate parameter value form. If a mutation rate is estimated by in vitro experiments of recombinant variants (for example, measuring the rate of mutation in an inserted gene, such as green fluorescent protein [GFP]), enter the name of the inserted gene used, even though this gene might not be naturally occurring in the virus's genome. In addition, they may measure different types of mutations (SNPs vs indels) during in vitro experiments. If this is the case, enter the type of mutation used to calculate the rate (ex. GFP-SNP, to signify that SNP mutations in the GFP gene were used to calculate the mutation rate).
- Select the newly-available gene data check box if the study sequenced new pathogen isolates and their accession numbers have been provided for retrieval from a public database. If sequences are available, but no parameters of interest were estimated using this data, do not check this box.
We are extracting either the basic reproduction number R0 or the effective reproduction number Re (there are 2 parameter types). In the reproduction number method section on the right please specify what method was used:
- Renewal equations / branching process (includes EpiEstim & Wallinga and Teunis for example - typically gives Re)
- Growth rate (will typically use Wallinga and Lipsitch to convert an estimated growth rate into reproduction number)
- Compartmental model (fitted to data and where the parameters are then converted into a reproduction number)
- Next generation matrix (typically gives R0)
- empirical (e.g. they reconstructed the transmission tree from contact tracing data, then counted secondary cases for each case - gives Re)
- genomic methods (see above section)
- other
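For orientation on the growth rate method: the Wallinga and Lipsitch approach relates the reproduction number R to the exponential growth rate r through the generation time distribution, R = 1/M(-r), where M is the moment generating function of that distribution. The Python sketch below assumes a gamma-distributed generation time, which is an assumption chosen for illustration; individual papers may use other distributions. The function and argument names are hypothetical.

```python
def reproduction_number_from_growth_rate(growth_rate, gt_mean, gt_sd):
    """Convert a growth rate into a reproduction number.

    Assumes a gamma generation time with the given mean and standard
    deviation. For a gamma distribution with shape a and rate b,
    M(-r) = (1 + r/b)**(-a), hence R = (1 + r/b)**a.
    Valid for growth_rate > -b.
    """
    shape = (gt_mean / gt_sd) ** 2   # a = mean^2 / variance
    rate = gt_mean / gt_sd ** 2      # b = mean / variance
    return (1.0 + growth_rate / rate) ** shape


# With sd equal to the mean the gamma is an exponential distribution,
# and the formula reduces to R = 1 + r * mean.
r_value = reproduction_number_from_growth_rate(0.1, 5.0, 5.0)
```

This is meant only to help recognise the method in a paper; extract the reproduction number as reported rather than recomputing it.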
If the paper gives several values of the reproduction number:
- if there are 2 or fewer, we extract them all
- if there are more, we extract only the extremes using the "parameter range" (e.g. we don't extract every weekly Rt but the min and max across the time series, or we don't extract R for 20 regions but the min and max across regions). If they also give an overall summary (e.g. average over time or space) then this can be put in the "value" field.
These parameters all refer to time intervals in the natural history of infection of the host.
- Generation Time: The generation time is the time interval between infector exposure to infection and infectee exposure to infection. It may be used in reproduction number estimation, but given the difficulties in its observation, it may be replaced by the serial interval (see below).
- Serial Interval: The serial interval is the time interval between infector symptom onset and infectee symptom onset. It is frequently used in reproduction number estimation, as a substitute for the generation time.
- Latent Period: The latent period is the time interval between exposure to infection and infectiousness. It is sometimes used interchangeably with the incubation period (see below). It may also be referred to as the latency period or the pre-infectious period.
- Incubation Period: The incubation period is the time interval between exposure to infection and symptom onset. It often coincides with the latent period, but may be shorter (symptom onset before infectiousness, e.g. SARS) or longer (infectiousness before symptom onset, e.g. Covid-19). It may also be referred to as the intrinsic incubation period (in the context of vector-borne diseases) or a subclinical infection.
- Infectious Period: The infectious period is the time interval during which the host remains infectious. It directly follows the latent period (see above). It may also be referred to as the infective period, the contagious period, the transmission period or the communicability period.
- Time in Care: The time in care is the time interval between admission to care and discharge from care or death. Unless there is a delay in receiving care, it directly follows the time from symptom to careseeking (see above). It may vary according to health outcome and is typically highly skewed. It may also be referred to as the length of stay (LOS).
- Other Human Delays: Human delays other than the six listed above may also be reported, for example the time from symptom onset to recovery, symptom onset to death, time from seeking care to admission to care etc. In order to record these, please go to the 'Other human delay' section and fill out the start time and end time as described in the article.
The case fatality ratio is the proportion of cases who end up dying of the disease. Note this depends on the case definition used, as the denominator is people identified as "cases". The infection fatality ratio is the proportion of infections who end up dying of the disease (harder to calculate but less context dependent). Some notes:
- We extract either the CFR or the IFR; clearly state which of the two it is.
- We don't do any calculation ourselves i.e. if a paper quotes number of deaths and number of cases, but not a CFR, we don't extract that
- Please extract the numerator and denominator of the central value of the CFR only, even if disaggregated numerators and denominators are available
- We extract information about the method used to calculate CFR (or IFR), mainly whether it is
  - a "naive" method, i.e. percentage mortality which computes total deaths divided by total cases (or infections); this is wrong because there may be many cases or infections who do not have final status information, so the naive estimate is typically an underestimate of true CFR (or IFR)
  - an adjusted method, which somehow accounts for infections or cases with unknown final status (e.g. calculates deaths / (deaths + recoveries) or does something more fancy)
  - an unknown method
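To make the distinction between the naive and adjusted methods concrete, here is a small Python sketch with made-up numbers. The function names and counts are hypothetical, and the adjusted version shown is only the simple deaths / (deaths + recoveries) form mentioned above. This is for recognising the method in a paper; we still do not calculate a CFR ourselves during extraction.

```python
def naive_cfr(deaths, cases):
    """Naive CFR: total deaths divided by total cases."""
    return deaths / cases


def adjusted_cfr(deaths, recoveries):
    """Simple adjusted CFR: restricts the denominator to cases
    whose final status (death or recovery) is known."""
    return deaths / (deaths + recoveries)


# Hypothetical outbreak: 100 cases, of which 20 have died,
# 30 have recovered and 50 have no final status yet.
naive = naive_cfr(deaths=20, cases=100)            # 0.2
adjusted = adjusted_cfr(deaths=20, recoveries=30)  # 0.4
```

In this example the naive estimate is half the adjusted one because the 50 cases with unknown outcome are counted in the denominator as if they had all survived.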
These parameters refer to estimations of seroprevalence in the paper. This may also be referred to as antibody prevalence. These parameters will all be expressed in a proportion or percentage of the population.
When deciding the parameter type, if IgG or IgM is mentioned, then extract using 'Seroprevalence - IgG' or 'Seroprevalence - IgM' as the name. If not, then please extract using 'Seroprevalence - Assay name'. If both antibodies are tested for in the same test, then extract using 'Seroprevalence - Assay name' and leave a note in the Article form.
- IgG: The prevalence of IgG antibodies.
- IgM: The prevalence of IgM antibodies.
- PRNT: PRNT refers to a plaque reduction neutralization test, which is another test for neutralizing antibodies.
- HAI/HI: HAI refers to a hemagglutination inhibition assay, which is another test for neutralizing antibodies in the blood.
- IFA: IFA refers to an immunofluorescence assay, a test to estimate seroprevalence in a population.
- Unspecified: If there is no assay specified, but it is indicated that some people had antibodies, then use this option.
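The choice of seroprevalence parameter type described above can be sketched as a small decision function in Python. The function name, its arguments and the exact label strings are hypothetical; the real options are whatever appears in the Parameter Type drop-down.

```python
def seroprevalence_parameter_type(antibodies, assay_name=None):
    """Pick a parameter type label and an optional Article form note.

    antibodies: antibody classes named for the test, e.g. {"IgG"}.
    assay_name: assay reported in the paper, e.g. "PRNT", or None.
    Label strings are illustrative assumptions, not the form's own.
    """
    antibodies = set(antibodies)
    if antibodies == {"IgG"}:
        return "Seroprevalence - IgG", None
    if antibodies == {"IgM"}:
        return "Seroprevalence - IgM", None

    # Neither antibody alone: fall back to the assay name, or to
    # 'Unspecified' when no assay is reported.
    label = f"Seroprevalence - {assay_name or 'Unspecified'}"
    note = None
    if {"IgG", "IgM"} <= antibodies:
        note = "IgG and IgM tested in the same test; noted in Article form."
    return label, note


label, note = seroprevalence_parameter_type({"IgG", "IgM"}, "IFA")
```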
Please extract the numerator and denominator of the central value of the seroprevalence. If only disaggregated numerators and denominators are available, please do not extract any numerator or denominator.
- The 'Method' question under the IFR/CFR/Seroprevalence numerator and denominator section refers only to CFR/IFR, not to seroprevalence.
- Often seroprevalence studies use more than one assay. For example, an initial test using ELISA is conducted but then a neutralisation test is needed to confirm this, for example due to cross-reactivity. Please extract all seroprevalence estimates in the paper ensuring that you select the relevant assay type each time.
The denominator for the neutralisation test should be as reported (for example, but not exclusively, a subset of the samples tested by ELISA).
- However, please do not extract from papers which are estimating the sensitivity or specificity of an assay.
We are extracting general information about risk factors in the included papers. Choose 'Risk Factors' from the Parameter Type drop-down menu, then move to the Risk Factors section below. We are not extracting the values of odds ratios, risk ratios, etc and we are also not extracting information on the direction of the risk factor (i.e. increases or decreases risk) because this requires context that we are not extracting.
- We are extracting both univariate (naive) and multivariate (adjusted) risk factors, even if they're both available.
- If risk factor significance is estimated using multiple methods, tick disaggregated data available by "other" and make a note in the article form. If different methods disagree over whether a risk factor is significant or insignificant, extract the risk factor as both.
- Risk factor outcome: Here, choose the outcome for which the risk factor was evaluated. You can choose multiple options here. It is sometimes difficult to distinguish between an infection risk factor and a serology risk factor, since sometimes infection is determined based on a serological assay, e.g. PRNT, IgM. If the author specifies the outcome, please extract as written. If the author does not specify, e.g. there is just a significant difference in X between group A and group B, then extract PCR test as infection risk factors, and any assays, PRNT, IgM, IgG, HAI, IFA etc as serology risk factors.
- Risk factor name: This is the name of the population group to whom the risk factor applies, e.g. age, occupation, ...
- Risk factor occupation: If you have chosen 'Occupation' in the previous question, choose the occupation(s) that correspond(s) most closely to that described in the paper.
- Risk factor significant: Choose whether the risk factor(s) is/are significant or not.
- Risk factor adjusted: Choose whether the estimates for the risk factors are adjusted or unadjusted.
During extraction, we group the risk factors in the paper by those that are significant/insignificant and those that are adjusted/naive.
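As an illustration of this grouping, the Python sketch below sorts a paper's risk factors into the four possible combinations. The function name, record layout and example risk factors are hypothetical.

```python
from collections import defaultdict


def group_risk_factors(risk_factors):
    """Group risk factor names by (significance, adjustment).

    Each risk factor is a dict with keys 'name', 'significant' and
    'adjusted'; this layout is an assumption for illustration.
    """
    groups = defaultdict(list)
    for rf in risk_factors:
        key = (
            "significant" if rf["significant"] else "not significant",
            "adjusted" if rf["adjusted"] else "unadjusted",
        )
        groups[key].append(rf["name"])
    return dict(groups)


# Hypothetical paper reporting three risk factors.
groups = group_risk_factors([
    {"name": "age", "significant": True, "adjusted": True},
    {"name": "sex", "significant": False, "adjusted": True},
    {"name": "occupation", "significant": True, "adjusted": True},
])
```

Each resulting group would correspond to one risk factor entry in the form, with all the names in that group selected together.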
This is intended for pathogens (e.g. MERS) where there is both human to human (h2h) and animal to human (a2h) transmission, and aims to capture the relative magnitude of these two routes of infections in humans. One of two parameters can be selected from the drop down menu:
- relative contribution: human to human
- relative contribution: zoonotic to human
We expect these to be proportions or percentages. For example, if a study estimates that 60% of infections in humans are from h2h infection, select "relative contribution: human to human" and enter "60" as the parameter value and "percentage" as the unit. If the study instead reports the complement, i.e. that 40% of infections in humans are due to infection from animals, select "relative contribution: zoonotic to human" and enter "40" as the parameter value and "percentage" as the unit.
The attack rate is the proportion of an at-risk population contracting the disease during a specified time interval. It is often reported as a percentage or rate, e.g. 52 people per 10,000 people. Extract a percentage as you would for any other parameter, but use the exponent box to record the denominator of the rate, e.g. put 52 in the value box and then -4 in the exponent box for the example here.
Please extract attack rates as written in the paper. We distinguish between attack rates (at the general population level) and secondary attack rates, which will be in a specific setting (households, hospital wards, etc).
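The value and exponent boxes combine as value × 10^exponent. A minimal Python sketch of how the two fields would be read back during post-processing, with a hypothetical function name:

```python
def scaled_value(value, exponent=0):
    """Combine the value and exponent boxes: value * 10**exponent.

    The default exponent of 0 leaves the value unchanged (10**0 = 1).
    """
    return value * 10 ** exponent


# Attack rate of 52 per 10,000: value 52, exponent -4,
# i.e. roughly 0.0052 as a proportion.
attack_rate = scaled_value(52, -4)

# A plain percentage such as 73% keeps the default exponent.
percentage = scaled_value(73)
```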
Overdispersion is usually considered in population-level analysis of heterogeneous systems. The distribution of individual infectiousness around R0 is often highly skewed, especially for diseases such as SARS.
The number of secondary infections caused by each case, Z, is distributed as Z ~ Poisson(v), where v is the individual's reproductive number. If v ~ Gamma(R0,k), this implies that Z ~ NegBinomial(R0,k), where k is the overdispersion parameter. We want to extract this parameter k. This is very straightforward for branching process models/papers, where we can capture the parameter and any stated uncertainty. It is more involved for compartmental models, where a larger number of parameters + context needs to be captured; see the Data Extraction page for examples.
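To show what k controls, the Python sketch below evaluates the negative binomial offspring distribution, reading NegBinomial(R0, k) as having mean R0 and dispersion k (the usual convention in this literature, but check each paper's parameterisation). The function names are hypothetical.

```python
import math


def negbin_pmf(z, r0, k):
    """P(Z = z) for a negative binomial with mean r0 and dispersion k."""
    log_p = (
        math.lgamma(z + k) - math.lgamma(k) - math.lgamma(z + 1)
        + k * math.log(k / (k + r0))
        + z * math.log(r0 / (k + r0))
    )
    return math.exp(log_p)


def negbin_variance(r0, k):
    """Variance of the offspring distribution: r0 * (1 + r0 / k)."""
    return r0 * (1.0 + r0 / k)


# Smaller k means more overdispersion: for the same R0, more cases
# cause no secondary infections at all.
p_zero_small_k = negbin_pmf(0, r0=2.0, k=0.1)
p_zero_large_k = negbin_pmf(0, r0=2.0, k=10.0)
```

As k grows large the variance approaches R0 and the distribution approaches a Poisson; small values of k indicate that a few individuals account for most transmission.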
Some papers compute the overdispersion k for sub-periods (for example, to assess the effectiveness of interventions). In those cases, the particular k corresponds to the Re of that period.
We only extract the overdispersion parameter k if this comes from a negative binomial distribution. In the parameter extraction form, we also have the option to record the Max a number of cases super spreading (related to the case); please continue to capture this data.