Commit

clean readme + pass content to intro article
avallecam committed Oct 22, 2020
1 parent 19ba2e0 commit 018e5a5
Showing 7 changed files with 266 additions and 728 deletions.
332 changes: 19 additions & 313 deletions README.Rmd
@@ -45,311 +45,37 @@ The goal of `serosurvey` is to gather __Serological Survey Analysis__ functions

<!-- You can install the released version of serosurvey from [CRAN](https://CRAN.R-project.org) with: -->

You can install the developmental version of `serosurvey` from
[GitHub](https://github.com/avallecam/serosurvey) with:

``` r
if(!require("remotes")) install.packages("remotes")
remotes::install_github("avallecam/serosurvey")
```

## Example

Three basic examples which show you how to solve common problems:

```{r example}
library(serosurvey)
```

```{r,echo=FALSE}
# additional
library(tidyverse)
library(srvyr)
library(survey)
library(tictoc)
library(furrr)
library(purrr)
# theme
theme_set(theme_bw())
```

```{r,echo=FALSE}
data(api)
datasurvey <- apiclus2 %>%
mutate(survey_all="survey_all") %>%
# create variables
mutate(outcome_one = awards,
outcome_two = cut(pct.resp,breaks = 2),
covariate_01 = stype,
covariate_02 = both)
```

```{r,echo=FALSE}
# handling of strata with a single cluster (lonely PSU)
options(survey.lonely.psu = "certainty")
# uu_clean_data %>% count(CONGLOMERADO,VIVIENDA)
# survey sampling design ---------------------------------
design <- datasurvey %>%
filter(!is.na(outcome_one)) %>% #CRITICAL! ON OUTCOME
filter(!is.na(pw)) %>% #THERE MUST BE NO CLUSTERS WITHOUT WEIGHT
as_survey_design(
id=c(dnum, snum), #~dnum+snum, # primary secondary sampling unit
# strata = strata, #clusters need to be nested in the strata
weights = pw # expansion factors
)
```

```{r,echo=FALSE}
# denominators
covariate_set01 <- datasurvey %>%
select(covariate_01,
#sch.wide,
#comp.imp,
covariate_02) %>%
colnames()
# numerators within outcome
covariate_set02 <- datasurvey %>%
select(#stype,
#sch.wide,
#comp.imp,
covariate_02) %>%
colnames()
```
## Brief description

### 1. `survey`: Estimate single prevalences
The current workflow is divided into two steps:

- From a [`srvyr`](http://gdfe.co/srvyr/) __survey design object__, __`serosvy_proportion`__ estimates:
1. `survey`: Estimate multiple prevalences, and
2. `serology`: Estimate prevalence Under misclassification for a device
with Known or Unknown test performance

+ weighted prevalence (`prop`),
+ total population (`total`),
+ raw proportion (`raw_prop`),
+ coefficient of variability (`cv`),
+ design effect (`deff`)
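
To make these quantities concrete, here is a minimal sketch that computes the same kind of estimates directly with the `survey` package on the `apiclus2` example data. It is not the internal implementation of `serosvy_proportion`; the object names `dclus2` and `est` are illustrative assumptions.

```r
library(survey)

# example data shipped with the survey package
data(api)

# two-stage cluster design: districts (dnum), then schools (snum)
dclus2 <- svydesign(id = ~dnum + snum, weights = ~pw, data = apiclus2)

# weighted proportion of each level of `awards`, with design effect
est <- svymean(~awards, dclus2, deff = TRUE)

est            # weighted proportions and standard errors
cv(est)        # coefficient of variation (standard error / estimate)
deff(est)      # design effect relative to simple random sampling
confint(est)   # confidence intervals

# raw (unweighted) proportion, for comparison
mean(apiclus2$awards == "Yes")
```

The weighted and raw proportions differ whenever the sampling weights are unequal, which is why both are worth reporting.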

```{r}
serosvy_proportion(design = design,
denominator = covariate_01,
numerator = outcome_one)
```

```{r,eval=FALSE}
example("serosvy_proportion")
```

### 2. `survey`: Estimate multiple prevalences
## More

- In the
[Introductory article](https://avallecam.github.io/serosurvey/articles/intro.html)
we provide detailed definitions and references of the methods available.
- In
the [Article tab](https://avallecam.github.io/serosurvey/articles/howto-reprex.html)
we provide a workflow to __estimate multiple prevalences__:

+ using different set of covariates and outcomes as numerators or denominators,
+ in one single pipe operation

```{r}
# create matrix
#
# set 01 of denominator-numerator
#
expand_grid(
design=list(design),
denominator=c("covariate_01","covariate_02"), # covariates
numerator=c("outcome_one","outcome_two") # outcomes
) %>%
#
# set 02 of denominator-numerator (e.g. within main outcome)
#
union_all(
expand_grid(
design=list(design),
denominator=c("outcome_one","outcome_two"), # outcomes
numerator=c("covariate_02") # covariates
)
) %>%
#
# create symbols (to be read as arguments)
#
mutate(
denominator=map(denominator,dplyr::sym),
numerator=map(numerator,dplyr::sym)
) %>%
#
# estimate prevalence
#
mutate(output=pmap(.l = select(.,design,denominator,numerator),
.f = serosvy_proportion)) %>%
#
# show the outcome
#
select(-design,-denominator,-numerator) %>%
unnest(cols = c(output)) %>%
print(n=Inf)
```

#### `learnr` tutorial

- Learn to build this workflow in a tutorial in Spanish:

```r
# update package
if(!require("remotes")) install.packages("remotes")
remotes::install_github("avallecam/serosurvey")
# install learnr and run tutorial
if(!require("learnr")) install.packages("learnr")
learnr::run_tutorial(name = "taller",package = "serosurvey")
```
the
[Workflow article](https://avallecam.github.io/serosurvey/articles/howto-reprex.html)
we provide a reproducible example with this package.


### 3. `serology`: Estimate prevalence Under misclassification

- We gather __one frequentist approach__ [@ROGAN1978],
available in different GitHub repos, that deals with
misclassification due to an imperfect diagnostic
test [@Azman2020; @Takahashi2020].
Check the [Reference tab](https://avallecam.github.io/serosurvey/reference/index.html).

- We provide __tidy outputs for Bayesian approaches__ developed
in @Larremore2020unk [here](https://github.com/LarremoreLab/bayesian-joint-prev-se-sp/blob/master/singleSERO_uncertainTEST.R)
and @Larremore2020kno [here](https://github.com/LarremoreLab/covid_serological_sampling/blob/master/codebase/seroprevalence.R):

- You can use them with [`purrr`](https://purrr.tidyverse.org/) and [`furrr`](https://davisvaughan.github.io/furrr/) to efficiently iterate
and parallelize this step for __multiple prevalences__.
Check the workflow
in [Article tab](https://avallecam.github.io/serosurvey/articles/howto-reprex.html).
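
As a point of reference for the frequentist approach cited above [@ROGAN1978], the standard correction for an imperfect test can be sketched in base R. The helper name `rogan_gladen` is a hypothetical name chosen for this illustration and is not claimed to be a function of this package; the input values are the ones used in the Bayesian examples of this README.

```r
# Rogan-Gladen adjusted prevalence (illustrative helper, not a package function)
rogan_gladen <- function(positive, total, sensitivity, specificity) {
  stopifnot(total > 0, positive >= 0, positive <= total)
  # Youden index; the correction is only defined when it is positive
  youden <- sensitivity + specificity - 1
  stopifnot(youden > 0)
  apparent <- positive / total
  adjusted <- (apparent + specificity - 1) / youden
  # truncate to the valid range of a proportion
  min(max(adjusted, 0), 1)
}

rogan_gladen(positive = 321,
             total = 321 + 1234,
             sensitivity = 0.93,
             specificity = 0.975)
```

With these inputs the apparent prevalence is about 0.206 and the adjusted prevalence is about 0.200. This point estimate carries no uncertainty on its own, which is one motivation for the Bayesian methods.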


#### __Known test performance - Bayesian method__

```{r,eval=FALSE}
serosvy_known_sample_posterior(
#in population
positive_number_test = 321,
total_number_test = 321+1234,
# known performance
sensitivity = 0.93,
specificity = 0.975
)
```

```{r,echo=FALSE}
tidy_result <- serosvy_known_sample_posterior(
#in population
positive_number_test = 321,
total_number_test = 321+1234,
# known performance
sensitivity = 0.93,
specificity = 0.975
)
tidy_result_out <-
tidy_result %>%
select(summary) %>%
unnest(cols = c(summary))
tidy_result %>%
select(posterior) %>%
unnest(cols = c(posterior)) %>%
ggplot(aes(x = r1)) +
geom_histogram(aes(y=..density..),binwidth = 0.0005) +
geom_density() +
geom_vline(aes(xintercept=tidy_result_out %>%
pull(numeric.mean)),
color="red",lwd=1) +
geom_vline(aes(xintercept=tidy_result_out %>%
pull(numeric.p05)),
color="red") +
geom_vline(aes(xintercept=tidy_result_out %>%
pull(numeric.p95)),
color="red") +
scale_x_continuous(breaks = scales::pretty_breaks())
```

```{r,eval=FALSE}
example("serosvy_known_sample_posterior")
```

#### __Unknown test performance - Bayesian method__

- The test performance is called _"unknown"_ or _"uncertain"_ when test
sensitivity and specificity are not known with
certainty [@Kritsotakis2020; @Diggle2011; @Gelman2020] and
lab validation data is available with a limited set of samples,
typically during a novel pathogen outbreak.
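
To illustrate why a limited validation set leaves test performance uncertain, here is a minimal base R sketch that turns validation counts into point estimates and exact binomial intervals. The counts are the ones used in the example call of this README; the variable names are illustrative, and this is not how the Bayesian function treats these inputs internally.

```r
# lab validation counts (same values as in the README example)
true_positive  <- 670
false_negative <- 74
true_negative  <- 640
false_positive <- 202

# point estimates of test performance
sensitivity <- true_positive / (true_positive + false_negative)
specificity <- true_negative / (true_negative + false_positive)

# exact (Clopper-Pearson) 95% intervals from base R
sensitivity_ci <- binom.test(true_positive,
                             true_positive + false_negative)$conf.int
specificity_ci <- binom.test(true_negative,
                             true_negative + false_positive)$conf.int

sensitivity
sensitivity_ci
specificity
specificity_ci
```

The width of these intervals is the uncertainty that the "unknown test performance" method propagates into the prevalence estimate, rather than fixing sensitivity and specificity at single values.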

```{r,eval=FALSE,echo=FALSE}
# result_unk <- sample_posterior_r_mcmc_testun(
# samps = 10000,
# #in population
# pos = 692, #positive
# n = 3212, #total
# # in lab (local validation study)
# tp = 670,tn = 640,fp = 202,fn = 74)
```

```{r,eval=FALSE}
serosvy_unknown_sample_posterior_ii(
#in population
positive_number_test = 321,
total_number_test = 321+1234,
# in lab (local validation study)
true_positive = 670,
true_negative = 640,
false_positive = 202,
false_negative = 74)
```

```{r,echo=FALSE}
result_unk <- serosvy_unknown_sample_posterior_ii(
#in population
positive_number_test = 321,
total_number_test = 321+1234,
# in lab (local validation study)
true_positive = 670,
true_negative = 640,
false_positive = 202,
false_negative = 74)
result_unk %>%
select(posterior) %>%
unnest(posterior) %>%
rownames_to_column() %>%
pivot_longer(cols = -rowname,
names_to = "estimates",
values_to = "values") %>%
ggplot(aes(x = values)) +
geom_histogram(aes(y=..density..),binwidth = 0.0005) +
geom_density() +
facet_grid(~estimates,scales = "free_x")
```

```{r,eval=FALSE}
example("serosvy_unknown_sample_posterior")
```

## Contributing

Feel free to file an issue or contribute your functions or workflows in a pull request.

Here is a list of publications with interesting approaches using R:

- @Silveira2020 and @Hallal2020 analysed a serological survey accounting for sampling design and test validity using parametric bootstrapping, following @Lewis2012.

- @Flor2020 implemented many frequentist and Bayesian methods for tests with known sensitivity and specificity. Code is available [here](https://github.com/BfRstats/bayespem-validation-code).

- @Gelman2020 also applied Bayesian inference with hierarchical regression
and post-stratification to account for test uncertainty
with unknown specificity and sensitivity.
Here is a [case study](https://github.com/bob-carpenter/diagnostic-testing/blob/master/src/case-study/seroprevalence-meta-analysis.Rmd).

## How to cite this R package

```{r}
citation("serosurvey")
```

## Contact

Andree Valle Campos |
@@ -364,28 +90,8 @@ Many thanks to the Centro Nacional de Epidemiología, Prevención y Control
de Enfermedades [(CDC Perú)](https://www.dge.gob.pe/portalnuevo/)
for the opportunity to work on this project.

## References

Azman, Andrew S, Stephen Lauer, M. Taufiqur Rahman Bhuiyan, Francisco J Luquero, Daniel T Leung, Sonia Hegde, Jason B Harris, et al. 2020. “Vibrio Cholerae O1 Transmission in Bangladesh: Insights from a Nationally-Representative Serosurvey,” March. https://doi.org/10.1101/2020.03.13.20035352.

Diggle, Peter J. 2011. “Estimating Prevalence Using an Imperfect Test.” Epidemiology Research International 2011: 1–5. https://doi.org/10.1155/2011/608719.

Flor, Matthias, Michael Weiß, Thomas Selhorst, Christine Müller-Graf, and Matthias Greiner. 2020. “Comparison of Bayesian and Frequentist Methods for Prevalence Estimation Under Misclassification.” BMC Public Health 20 (1). https://doi.org/10.1186/s12889-020-09177-4.

Gelman, Andrew, and Bob Carpenter. 2020. “Bayesian Analysis of Tests with Unknown Specificity and Sensitivity.” Journal of the Royal Statistical Society: Series C (Applied Statistics), August. https://doi.org/10.1111/rssc.12435.

Hallal, Pedro C, Fernando P Hartwig, Bernardo L Horta, Mariângela F Silveira, Claudio J Struchiner, Luís P Vidaletti, Nelson A Neumann, et al. 2020. “SARS-CoV-2 Antibody Prevalence in Brazil: Results from Two Successive Nationwide Serological Household Surveys.” The Lancet Global Health, September. https://doi.org/10.1016/s2214-109x(20)30387-9.

Kritsotakis, Evangelos I. 2020. “On the Importance of Population-Based Serological Surveys of SARS-CoV-2 Without Overlooking Their Inherent Uncertainties.” Public Health in Practice 1 (November): 100013. https://doi.org/10.1016/j.puhip.2020.100013.

Larremore, Daniel B., Bailey K Fosdick, Kate M Bubar, Sam Zhang, Stephen M Kissler, C. Jessica E. Metcalf, Caroline Buckee, and Yonatan Grad. 2020. “Estimating SARS-CoV-2 Seroprevalence and Epidemiological Parameters with Uncertainty from Serological Surveys.” medRxiv, April. https://doi.org/10.1101/2020.04.15.20067066.

Larremore, Daniel B., Bailey K. Fosdick, Sam Zhang, and Yonatan Grad. 2020. “Jointly Modeling Prevalence, Sensitivity and Specificity for Optimal Sample Allocation.” bioRxiv, May. https://doi.org/10.1101/2020.05.23.112649.

Lewis, Fraser I, and Paul R Torgerson. 2012. “A Tutorial in Estimating the Prevalence of Disease in Humans and Animals in the Absence of a Gold Standard Diagnostic.” Emerging Themes in Epidemiology 9 (1). https://doi.org/10.1186/1742-7622-9-9.

Rogan, Walter J., and Beth Gladen. 1978. “Estimating Prevalence from the Results of A Screening Test.” American Journal of Epidemiology 107 (1): 71–76. https://doi.org/10.1093/oxfordjournals.aje.a112510.

Silveira, Mariângela F., Aluísio J. D. Barros, Bernardo L. Horta, Lúcia C. Pellanda, Gabriel D. Victora, Odir A. Dellagostin, Claudio J. Struchiner, et al. 2020. “Population-Based Surveys of Antibodies Against SARS-CoV-2 in Southern Brazil.” Nature Medicine 26 (8): 1196–9. https://doi.org/10.1038/s41591-020-0992-3.
## How to cite this R package

Takahashi, Saki, Bryan Greenhouse, and Isabel Rodríguez-Barraquer. 2020. “Are SARS-CoV-2 seroprevalence estimates biased?” The Journal of Infectious Diseases, August. https://doi.org/10.1093/infdis/jiaa523.
```{r}
citation("serosurvey")
```