diff --git a/README.Rmd b/README.Rmd index 19c35af..e348916 100644 --- a/README.Rmd +++ b/README.Rmd @@ -45,311 +45,37 @@ The goal of `serosurvey` is to gather __Serological Survey Analysis__ functions +You can install the developmental version of `serosurvey` from +[GitHub](https://github.com/avallecam/serosurvey) with: + ``` r if(!require("remotes")) install.packages("remotes") remotes::install_github("avallecam/serosurvey") ``` -## Example - -Three basic examples which shows you how to solve common problems: - -```{r example} -library(serosurvey) -``` - -```{r,echo=FALSE} -# additional -library(tidyverse) -library(srvyr) -library(survey) -library(tictoc) -library(furrr) -library(purrr) -# theme -theme_set(theme_bw()) -``` - -```{r,echo=FALSE} -data(api) - -datasurvey <- apiclus2 %>% - mutate(survey_all="survey_all") %>% - # create variables - mutate(outcome_one = awards, - outcome_two = cut(pct.resp,breaks = 2), - covariate_01 = stype, - covariate_02 = both) -``` - -```{r,echo=FALSE} -# tratamiento de stratos con un solo conglomerado -options(survey.lonely.psu = "certainty") - -# uu_clean_data %>% count(CONGLOMERADO,VIVIENDA) - -# diseño muestral de la encuesta --------------------------------- - -design <- datasurvey %>% - - filter(!is.na(outcome_one)) %>% #CRITICAL! ON OUTCOME - filter(!is.na(pw)) %>% #NO DEBEN DE HABER CONGLOMERADOS SIN WEIGHT - - as_survey_design( - id=c(dnum, snum), #~dnum+snum, # primary secondary sampling unit - # strata = strata, #clusters need to be nested in the strata - weights = pw # factores de expancion - ) -``` - -```{r,echo=FALSE} -# denominadores -covariate_set01 <- datasurvey %>% - select(covariate_01, - #sch.wide, - #comp.imp, - covariate_02) %>% - colnames() - -# numerators within outcome -covariate_set02 <- datasurvey %>% - select(#stype, - #sch.wide, - #comp.imp, - covariate_02) %>% - colnames() -``` +## Brief description -### 1. 
`survey`: Estimate single prevalences +The current workflow is divided in two steps: -- From a [`srvyr`](http://gdfe.co/srvyr/) __survey design object__, __`serosvy_proportion`__ estimates: +1. `survey`: Estimate multiple prevalences, and +2. `serology`: Estimate prevalence Under misclassification for a device +with Known or Unknown test performance - + weighted prevalence (`prop`), - + total population (`total`), - + raw proportion (`raw_prop`), - + coefficient of variability (`cv`), - + design effect (`deff`) - -```{r} -serosvy_proportion(design = design, - denominator = covariate_01, - numerator = outcome_one) -``` - -```{r,eval=FALSE} -example("serosvy_proportion") -``` - -### 2. `survey`: Estimate multiple prevalences +## More +- In the +[Introductory article](https://avallecam.github.io/serosurvey/articles/intro.html) +we provide detailed definitions and references of the methods available. - In -the [Article tab](https://avallecam.github.io/serosurvey/articles/howto-reprex.html) -we provide a workflow to __estimate multiple prevalences__: - - + using different set of covariates and outcomes as numerators or denominators, - + in one single pipe operation - -```{r} -# crear matriz - # - # set 01 of denominator-numerator - # -expand_grid( - design=list(design), - denominator=c("covariate_01","covariate_02"), # covariates - numerator=c("outcome_one","outcome_two") # outcomes - ) %>% - # - # set 02 of denominator-numerator (e.g. 
within main outcome) - # - union_all( - expand_grid( - design=list(design), - denominator=c("outcome_one","outcome_two"), # outcomes - numerator=c("covariate_02") # covariates - ) - ) %>% - # - # create symbols (to be readed as arguments) - # - mutate( - denominator=map(denominator,dplyr::sym), - numerator=map(numerator,dplyr::sym) - ) %>% - # - # estimate prevalence - # - mutate(output=pmap(.l = select(.,design,denominator,numerator), - .f = serosvy_proportion)) %>% - # - # show the outcome - # - select(-design,-denominator,-numerator) %>% - unnest(cols = c(output)) %>% - print(n=Inf) -``` - -#### `learnr` tutorial - -- Learn to build this with in a tutorial in Spanish: - -```r -# update package -if(!require("remotes")) install.packages("remotes") -remotes::install_github("avallecam/serosurvey") -# install learner and run tutorial -if(!require("learnr")) install.packages("learnr") -learnr::run_tutorial(name = "taller",package = "serosurvey") -``` +the +[Workflow article](https://avallecam.github.io/serosurvey/articles/howto-reprex.html) +we provide a reproducible example with this package. -### 3. `serology`: Estimate prevalence Under misclassification - -- We gather __one frequentist approach__ [@ROGAN1978], -available in different Github repos, that deal with -misclassification due to an imperfect diagnostic -test [@Azman2020; @Takahashi2020]. -Check the [Reference tab](https://avallecam.github.io/serosurvey/reference/index.html). - -- We provide __tidy outputs for bayesian approaches__ developed -in @Larremore2020unk [here](https://github.com/LarremoreLab/bayesian-joint-prev-se-sp/blob/master/singleSERO_uncertainTEST.R) -and @Larremore2020kno [here](https://github.com/LarremoreLab/covid_serological_sampling/blob/master/codebase/seroprevalence.R): - -- You can use them with [`purrr`](https://purrr.tidyverse.org/) and [`furrr`](https://davisvaughan.github.io/furrr/) to efficiently iterate -and parallelize this step for __multiple prevalences__. 
-Check the workflow -in [Article tab](https://avallecam.github.io/serosurvey/articles/howto-reprex.html). - - -#### __Known test performance - Bayesian method__ - -```{r,eval=FALSE} -serosvy_known_sample_posterior( - #in population - positive_number_test = 321, - total_number_test = 321+1234, - # known performance - sensitivity = 0.93, - specificity = 0.975 -) -``` - -```{r,echo=FALSE} -tidy_result <- serosvy_known_sample_posterior( - #in population - positive_number_test = 321, - total_number_test = 321+1234, - # known performance - sensitivity = 0.93, - specificity = 0.975 -) - -tidy_result_out <- - tidy_result %>% - select(summary) %>% - unnest(cols = c(summary)) - -tidy_result %>% - select(posterior) %>% - unnest(cols = c(posterior)) %>% - ggplot(aes(x = r1)) + - geom_histogram(aes(y=..density..),binwidth = 0.0005) + - geom_density() + - geom_vline(aes(xintercept=tidy_result_out %>% - pull(numeric.mean)), - color="red",lwd=1) + - geom_vline(aes(xintercept=tidy_result_out %>% - pull(numeric.p05)), - color="red") + - geom_vline(aes(xintercept=tidy_result_out %>% - pull(numeric.p95)), - color="red") + - scale_x_continuous(breaks = scales::pretty_breaks()) -``` - -```{r,eval=FALSE} -example("serosvy_known_sample_posterior") -``` - -#### __Unknown test performance - Bayesian method__ - -- The test performance is called _"unknown"_ or _"uncertain"_ when test -sensitivity and specificity are not known with -certainty [@Kritsotakis2020; @Diggle2011; @Gelman2020] and -lab validation data is available with a limited set of samples, -tipically during a novel pathogen outbreak. 
- -```{r,eval=FALSE,echo=FALSE} -# result_unk <- sample_posterior_r_mcmc_testun( -# samps = 10000, -# #in population -# pos = 692, #positive -# n = 3212, #total -# # in lab (local validation study) -# tp = 670,tn = 640,fp = 202,fn = 74) -``` - -```{r,eval=FALSE} -serosvy_unknown_sample_posterior_ii( - #in population - positive_number_test = 321, - total_number_test = 321+1234, - # in lab (local validation study) - true_positive = 670, - true_negative = 640, - false_positive = 202, - false_negative = 74) -``` - -```{r,echo=FALSE} -result_unk <- serosvy_unknown_sample_posterior_ii( - #in population - positive_number_test = 321, - total_number_test = 321+1234, - # in lab (local validation study) - true_positive = 670, - true_negative = 640, - false_positive = 202, - false_negative = 74) - -result_unk %>% - select(posterior) %>% - unnest(posterior) %>% - rownames_to_column() %>% - pivot_longer(cols = -rowname, - names_to = "estimates", - values_to = "values") %>% - ggplot(aes(x = values)) + - geom_histogram(aes(y=..density..),binwidth = 0.0005) + - geom_density() + - facet_grid(~estimates,scales = "free_x") -``` - -```{r,eval=FALSE} -example("serosvy_unknown_sample_posterior") -``` - ## Contributing Feel free to fill an issue or contribute with your functions or workflows in a pull request. -Here are a list of publications with interesting approaches using R: - -- @Silveira2020 and @Hallal2020 analysed a serological survey accounting for sampling design and test validity using parametric bootstraping, following @Lewis2012. - -- @Flor2020 implemented a lot of frequentist and bayesian methods for test with known sensitivity and specificity. Code is available [here](https://github.com/BfRstats/bayespem-validation-code). - -- @Gelman2020 also applied Bayesian inference with hierarchical regression -and post-stratification to account for test uncertainty -with unknown specificity and sensitivity. 
-Here a [case-study](https://github.com/bob-carpenter/diagnostic-testing/blob/master/src/case-study/seroprevalence-meta-analysis.Rmd). - -## How to cite this R package - -```{r} -citation("serosurvey") -``` - ## Contact Andree Valle Campos | @@ -364,28 +90,8 @@ Many thanks to the Centro Nacional de Epidemiología, Prevención y Control de Enfermedades [(CDC Perú)](https://www.dge.gob.pe/portalnuevo/) for the opportunity to work on this project. -## References - -Azman, Andrew S, Stephen Lauer, M. Taufiqur Rahman Bhuiyan, Francisco J Luquero, Daniel T Leung, Sonia Hegde, Jason B Harris, et al. 2020. “Vibrio Cholerae O1 Transmission in Bangladesh: Insights from a Nationally- Representative Serosurvey,” March. https://doi.org/10.1101/2020.03.13.20035352. - -Diggle, Peter J. 2011. “Estimating Prevalence Using an Imperfect Test.” Epidemiology Research International 2011: 1–5. https://doi.org/10.1155/2011/608719. - -Flor, Matthias, Michael Weiß, Thomas Selhorst, Christine Müller-Graf, and Matthias Greiner. 2020. “Comparison of Bayesian and Frequentist Methods for Prevalence Estimation Under Misclassification.” BMC Public Health 20 (1). https://doi.org/10.1186/s12889-020-09177-4. - -Gelman, Andrew, and Bob Carpenter. 2020. “Bayesian Analysis of Tests with Unknown Specificity and Sensitivity.” Journal of the Royal Statistical Society: Series C (Applied Statistics), August. https://doi.org/10.1111/rssc.12435. - -Hallal, Pedro C, Fernando P Hartwig, Bernardo L Horta, Mariângela F Silveira, Claudio J Struchiner, Luı́s P Vidaletti, Nelson A Neumann, et al. 2020. “SARS-CoV-2 Antibody Prevalence in Brazil: Results from Two Successive Nationwide Serological Household Surveys.” The Lancet Global Health, September. https://doi.org/10.1016/s2214-109x(20)30387-9. - -Kritsotakis, Evangelos I. 2020. “On the Importance of Population-Based Serological Surveys of SARS-CoV-2 Without Overlooking Their Inherent Uncertainties.” Public Health in Practice 1 (November): 100013. 
https://doi.org/10.1016/j.puhip.2020.100013. - -Larremore, Daniel B., Bailey K Fosdick, Kate M Bubar, Sam Zhang, Stephen M Kissler, C. Jessica E. Metcalf, Caroline Buckee, and Yonatan Grad.2020.“Estimating SARS-CoV-2 Seroprevalence and Epidemiological Parameters with Uncertainty from Serological Surveys.” medRxiv, April. https://doi.org/10.1101/2020.04.15.20067066. - -Larremore, Daniel B., Bailey K. Fosdick, Sam Zhang, and Yonatan Grad.2020.“Jointly Modeling Prevalence, Sensitivity and Specificity for Optimal Sample Allocation.” bioRxiv, May. https://doi.org/10.1101/2020.05.23.112649. - -Lewis, Fraser I, and Paul R Torgerson. 2012. “A Tutorial in Estimating the Prevalence of Disease in Humans and Animals in the Absence of a Gold Standard Diagnostic.” Emerging Themes in Epidemiology 9 (1). https://doi.org/10.1186/1742-7622-9-9. - -Rogan, Walter J., and Beth Gladen. 1978. “Estimating Prevalence from the Results of A Screening Test.” American Journal of Epidemiology 107 (1): 71–76. https://doi.org/10.1093/oxfordjournals.aje.a112510. - -Silveira, Mariângela F., Aluı́sio J. D. Barros, Bernardo L. Horta, Lúcia C. Pellanda, Gabriel D. Victora, Odir A. Dellagostin, Claudio J. Struchiner, et al. 2020. “Population-Based Surveys of Antibodies Against SARS-CoV-2 in Southern Brazil.” Nature Medicine 26 (8): 1196–9. https://doi.org/10.1038/s41591-020-0992-3. +## How to cite this R package -Takahashi, Saki, Bryan Greenhouse, and Isabel Rodríguez-Barraquer. 2020. “Are SARS-CoV-2 seroprevalence estimates biased?” The Journal of Infectious Diseases, August. https://doi.org/10.1093/infdis/jiaa523. +```{r} +citation("serosurvey") +``` diff --git a/README.md b/README.md index 90c0732..f848a23 100644 --- a/README.md +++ b/README.md @@ -25,248 +25,50 @@ Misclassification**. 
-``` r -if(!require("remotes")) install.packages("remotes") -remotes::install_github("avallecam/serosurvey") -``` - -## Example - -Three basic examples which shows you how to solve common problems: - -``` r -library(serosurvey) -``` - -### 1\. `survey`: Estimate single prevalences - - - From a [`srvyr`](http://gdfe.co/srvyr/) **survey design object**, - **`serosvy_proportion`** estimates: - - - weighted prevalence (`prop`), - - total population (`total`), - - raw proportion (`raw_prop`), - - coefficient of variability (`cv`), - - design effect (`deff`) - - - -``` r -serosvy_proportion(design = design, - denominator = covariate_01, - numerator = outcome_one) -#> # A tibble: 6 x 23 -#> denominator denominator_lev~ numerator numerator_level prop prop_low -#> -#> 1 covariate_~ E outcome_~ No 0.211 0.130 -#> 2 covariate_~ E outcome_~ Yes 0.789 0.675 -#> 3 covariate_~ H outcome_~ No 0.852 0.564 -#> 4 covariate_~ H outcome_~ Yes 0.148 0.0377 -#> 5 covariate_~ M outcome_~ No 0.552 0.224 -#> 6 covariate_~ M outcome_~ Yes 0.448 0.160 -#> # ... with 17 more variables: prop_upp , prop_cv , -#> # prop_se , total , total_low , total_upp , -#> # total_cv , total_se , total_deff , total_den , -#> # total_den_low , total_den_upp , raw_num , -#> # raw_den , raw_prop , raw_prop_low , raw_prop_upp -``` +You can install the developmental version of `serosurvey` from +[GitHub](https://github.com/avallecam/serosurvey) with: ``` r -example("serosvy_proportion") -``` - -### 2\. 
`survey`: Estimate multiple prevalences - - - In the [Article - tab](https://avallecam.github.io/serosurvey/articles/howto-reprex.html) - we provide a workflow to **estimate multiple prevalences**: - - - using different set of covariates and outcomes as numerators or - denominators, - - in one single pipe operation - - - -``` r -# crear matriz - # - # set 01 of denominator-numerator - # -expand_grid( - design=list(design), - denominator=c("covariate_01","covariate_02"), # covariates - numerator=c("outcome_one","outcome_two") # outcomes - ) %>% - # - # set 02 of denominator-numerator (e.g. within main outcome) - # - union_all( - expand_grid( - design=list(design), - denominator=c("outcome_one","outcome_two"), # outcomes - numerator=c("covariate_02") # covariates - ) - ) %>% - # - # create symbols (to be readed as arguments) - # - mutate( - denominator=map(denominator,dplyr::sym), - numerator=map(numerator,dplyr::sym) - ) %>% - # - # estimate prevalence - # - mutate(output=pmap(.l = select(.,design,denominator,numerator), - .f = serosvy_proportion)) %>% - # - # show the outcome - # - select(-design,-denominator,-numerator) %>% - unnest(cols = c(output)) %>% - print(n=Inf) -#> # A tibble: 25 x 23 -#> denominator denominator_lev~ numerator numerator_level prop prop_low -#> -#> 1 covariate_~ E outcome_~ No 0.211 0.130 -#> 2 covariate_~ E outcome_~ Yes 0.789 0.675 -#> 3 covariate_~ H outcome_~ No 0.852 0.564 -#> 4 covariate_~ H outcome_~ Yes 0.148 0.0377 -#> 5 covariate_~ M outcome_~ No 0.552 0.224 -#> 6 covariate_~ M outcome_~ Yes 0.448 0.160 -#> 7 covariate_~ E outcome_~ (-0.1,50] 0.182 0.0499 -#> 8 covariate_~ E outcome_~ (50,100] 0.818 0.515 -#> 9 covariate_~ H outcome_~ (-0.1,50] 0.0769 0.00876 -#> 10 covariate_~ H outcome_~ (50,100] 0.923 0.560 -#> 11 covariate_~ M outcome_~ (50,100] 1.00 1.00 -#> 12 covariate_~ No outcome_~ No 1.00 1.00 -#> 13 covariate_~ Yes outcome_~ No 0.0334 0.00884 -#> 14 covariate_~ Yes outcome_~ Yes 0.967 0.882 -#> 15 covariate_~ No 
outcome_~ (-0.1,50] 0.218 0.0670 -#> 16 covariate_~ No outcome_~ (50,100] 0.782 0.479 -#> 17 covariate_~ Yes outcome_~ (-0.1,50] 0.0914 0.0214 -#> 18 covariate_~ Yes outcome_~ (50,100] 0.909 0.684 -#> 19 outcome_one No covariat~ No 0.939 0.778 -#> 20 outcome_one No covariat~ Yes 0.0615 0.0148 -#> 21 outcome_one Yes covariat~ Yes 1.00 1.00 -#> 22 outcome_two (-0.1,50] covariat~ No 0.549 0.294 -#> 23 outcome_two (-0.1,50] covariat~ Yes 0.451 0.219 -#> 24 outcome_two (50,100] covariat~ No 0.305 0.188 -#> 25 outcome_two (50,100] covariat~ Yes 0.695 0.546 -#> # ... with 17 more variables: prop_upp , prop_cv , -#> # prop_se , total , total_low , total_upp , -#> # total_cv , total_se , total_deff , total_den , -#> # total_den_low , total_den_upp , raw_num , -#> # raw_den , raw_prop , raw_prop_low , raw_prop_upp -``` - -#### `learnr` tutorial - - - Learn to build this with in a tutorial in Spanish: - - - -``` r -# update package if(!require("remotes")) install.packages("remotes") remotes::install_github("avallecam/serosurvey") -# install learner and run tutorial -if(!require("learnr")) install.packages("learnr") -learnr::run_tutorial(name = "taller",package = "serosurvey") -``` - -### 3\. `serology`: Estimate prevalence Under misclassification - - - We gather **one frequentist approach** (Rogan and Gladen - [1978](#ref-ROGAN1978)), available in different Github repos, that - deal with misclassification due to an imperfect diagnostic test - (Azman et al. [2020](#ref-Azman2020); Takahashi, Greenhouse, and - Rodríguez-Barraquer [2020](#ref-Takahashi2020)). Check the - [Reference - tab](https://avallecam.github.io/serosurvey/reference/index.html). - - - We provide **tidy outputs for bayesian approaches** developed in - Daniel B. Larremore et al. ([2020](#ref-Larremore2020unk)) - [here](https://github.com/LarremoreLab/bayesian-joint-prev-se-sp/blob/master/singleSERO_uncertainTEST.R) - and Daniel B Larremore et al. 
([2020](#ref-Larremore2020kno)) - [here](https://github.com/LarremoreLab/covid_serological_sampling/blob/master/codebase/seroprevalence.R): - - - You can use them with [`purrr`](https://purrr.tidyverse.org/) and - [`furrr`](https://davisvaughan.github.io/furrr/) to efficiently - iterate and parallelize this step for **multiple prevalences**. - Check the workflow in [Article - tab](https://avallecam.github.io/serosurvey/articles/howto-reprex.html). - -#### **Known test performance - Bayesian method** - -``` r -serosvy_known_sample_posterior( - #in population - positive_number_test = 321, - total_number_test = 321+1234, - # known performance - sensitivity = 0.93, - specificity = 0.975 -) ``` - +## Brief description -``` r -example("serosvy_known_sample_posterior") -``` +The current workflow is divided in two steps: -#### **Unknown test performance - Bayesian method** +1. `survey`: Estimate multiple prevalences, and +2. `serology`: Estimate prevalence Under misclassification for a device + with Known or Unknown test performance - - The test performance is called *“unknown”* or *“uncertain”* when - test sensitivity and specificity are not known with certainty - (Kritsotakis [2020](#ref-Kritsotakis2020); Diggle - [2011](#ref-Diggle2011); Gelman and Carpenter - [2020](#ref-Gelman2020)) and lab validation data is available with a - limited set of samples, tipically during a novel pathogen outbreak. +## More - - -``` r -serosvy_unknown_sample_posterior_ii( - #in population - positive_number_test = 321, - total_number_test = 321+1234, - # in lab (local validation study) - true_positive = 670, - true_negative = 640, - false_positive = 202, - false_negative = 74) -``` - - - -``` r -example("serosvy_unknown_sample_posterior") -``` + - In the [Introductory + article](https://avallecam.github.io/serosurvey/articles/intro.html) + we provide detailed definitions and references of the methods + available. 
+ - In the [Workflow + article](https://avallecam.github.io/serosurvey/articles/howto-reprex.html) + we provide a reproducible example with this package. ## Contributing Feel free to fill an issue or contribute with your functions or workflows in a pull request. -Here are a list of publications with interesting approaches using R: +## Contact + +Andree Valle Campos | [`@avallecam`](https://twitter.com/avallecam) | + - - Silveira et al. ([2020](#ref-Silveira2020)) and Hallal et al. - ([2020](#ref-Hallal2020)) analysed a serological survey accounting - for sampling design and test validity using parametric bootstraping, - following Lewis and Torgerson ([2012](#ref-Lewis2012)). +Project Link: - - Flor et al. ([2020](#ref-Flor2020)) implemented a lot of frequentist - and bayesian methods for test with known sensitivity and - specificity. Code is available - [here](https://github.com/BfRstats/bayespem-validation-code). +## Acknowledgements - - Gelman and Carpenter ([2020](#ref-Gelman2020)) also applied Bayesian - inference with hierarchical regression and post-stratification to - account for test uncertainty with unknown specificity and - sensitivity. Here a - [case-study](https://github.com/bob-carpenter/diagnostic-testing/blob/master/src/case-study/seroprevalence-meta-analysis.Rmd). +Many thanks to the Centro Nacional de Epidemiología, Prevención y +Control de Enfermedades [(CDC +Perú)](https://www.dge.gob.pe/portalnuevo/) for the opportunity to work +on this project. ## How to cite this R package @@ -294,193 +96,3 @@ citation("serosurvey") #> url = {https://avallecam.github.io/serosurvey/}, #> } ``` - -## Contact - -Andree Valle Campos | [`@avallecam`](https://twitter.com/avallecam) | - - -Project Link: - -## Acknowledgements - -Many thanks to the Centro Nacional de Epidemiología, Prevención y -Control de Enfermedades [(CDC -Perú)](https://www.dge.gob.pe/portalnuevo/) for the opportunity to work -on this project. - -## References - -Azman, Andrew S, Stephen Lauer, M. 
Taufiqur Rahman Bhuiyan, Francisco J -Luquero, Daniel T Leung, Sonia Hegde, Jason B Harris, et al. 2020. -“Vibrio Cholerae O1 Transmission in Bangladesh: Insights from a -Nationally- Representative Serosurvey,” March. -. - -Diggle, Peter J. 2011. “Estimating Prevalence Using an Imperfect Test.” -Epidemiology Research International 2011: 1–5. -. - -Flor, Matthias, Michael Weiß, Thomas Selhorst, Christine Müller-Graf, -and Matthias Greiner. 2020. “Comparison of Bayesian and Frequentist -Methods for Prevalence Estimation Under Misclassification.” BMC Public -Health 20 (1). . - -Gelman, Andrew, and Bob Carpenter. 2020. “Bayesian Analysis of Tests -with Unknown Specificity and Sensitivity.” Journal of the Royal -Statistical Society: Series C (Applied Statistics), August. -. - -Hallal, Pedro C, Fernando P Hartwig, Bernardo L Horta, Mariângela F -Silveira, Claudio J Struchiner, Luı́s P Vidaletti, Nelson A Neumann, et -al. 2020. “SARS-CoV-2 Antibody Prevalence in Brazil: Results from Two -Successive Nationwide Serological Household Surveys.” The Lancet Global -Health, September. . - -Kritsotakis, Evangelos I. 2020. “On the Importance of Population-Based -Serological Surveys of SARS-CoV-2 Without Overlooking Their Inherent -Uncertainties.” Public Health in Practice 1 (November): 100013. -. - -Larremore, Daniel B., Bailey K Fosdick, Kate M Bubar, Sam Zhang, Stephen -M Kissler, C. Jessica E. Metcalf, Caroline Buckee, and Yonatan -Grad.2020.“Estimating SARS-CoV-2 Seroprevalence and Epidemiological -Parameters with Uncertainty from Serological Surveys.” medRxiv, April. -. - -Larremore, Daniel B., Bailey K. Fosdick, Sam Zhang, and Yonatan -Grad.2020.“Jointly Modeling Prevalence, Sensitivity and Specificity for -Optimal Sample Allocation.” bioRxiv, May. -. - -Lewis, Fraser I, and Paul R Torgerson. 2012. “A Tutorial in Estimating -the Prevalence of Disease in Humans and Animals in the Absence of a Gold -Standard Diagnostic.” Emerging Themes in Epidemiology 9 (1). -. 
- -Rogan, Walter J., and Beth Gladen. 1978. “Estimating Prevalence from the -Results of A Screening Test.” American Journal of Epidemiology 107 (1): -71–76. . - -Silveira, Mariângela F., Aluı́sio J. D. Barros, Bernardo L. Horta, Lúcia -C. Pellanda, Gabriel D. Victora, Odir A. Dellagostin, Claudio J. -Struchiner, et al. 2020. “Population-Based Surveys of Antibodies Against -SARS-CoV-2 in Southern Brazil.” Nature Medicine 26 (8): 1196–9. -. - -Takahashi, Saki, Bryan Greenhouse, and Isabel Rodríguez-Barraquer. 2020. -“Are SARS-CoV-2 seroprevalence estimates biased?” The Journal of -Infectious Diseases, August. . - -
- -
- -Azman, Andrew S, Stephen Lauer, M. Taufiqur Rahman Bhuiyan, Francisco J -Luquero, Daniel T Leung, Sonia Hegde, Jason B Harris, et al. 2020. -“Vibrio Cholerae O1 Transmission in Bangladesh: Insights from a -Nationally- Representative Serosurvey,” March. -. - -
- -
- -Diggle, Peter J. 2011. “Estimating Prevalence Using an Imperfect Test.” -*Epidemiology Research International* 2011: 1–5. -. - -
- -
- -Flor, Matthias, Michael Weiß, Thomas Selhorst, Christine Müller-Graf, -and Matthias Greiner. 2020. “Comparison of Bayesian and Frequentist -Methods for Prevalence Estimation Under Misclassification.” *BMC Public -Health* 20 (1). . - -
- -
- -Gelman, Andrew, and Bob Carpenter. 2020. “Bayesian Analysis of Tests -with Unknown Specificity and Sensitivity.” *Journal of the Royal -Statistical Society: Series C (Applied Statistics)*, August. -. - -
- -
- -Hallal, Pedro C, Fernando P Hartwig, Bernardo L Horta, Mariângela F -Silveira, Claudio J Struchiner, Luı́s P Vidaletti, Nelson A Neumann, et -al. 2020. “SARS-CoV-2 Antibody Prevalence in Brazil: Results from Two -Successive Nationwide Serological Household Surveys.” *The Lancet Global -Health*, September. . - -
- -
- -Kritsotakis, Evangelos I. 2020. “On the Importance of Population-Based -Serological Surveys of SARS-CoV-2 Without Overlooking Their Inherent -Uncertainties.” *Public Health in Practice* 1 (November): 100013. -. - -
- -
- -Larremore, Daniel B, Bailey K Fosdick, Kate M Bubar, Sam Zhang, Stephen -M Kissler, C. Jessica E. Metcalf, Caroline Buckee, and Yonatan Grad. -2020. “Estimating SARS-CoV-2 Seroprevalence and Epidemiological -Parameters with Uncertainty from Serological Surveys.” *medRxiv*, April. -. - -
- -
- -Larremore, Daniel B., Bailey K. Fosdick, Sam Zhang, and Yonatan H. Grad. -2020. “Jointly Modeling Prevalence, Sensitivity and Specificity for -Optimal Sample Allocation.” *bioRxiv*, May. -. - -
- -
- -Lewis, Fraser I, and Paul R Torgerson. 2012. “A Tutorial in Estimating -the Prevalence of Disease in Humans and Animals in the Absence of a Gold -Standard Diagnostic.” *Emerging Themes in Epidemiology* 9 (1). -. - -
- -
- -Rogan, Walter J., and Beth Gladen. 1978. “Estimating Prevalence from the -Results of A Screening Test.” *American Journal of Epidemiology* 107 -(1): 71–76. . - -
- -
- -Silveira, Mariângela F., Aluı́sio J. D. Barros, Bernardo L. Horta, Lúcia -C. Pellanda, Gabriel D. Victora, Odir A. Dellagostin, Claudio J. -Struchiner, et al. 2020. “Population-Based Surveys of Antibodies Against -SARS-CoV-2 in Southern Brazil.” *Nature Medicine* 26 (8): 1196–9. -. - -
- -
- -Takahashi, Saki, Bryan Greenhouse, and Isabel Rodríguez-Barraquer. 2020. -“Are SARS-CoV-2 seroprevalence estimates biased?” *The Journal of -Infectious Diseases*, August. . - -
- -
diff --git a/man/figures/README-unnamed-chunk-10-1.png b/man/figures/README-unnamed-chunk-10-1.png index cbeed1a..1a37e03 100644 Binary files a/man/figures/README-unnamed-chunk-10-1.png and b/man/figures/README-unnamed-chunk-10-1.png differ diff --git a/man/figures/README-unnamed-chunk-13-1.png b/man/figures/README-unnamed-chunk-13-1.png index 968e864..45e11f6 100644 Binary files a/man/figures/README-unnamed-chunk-13-1.png and b/man/figures/README-unnamed-chunk-13-1.png differ diff --git a/man/figures/README-unnamed-chunk-14-1.png b/man/figures/README-unnamed-chunk-14-1.png index 14ce10f..851713d 100644 Binary files a/man/figures/README-unnamed-chunk-14-1.png and b/man/figures/README-unnamed-chunk-14-1.png differ diff --git a/man/figures/README-unnamed-chunk-9-1.png b/man/figures/README-unnamed-chunk-9-1.png new file mode 100644 index 0000000..f4d475b Binary files /dev/null and b/man/figures/README-unnamed-chunk-9-1.png differ diff --git a/vignettes/intro.Rmd b/vignettes/intro.Rmd index e79b34a..eed6630 100644 --- a/vignettes/intro.Rmd +++ b/vignettes/intro.Rmd @@ -1,5 +1,5 @@ --- -title: "Introduction: serosurvey R package?" 
+title: "Introduction: serosurvey R package" output: rmarkdown::html_vignette vignette: > %\VignetteEncoding{UTF-8} @@ -27,11 +27,74 @@ options(tidyverse.quiet = TRUE) ## Introduction +Here we present three examples, definitions and related references: -```{r setup} +```{r example} library(serosurvey) ``` +```{r,echo=FALSE} +# additional +library(tidyverse) +library(srvyr) +library(survey) +library(tictoc) +library(furrr) +library(purrr) +# theme +theme_set(theme_bw()) +``` + +```{r,echo=FALSE} +data(api) + +datasurvey <- apiclus2 %>% + mutate(survey_all="survey_all") %>% + # create variables + mutate(outcome_one = awards, + outcome_two = cut(pct.resp,breaks = 2), + covariate_01 = stype, + covariate_02 = both) +``` + +```{r,echo=FALSE} +# tratamiento de stratos con un solo conglomerado +options(survey.lonely.psu = "certainty") + +# uu_clean_data %>% count(CONGLOMERADO,VIVIENDA) + +# diseño muestral de la encuesta --------------------------------- + +design <- datasurvey %>% + + filter(!is.na(outcome_one)) %>% #CRITICAL! ON OUTCOME + filter(!is.na(pw)) %>% #NO DEBEN DE HABER CONGLOMERADOS SIN WEIGHT + + as_survey_design( + id=c(dnum, snum), #~dnum+snum, # primary secondary sampling unit + # strata = strata, #clusters need to be nested in the strata + weights = pw # factores de expancion + ) +``` + +```{r,echo=FALSE} +# denominadores +covariate_set01 <- datasurvey %>% + select(covariate_01, + #sch.wide, + #comp.imp, + covariate_02) %>% + colnames() + +# numerators within outcome +covariate_set02 <- datasurvey %>% + select(#stype, + #sch.wide, + #comp.imp, + covariate_02) %>% + colnames() +``` + ### 1. 
`survey`: Estimate single prevalences - From a [`srvyr`](http://gdfe.co/srvyr/) __survey design object__, __`serosvy_proportion`__ estimates: @@ -42,6 +105,16 @@ library(serosurvey) + coefficient of variability (`cv`), + design effect (`deff`) +```{r} +serosvy_proportion(design = design, + denominator = covariate_01, + numerator = outcome_one) +``` + +```{r,eval=FALSE} +example("serosvy_proportion") +``` + ### 2. `survey`: Estimate multiple prevalences - In @@ -51,6 +124,56 @@ we provide a workflow to __estimate multiple prevalences__: + using different set of covariates and outcomes as numerators or denominators, + in one single pipe operation +```{r} +# crear matriz + # + # set 01 of denominator-numerator + # +expand_grid( + design=list(design), + denominator=c("covariate_01","covariate_02"), # covariates + numerator=c("outcome_one","outcome_two") # outcomes + ) %>% + # + # set 02 of denominator-numerator (e.g. within main outcome) + # + union_all( + expand_grid( + design=list(design), + denominator=c("outcome_one","outcome_two"), # outcomes + numerator=c("covariate_02") # covariates + ) + ) %>% + # + # create symbols (to be readed as arguments) + # + mutate( + denominator=map(denominator,dplyr::sym), + numerator=map(numerator,dplyr::sym) + ) %>% + # + # estimate prevalence + # + mutate(output=pmap(.l = select(.,design,denominator,numerator), + .f = serosvy_proportion)) %>% + # + # show the outcome + # + select(-design,-denominator,-numerator) %>% + unnest(cols = c(output)) %>% + print(n=Inf) +``` + +#### `learnr` tutorial + +- Learn to build this with in a tutorial in Spanish: + +```r +# install learner and run tutorial +if(!require("learnr")) install.packages("learnr") +learnr::run_tutorial(name = "taller",package = "serosurvey") +``` + ### 3. 
`serology`: Estimate prevalence Under misclassification @@ -72,6 +195,53 @@ in [Article tab](https://avallecam.github.io/serosurvey/articles/howto-reprex.ht #### __Known test performance - Bayesian method__ +```{r,eval=FALSE} +serosvy_known_sample_posterior( + #in population + positive_number_test = 321, + total_number_test = 321+1234, + # known performance + sensitivity = 0.93, + specificity = 0.975 +) +``` + +```{r,echo=FALSE} +tidy_result <- serosvy_known_sample_posterior( + #in population + positive_number_test = 321, + total_number_test = 321+1234, + # known performance + sensitivity = 0.93, + specificity = 0.975 +) + +tidy_result_out <- + tidy_result %>% + select(summary) %>% + unnest(cols = c(summary)) + +tidy_result %>% + select(posterior) %>% + unnest(cols = c(posterior)) %>% + ggplot(aes(x = r1)) + + geom_histogram(aes(y=..density..),binwidth = 0.0005) + + geom_density() + + geom_vline(aes(xintercept=tidy_result_out %>% + pull(numeric.mean)), + color="red",lwd=1) + + geom_vline(aes(xintercept=tidy_result_out %>% + pull(numeric.p05)), + color="red") + + geom_vline(aes(xintercept=tidy_result_out %>% + pull(numeric.p95)), + color="red") + + scale_x_continuous(breaks = scales::pretty_breaks()) +``` + +```{r,eval=FALSE} +example("serosvy_known_sample_posterior") +``` #### __Unknown test performance - Bayesian method__ @@ -81,6 +251,56 @@ certainty [@Kritsotakis2020; @Diggle2011; @Gelman2020] and lab validation data is available with a limited set of samples, tipically during a novel pathogen outbreak. 
+```{r,eval=FALSE,echo=FALSE} +# result_unk <- sample_posterior_r_mcmc_testun( +# samps = 10000, +# #in population +# pos = 692, #positive +# n = 3212, #total +# # in lab (local validation study) +# tp = 670,tn = 640,fp = 202,fn = 74) +``` + +```{r,eval=FALSE} +serosvy_unknown_sample_posterior_ii( + #in population + positive_number_test = 321, + total_number_test = 321+1234, + # in lab (local validation study) + true_positive = 670, + true_negative = 640, + false_positive = 202, + false_negative = 74) +``` + +```{r,echo=FALSE} +result_unk <- serosvy_unknown_sample_posterior_ii( + #in population + positive_number_test = 321, + total_number_test = 321+1234, + # in lab (local validation study) + true_positive = 670, + true_negative = 640, + false_positive = 202, + false_negative = 74) + +result_unk %>% + select(posterior) %>% + unnest(posterior) %>% + rownames_to_column() %>% + pivot_longer(cols = -rowname, + names_to = "estimates", + values_to = "values") %>% + ggplot(aes(x = values)) + + geom_histogram(aes(y=..density..),binwidth = 0.0005) + + geom_density() + + facet_grid(~estimates,scales = "free_x") +``` + +```{r,eval=FALSE} +example("serosvy_unknown_sample_posterior") +``` + ## Contributing Feel free to fill an issue or contribute with your functions or workflows in a pull request.
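Note on section 3 of the vignette (`serology`: Estimate prevalence Under misclassification): the text cites a frequentist correction (Rogan and Gladen 1978) but this diff shows no code for it. Below is a minimal sketch of that correction, written from the published formula and not taken from `serosurvey`. The helper name `rogan_gladen_sketch` is hypothetical. The argument names mirror those of `serosvy_known_sample_posterior` only for readability, and truncating the estimate to [0, 1] is an assumption of this sketch.

```r
# Rogan-Gladen adjustment of an apparent prevalence for a test with
# known sensitivity and specificity. Illustrative only: this is NOT the
# package's implementation, and `rogan_gladen_sketch` is a made-up name.
rogan_gladen_sketch <- function(positive_number_test,
                                total_number_test,
                                sensitivity,
                                specificity) {
  # basic input checks; the correction is undefined when Se + Sp <= 1
  stopifnot(total_number_test > 0,
            positive_number_test >= 0,
            positive_number_test <= total_number_test,
            sensitivity + specificity > 1)

  # apparent (raw) prevalence: share of positive tests
  apparent <- positive_number_test / total_number_test

  # adjusted = (apparent + Sp - 1) / (Se + Sp - 1)
  adjusted <- (apparent + specificity - 1) / (sensitivity + specificity - 1)

  # assumption of this sketch: truncate to the valid range [0, 1]
  adjusted <- min(max(adjusted, 0), 1)

  c(apparent = apparent, adjusted = adjusted)
}

# same counts and test performance as the known-performance example
estimate <- rogan_gladen_sketch(
  positive_number_test = 321,
  total_number_test = 321 + 1234,
  sensitivity = 0.93,
  specificity = 0.975
)
print(estimate)
```

This gives a point estimate only. The Bayesian functions shown in the vignette return a full posterior, which is what you need for interval estimates.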