clean readme + pass content to intro article

avallecam · Oct 22, 2020 · 018e5a5 · 018e5a5
1 parent 19ba2e0
commit 018e5a5
Show file tree

Hide file tree

Showing 7 changed files with 266 additions and 728 deletions.
diff --git a/README.Rmd b/README.Rmd
@@ -45,311 +45,37 @@ The goal of `serosurvey` is to gather __Serological Survey Analysis__ functions
 
 <!-- You can install the released version of serosurvey from [CRAN](https://CRAN.R-project.org) with: -->
 
+You can install the developmental version of `serosurvey` from 
+[GitHub](https://github.com/avallecam/serosurvey) with:
+
 ``` r
 if(!require("remotes")) install.packages("remotes")
 remotes::install_github("avallecam/serosurvey")
 ```
 
-## Example
-
-Three basic examples which shows you how to solve common problems:
-
-```{r example}
-library(serosurvey)
-```
-
-```{r,echo=FALSE}
-# additional
-library(tidyverse)
-library(srvyr)
-library(survey)
-library(tictoc)
-library(furrr)
-library(purrr)
-# theme
-theme_set(theme_bw())
-```
-
-```{r,echo=FALSE}
-data(api)
-
-datasurvey <- apiclus2 %>% 
-  mutate(survey_all="survey_all") %>% 
-  # create variables
-  mutate(outcome_one = awards,
-         outcome_two = cut(pct.resp,breaks = 2),
-         covariate_01 = stype,
-         covariate_02 = both)
-```
-
-```{r,echo=FALSE}
-# tratamiento de stratos con un solo conglomerado
-options(survey.lonely.psu = "certainty")
-
-# uu_clean_data %>% count(CONGLOMERADO,VIVIENDA)
-
-# diseño muestral de la encuesta ---------------------------------
-
-design <- datasurvey %>% 
-  
-  filter(!is.na(outcome_one)) %>% #CRITICAL! ON OUTCOME
-  filter(!is.na(pw)) %>% #NO DEBEN DE HABER CONGLOMERADOS SIN WEIGHT
-  
-  as_survey_design(
-    id=c(dnum, snum), #~dnum+snum, # primary secondary sampling unit
-    # strata = strata, #clusters need to be nested in the strata
-    weights = pw # factores de expancion
-  )
-```
-
-```{r,echo=FALSE}
-# denominadores
-covariate_set01 <- datasurvey %>% 
-  select(covariate_01,
-         #sch.wide,
-         #comp.imp,
-         covariate_02) %>% 
-  colnames()
-
-# numerators within outcome
-covariate_set02 <- datasurvey %>% 
-  select(#stype,
-         #sch.wide,
-         #comp.imp,
-         covariate_02) %>% 
-  colnames()
-```
+## Brief description
 
-### 1. `survey`: Estimate single prevalences
+The current workflow is divided in two steps:
 
-- From a [`srvyr`](http://gdfe.co/srvyr/) __survey design object__, __`serosvy_proportion`__ estimates:
+1. `survey`: Estimate multiple prevalences, and
+2. `serology`: Estimate prevalence Under misclassification for a device 
+with Known or Unknown test performance
 
-  + weighted prevalence (`prop`), 
-  + total population (`total`),
-  + raw proportion (`raw_prop`), 
-  + coefficient of variability (`cv`), 
-  + design effect (`deff`)
-
-```{r}
-serosvy_proportion(design = design,
-                   denominator = covariate_01,
-                   numerator = outcome_one)
-```
-
-```{r,eval=FALSE}
-example("serosvy_proportion")
-```
-
-### 2. `survey`: Estimate multiple prevalences
+## More
 
+- In the 
+[Introductory article](https://avallecam.github.io/serosurvey/articles/intro.html)
+we provide detailed definitions and references of the methods available.
 - In 
-the [Article tab](https://avallecam.github.io/serosurvey/articles/howto-reprex.html) 
-we provide a workflow to __estimate multiple prevalences__: 
-
-  + using different set of covariates and outcomes as numerators or denominators,
-  + in one single pipe operation
-
-```{r}
-# crear matriz
-  #
-  # set 01 of denominator-numerator
-  #
-expand_grid(
-  design=list(design),
-  denominator=c("covariate_01","covariate_02"), # covariates
-  numerator=c("outcome_one","outcome_two") # outcomes
-  ) %>% 
-  #
-  # set 02 of denominator-numerator (e.g. within main outcome)
-  #
-  union_all(
-    expand_grid(
-      design=list(design),
-      denominator=c("outcome_one","outcome_two"), # outcomes
-      numerator=c("covariate_02") # covariates
-    )
-  ) %>% 
-  #
-  # create symbols (to be readed as arguments)
-  #
-  mutate(
-    denominator=map(denominator,dplyr::sym),
-    numerator=map(numerator,dplyr::sym)
-  ) %>% 
-  #
-  # estimate prevalence
-  #
-  mutate(output=pmap(.l = select(.,design,denominator,numerator),
-                     .f = serosvy_proportion)) %>% 
-  #
-  # show the outcome
-  #
-  select(-design,-denominator,-numerator) %>% 
-  unnest(cols = c(output)) %>% 
-  print(n=Inf)
-```
-
-#### `learnr` tutorial
-
-- Learn to build this with in a tutorial in Spanish:
-
-```r
-# update package
-if(!require("remotes")) install.packages("remotes")
-remotes::install_github("avallecam/serosurvey")
-# install learner and run tutorial
-if(!require("learnr")) install.packages("learnr")
-learnr::run_tutorial(name = "taller",package = "serosurvey")
-```
+the 
+[Workflow article](https://avallecam.github.io/serosurvey/articles/howto-reprex.html) 
+we provide a reproducible example with this package. 
 
 
-### 3. `serology`: Estimate prevalence Under misclassification
-
-- We gather __one frequentist approach__ [@ROGAN1978], 
-available in different Github repos, that deal with 
-misclassification due to an imperfect diagnostic 
-test [@Azman2020; @Takahashi2020].
-Check the [Reference tab](https://avallecam.github.io/serosurvey/reference/index.html).
-
-- We provide __tidy outputs for bayesian approaches__ developed 
-in @Larremore2020unk [here](https://github.com/LarremoreLab/bayesian-joint-prev-se-sp/blob/master/singleSERO_uncertainTEST.R) 
-and @Larremore2020kno [here](https://github.com/LarremoreLab/covid_serological_sampling/blob/master/codebase/seroprevalence.R):
-
-- You can use them with [`purrr`](https://purrr.tidyverse.org/) and [`furrr`](https://davisvaughan.github.io/furrr/) to efficiently iterate 
-and parallelize this step for __multiple prevalences__.
-Check the workflow 
-in [Article tab](https://avallecam.github.io/serosurvey/articles/howto-reprex.html).
-
-
-#### __Known test performance - Bayesian method__
-
-```{r,eval=FALSE}
-serosvy_known_sample_posterior(
-  #in population
-  positive_number_test = 321,
-  total_number_test = 321+1234,
-  # known performance
-  sensitivity = 0.93,
-  specificity = 0.975
-)
-```
-
-```{r,echo=FALSE}
-tidy_result <- serosvy_known_sample_posterior(
-  #in population
-  positive_number_test = 321,
-  total_number_test = 321+1234,
-  # known performance
-  sensitivity = 0.93,
-  specificity = 0.975
-)
-
-tidy_result_out <-
-  tidy_result %>%
-  select(summary) %>%
-  unnest(cols = c(summary))
-
-tidy_result %>%
- select(posterior) %>%
- unnest(cols = c(posterior)) %>%
- ggplot(aes(x = r1)) +
- geom_histogram(aes(y=..density..),binwidth = 0.0005) +
- geom_density() +
- geom_vline(aes(xintercept=tidy_result_out %>%
-                  pull(numeric.mean)),
-            color="red",lwd=1) +
- geom_vline(aes(xintercept=tidy_result_out %>%
-                  pull(numeric.p05)),
-            color="red") +
- geom_vline(aes(xintercept=tidy_result_out %>%
-                  pull(numeric.p95)),
-            color="red") +
- scale_x_continuous(breaks = scales::pretty_breaks())
-```
-
-```{r,eval=FALSE}
-example("serosvy_known_sample_posterior")
-```
-
-#### __Unknown test performance - Bayesian method__
-
-- The test performance is called _"unknown"_ or _"uncertain"_ when test 
-sensitivity and specificity are not known with 
-certainty [@Kritsotakis2020; @Diggle2011; @Gelman2020] and 
-lab validation data is available with a limited set of samples, 
-tipically during a novel pathogen outbreak.
-
-```{r,eval=FALSE,echo=FALSE}
-# result_unk <- sample_posterior_r_mcmc_testun(
-#   samps = 10000,
-#   #in population
-#   pos = 692, #positive
-#   n = 3212, #total
-#   # in lab (local validation study)
-#   tp = 670,tn = 640,fp = 202,fn = 74)
-```
-
-```{r,eval=FALSE}
-serosvy_unknown_sample_posterior_ii(
-  #in population
-  positive_number_test = 321,
-  total_number_test = 321+1234,
-  # in lab (local validation study)
-  true_positive = 670,
-  true_negative = 640,
-  false_positive = 202,
-  false_negative = 74)
-```
-
-```{r,echo=FALSE}
-result_unk <- serosvy_unknown_sample_posterior_ii(
-  #in population
-  positive_number_test = 321,
-  total_number_test = 321+1234,
-  # in lab (local validation study)
-  true_positive = 670,
-  true_negative = 640,
-  false_positive = 202,
-  false_negative = 74)
-
-result_unk %>%
-  select(posterior) %>% 
-  unnest(posterior) %>%
-  rownames_to_column() %>%
-  pivot_longer(cols = -rowname,
-               names_to = "estimates",
-               values_to = "values") %>%
-  ggplot(aes(x = values)) +
-  geom_histogram(aes(y=..density..),binwidth = 0.0005) +
-  geom_density() +
-  facet_grid(~estimates,scales = "free_x")
-```
-
-```{r,eval=FALSE}
-example("serosvy_unknown_sample_posterior")
-```
-
 ## Contributing
 
 Feel free to fill an issue or contribute with your functions or workflows in a pull request. 
 
-Here are a list of publications with interesting approaches using R:
-
-- @Silveira2020 and @Hallal2020 analysed a serological survey accounting for sampling design and test validity using parametric bootstraping, following @Lewis2012.
-
-- @Flor2020 implemented a lot of frequentist and bayesian methods for test with known sensitivity and specificity. Code is available [here](https://github.com/BfRstats/bayespem-validation-code).
-
-- @Gelman2020 also applied Bayesian inference with hierarchical regression 
-and post-stratification to account for test uncertainty
-with unknown specificity and sensitivity. 
-Here a [case-study](https://github.com/bob-carpenter/diagnostic-testing/blob/master/src/case-study/seroprevalence-meta-analysis.Rmd).
-
-## How to cite this R package
-
-```{r}
-citation("serosurvey")
-```
-
 ## Contact
 
 Andree Valle Campos | 
@@ -364,28 +90,8 @@ Many thanks to the Centro Nacional de Epidemiología, Prevención y Control
 de Enfermedades [(CDC Perú)](https://www.dge.gob.pe/portalnuevo/)
 for the opportunity to work on this project.
 
-## References
-
-Azman, Andrew S, Stephen Lauer, M. Taufiqur Rahman Bhuiyan, Francisco J Luquero, Daniel T Leung, Sonia Hegde, Jason B Harris, et al. 2020. “Vibrio Cholerae O1 Transmission in Bangladesh: Insights from a Nationally- Representative Serosurvey,” March. https://doi.org/10.1101/2020.03.13.20035352.
-
-Diggle, Peter J. 2011. “Estimating Prevalence Using an Imperfect Test.” Epidemiology Research International 2011: 1–5. https://doi.org/10.1155/2011/608719.
-
-Flor, Matthias, Michael Weiß, Thomas Selhorst, Christine Müller-Graf, and Matthias Greiner. 2020. “Comparison of Bayesian and Frequentist Methods for Prevalence Estimation Under Misclassification.” BMC Public Health 20 (1). https://doi.org/10.1186/s12889-020-09177-4.
-
-Gelman, Andrew, and Bob Carpenter. 2020. “Bayesian Analysis of Tests with Unknown Specificity and Sensitivity.” Journal of the Royal Statistical Society: Series C (Applied Statistics), August. https://doi.org/10.1111/rssc.12435.
-
-Hallal, Pedro C, Fernando P Hartwig, Bernardo L Horta, Mariângela F Silveira, Claudio J Struchiner, Luı́s P Vidaletti, Nelson A Neumann, et al. 2020. “SARS-CoV-2 Antibody Prevalence in Brazil: Results from Two Successive Nationwide Serological Household Surveys.” The Lancet Global Health, September. https://doi.org/10.1016/s2214-109x(20)30387-9.
-
-Kritsotakis, Evangelos I. 2020. “On the Importance of Population-Based Serological Surveys of SARS-CoV-2 Without Overlooking Their Inherent Uncertainties.” Public Health in Practice 1 (November): 100013. https://doi.org/10.1016/j.puhip.2020.100013.
-
-Larremore, Daniel B., Bailey K Fosdick, Kate M Bubar, Sam Zhang, Stephen M Kissler, C. Jessica E. Metcalf, Caroline Buckee, and Yonatan Grad.2020.“Estimating SARS-CoV-2 Seroprevalence and Epidemiological Parameters with Uncertainty from Serological Surveys.” medRxiv, April. https://doi.org/10.1101/2020.04.15.20067066.
-
-Larremore, Daniel B., Bailey K. Fosdick, Sam Zhang, and Yonatan Grad.2020.“Jointly Modeling Prevalence, Sensitivity and Specificity for Optimal Sample Allocation.” bioRxiv, May. https://doi.org/10.1101/2020.05.23.112649.
-
-Lewis, Fraser I, and Paul R Torgerson. 2012. “A Tutorial in Estimating the Prevalence of Disease in Humans and Animals in the Absence of a Gold Standard Diagnostic.” Emerging Themes in Epidemiology 9 (1). https://doi.org/10.1186/1742-7622-9-9.
-
-Rogan, Walter J., and Beth Gladen. 1978. “Estimating Prevalence from the Results of A Screening Test.” American Journal of Epidemiology 107 (1): 71–76. https://doi.org/10.1093/oxfordjournals.aje.a112510.
-
-Silveira, Mariângela F., Aluı́sio J. D. Barros, Bernardo L. Horta, Lúcia C. Pellanda, Gabriel D. Victora, Odir A. Dellagostin, Claudio J. Struchiner, et al. 2020. “Population-Based Surveys of Antibodies Against SARS-CoV-2 in Southern Brazil.” Nature Medicine 26 (8): 1196–9. https://doi.org/10.1038/s41591-020-0992-3.
+## How to cite this R package
 
-Takahashi, Saki, Bryan Greenhouse, and Isabel Rodríguez-Barraquer. 2020. “Are SARS-CoV-2 seroprevalence estimates biased?” The Journal of Infectious Diseases, August. https://doi.org/10.1093/infdis/jiaa523.
+```{r}
+citation("serosurvey")
+```