02-Methods.Rmd

  
# Methods  
Analyses were conducted in R. All observations from 2010-2013 were removed from analysis due to changes in reporting calculated scores. An additional 33 records with missing values were excluded from overall analysis. Summary statistics for all values of each calculated score were reviewed, including: number of records, number of readmissions, readmission rate, odds of readmission, and log odds.  
  
Certain values at the tail-ends of each calculated score exhibited insufficient sample size. Since these values may be considered as clinical extremes, the following aggregations ensured $N \ge 20$ across the range of values considered:

  + $CCI \ge 14$ were combined into one group, $N=28$  
  + $LACE \le 4$ were combined into one group, $N=55$  
  + $LACE \ge 18$ were combined into one group, $N=20$  
  + $HOSPITAL \ge 11$ were combined into one group, $N=22$  
  
Covariates were reviewed for sufficient sample size within each group to include in model-fitting. Exploratory data analysis also included visualizations of empirical log odds (logits) for each calculated score across all potential covariates.  
  
Model fitting began with logistic regression considering the association of acuity score with readmission risk. Separate models were fitted for each acuity score, and expanded to consider covariates as appropriate. Additional observations were dropped or aggregated in consideration of sample size and were be noted in each case. Individual models were assessed through reviewing deviance residuals and conducting a $\chi^2$ goodness-of-fit test. Individual predictors were assessed with confidence intervals. Models for each score were compared using Akaike's Information Criteria.  
  
*Note: add a MC simulation if there's enough time!*    
  

```{r include=FALSE}
knitr::opts_chunk$set(echo = FALSE, warning = FALSE, message = FALSE)
```

```{r load_libs}
# -- Load Libraries
library(readxl) # read in Excel files
library(car) # Type III Sums of Squares Anova
library(emmeans)  # multiple comparisons
library(tidyverse) # data processing, visualization
library(lubridate) # process dates
library(ggsci) # colors
library(rstatix)
library(gridExtra) # make and arrange tables
library(gtable) 
library(kableExtra)
library(DescTools) # categorical correlation
library(nnet) # multinomial model fitting
library(lme4) # mixed-level model fitting
library(corrr) # tidy correlations
```

```{r load_data}
snf_data <- read_xlsx("data/SNF_CODED_CALCULATED_SCORES_04042022.xlsx",
                      sheet = "CODED DATA AND CALCULATIONS")

readmits <- read_xlsx("data/Stern 30-day Readmissions 2014 - 2019.xlsx")
```

```{r format_var}
snf_data <- snf_data %>%
  transmute(SternClientID = `Stern Client Id`,
            MRN = MRN,
            Year = WorkSheet,
            CurrentAdmit = as.Date(`Current Admit`, format = "%m/%d/%y"),          
            DischargeDate = as.Date(`Discharge Date`, format = "%m/%d/%y"),
            Gender = factor(Gender),
            Race = Race,
            Ethnicity = factor(Ethnicity),
            InsurancePlanName = (InsurancePlanName),
            ED6moPrior = factor(`LACE Num_of_ED_visits_prior 6months`),
            PatientAge = PatientAge,
            AdmissionHospital = factor(AdmissionHospital),
            CCIAgeScore = ordered(`CCI AGE SCORE`),
            LACELOSDaysScore = ordered(`LACE LOSDays SCORE`),
            LOSDays = LOSDays,
            LOS5Days = factor(`LOS>=5days`),
            HOSPITALLOS5Days = factor(`HOSPITAL LOS>=5days`),
            VisitType = factor(VisitType),
            AdmitDtm = as.Date(AdmitDtm, format = "%m/%d/%y"),
            DischargeDtm = as.Date(DischargeDtm, format = "%m/%d/%y"),
            NumAdmitPastYear = `NumberOfAdmissionInPastYear`,
            PriorAdmin = ordered(`HOSPITAL # OF admission prior yr`),
            Cancer = factor(`HOSPITAL SCORE D/C ONC SERVICE`),
            AcuteAdmin = factor(`LACE ACUTE ADMISSION`),
            AdminType = factor(`HOSPITAL SCORE INDEX ADMISSION TYPE`),
            Hemoglobin = `Hemoglobin at discharge`,
            HemoglobinLevel = factor(`HOSPITAL SCORE Hemoglobin at discharge`),
            Sodium = `Sodium, Serum at discharge`,
            SodiumLevel = factor(`HOSPITAL SCORE Sodium`),
            LACECCI = `LACE CCI SCORE`,
            MyoInfarc = factor(`Myocardial infarction`),
            CongHrtFail = factor(`Congestive heart failure`),
            PeriphVascDis = factor(`Peripheral vascular disease`),
            CerebroVascDis = factor(`Cerebrovascular disease`),
            Dementia = factor(Dementia),
            ChronPulmoDis = factor(`Chronic pulmonary disease`),
            RheuConTisDis = factor(`Rheumatic disease=connective tissue disease`),
            PepUlcDis = factor(`Peptic ulcer disease`),
            MildLiverDis = factor(`MILD liver disease`), # combine into one liver disease var
            ModSevLiverDis = factor(`MODERATE or SEVERE  liver disease`),
            DiabetesWOChrCom = factor(`Diabetes WITHOUT chronic complication`), # combine into one diabetes var
            DiabetesWChrCom = factor(`Diabetes WITH chronic complication`),
            HemiParaPlegia = factor(`Hemiplegia or paraplegia`),
            RenalDisCKD = factor(`Renal disease = mod to severe CKD`),
            Malignancy = factor(`Any Malignancy`),
            MetaSolidTumor = factor(`Metastatic solid tumour`),
            AidsHIV = factor(`AIDS/HIV`),
            CCI = `CALCULATED CCI SCORE`,
            LACE = `CALCULATED LACE SCORE`,
            HOSPITAL = `CALCULATED HOSPITAL SCORE`,
            Readmit30 = `READMISSION <30DAYS`) %>%
  mutate(StayDurationDays = as.numeric((DischargeDate - CurrentAdmit)),
         DtmDuration = DischargeDtm - AdmitDtm)
```  

  
```{r standardize}
# Exclude records before 2014 due to reporting changes
snf_data <- snf_data %>% filter(Year > 2013)

# Create new column for whether or not record is a readmission
snf_data <- snf_data %>%
  mutate(Readmit30 = ifelse(Readmit30 == "N/A", "O", Readmit30)) %>%
  rename(Readmit = Readmit30)


# Standardize values of Race, Insurance, Liver Disease, Diabetes
snf_data <- snf_data %>%
  mutate(Race = as.factor(ifelse(Race == "White", "Caucasian/White",
                       ifelse((Race == "Declined" | Race == "Not Specified" | Race == "Unknown"), "Unavailable/Unknown",
                              ifelse(Race == "Multiracial", "Other/Multiracial", Race)))),
         InsurancePlanCleaned = ifelse(grepl("MEDICARE", InsurancePlanName), "MCARE_MCAID",
                                       ifelse(grepl("MCARE", InsurancePlanName), "MCARE_MCAID",
                                              ifelse(grepl("MEDICAID", InsurancePlanName), "MCARE_MCAID",
                                                     ifelse(grepl("MCAID", InsurancePlanName), "MCARE_MCAID",
                                                            "NO_MCARE_MCAID")))),
         LiverDis = ordered(ifelse(ModSevLiverDis == 3, 3,
                                   ifelse(MildLiverDis == 1, 1, 0))),
         Diabetes = ordered(ifelse(DiabetesWChrCom == 2, 2,
                                   ifelse(DiabetesWOChrCom == 1, 1, 0))))

# relevel Readmit factor, combine original admissions with readmissions > 30 days
snf_data <- snf_data %>%
  mutate(Readmit = ifelse(Readmit == "O", "N", Readmit)) %>%
  mutate(Readmit = as.factor(Readmit))

```

```{r, nas}
# Summarize missing values by variable
nas <- map(snf_data, ~sum(is.na(.)))
# nas %>% as_tibble(nas) 

# Look at records with missing values
NAlook <- function(x) rowSums(x) > 0 
NAs <- snf_data %>% filter(NAlook( across( .cols = everything(), .fns = ~ is.na(.x))))

# remove small number of records with NA values
snf_data <- snf_data %>% na.omit()
```  

```{r data_prep}
# Aggregate Race/Ethnicity
snf_data <- snf_data %>%
  mutate(Race_Eth = factor(ifelse(Ethnicity == "Hispanic or Latino", "Hispanic/Latino", as.character(Race))))

snf_data_a <- snf_data %>% 
  filter(Gender != "Unspecified") %>%
  mutate(Race_Eth = fct_recode(Race_Eth, "Other/Multiracial" = "Native Amer/Alaskan"),
         Insurance = as.factor(InsurancePlanCleaned))
snf_data_a$Gender <- droplevels(snf_data_a$Gender)


```

```{r join_los}
# join length of stay in snf for readmissions 3to main data
readmits <- readmits %>%
  mutate(SternClientID = as.numeric(`Stern ID #`)) %>%
  rename(LOS_snf = LOS) %>%
  select(c(SternClientID, LOS_snf))

snf_data <- snf_data_a %>%
  left_join(readmits, by = "SternClientID")

```