Update older years to bring the data in line with our newest processe…

…s. (#988) * fix sc_client_lookup sc_send_lca * fix an issue of get_pop_path * Style code * fix the rest of get_pop_path from get_datazone_pop_path * Update documentation * fix sc_send_lca * add missing year column * Remove redundant code * Update documentation * Style code * explicitly specify the argument year to avoid corruption of targets * Update documentation * Reorder when we match on client variables This was causing NSUs to show a social care id. This now resolves this. * Update documentation * Style code * Add chi parameter to `create_demog_test_flags` * Style code * Use CHI parameter for ep/indiv tests * Use CHI parameter for extract tests (chi) * Change test sheet names to lowercase * Change date to lowercase * Update documentation * new data pipeline with targets remove create_individual_files from targets and append it to run_targets script * minor changes * Style code * Update documentation * Update documentation * Style code * undo sc_send_lca bit * Add code for running years available * Update `_targets.R` script for running old years * Style code * Update `check_year_valid` for running old years * Use `check_year_valid` where no data for old yrs * Style code * Fix pick variables This was not taking the correct variables, leading to NSUs being assigned psychiatry * SC Demographics and SDS (#900) * Style code * # read in sc demographics different variables - removed extract date as not accurate, using chi over upi after discussion with social care data management. Added in date of death just for fun. * social care demographics first draft removed a lot of the submitted variables and instead using chi variables from chi seeding. Other changes: - Fill in missing values, - create flag for latest social care id (one from database is not accurate), this makes sure that each chi only has ONE sc id as the latest to stop it creating duplicates - change postcode to choose chi over submitted * Style code * had a github error? Not sure what happened but commiting first draft of sc demographics * Style code * first draft sds. No major changes - only how demographics is matched on and how latest social care id is selected * Update documentation * demographics - add sending location to group by * Style code * Update documentation * Added ungroup() * Remove comments * Remove comments * Style code --------- Co-authored-by: SwiftySalmon <[email protected]> Co-authored-by: marjom02 <[email protected]> Co-authored-by: Jennit07 <[email protected]> Co-authored-by: Jennit07 <[email protected]> Co-authored-by: Zihao Li <[email protected]> * Sc all at speedup (#904) * speed up process_sc_all_alarms_telecare function with data.table package * Update documentation --------- Co-authored-by: lizihao-anu <[email protected]> Co-authored-by: Megan McNicol <[email protected]> Co-authored-by: Jennit07 <[email protected]> * Add case_when statement for `high_cc` cohort * Bug - `high_cc` in demographic cohort showing `NAs` instead of `TRUE/FALSE` (#911) Add case_when statement for `high_cc` cohort * added a casewhen to update property type description for homelessness * Update documentation * Style code * Bug - deal with missing variables (#914) * Add missing sc variables for no sc data * Fix code for including `_inc_dna` variables * Remove commented line * Bug - Fix get pop path failing and preventing the indiv file from running. (#913) Fix bug - pop file paths breaking indiv file * correct file hscp file path * Declare missing variables for older years * setup targets scripts for old years * Style code * Include `check_year_valid` for sc client path * Add check year valid to join sc client * Add if else statement * WIP - TO DO - fix dummy path for `get_chi()` * Style code * update dummy data file to read empty tibble * Update `check_year_valid` * Update declared `NA` variables * Update documentation * declare `count_not_known` as NA * supply year as default in `aggregate_by_chi` * Decalre unused variables * Style code * Update sc client with sept update new code * Specify code for running older years * Style code * Add Running SLF files manually scripts * Style code * update write_tests_xlsx * update process_refined_death * fix tests by removing get_chi * add 2425 * Style code * fix NA matches in refined_death * move latest_cost_year() to cost_uplift() * improve automation * Update documentation * fix `cij_ppa` in DD data * fix bugs of dd and populate cij_delay back to episodes * Style code * keep all variable for delayed discharge episodes * remove dummy variable names from dd_date * Style code * remove `deceased_boxi` variable - bug * remove `create_person_id`. Its matched in client * remove `create_person_id` * Update `run_slf_manually` scripts * further remove person_id * fix duplicate row introduced by adding death * remove duplicated chi when joining death data * TODO: check distinct death data by chi while keeping chi==NA records * add parameter for year * fix duplicate in add_activity_after_death_flag * Update `check_year_valid` * Declare DN variables * Style code * Declare client variables * remove extra dd variables * remove redundant variables * remove fy variable * Remove redundant variable `count_not_known` * Remove duplicate code * revert commit - remove fy * update manual run * declare missing sc variables indiv file * Style code --------- Co-authored-by: Zihao Li <[email protected]> Co-authored-by: lizihao-anu <[email protected]> Co-authored-by: Jennit07 <[email protected]> Co-authored-by: Megan McNicol <[email protected]> Co-authored-by: SwiftySalmon <[email protected]> Co-authored-by: Zihao Li <[email protected]> Co-authored-by: marjom02 <[email protected]>
Public-Health-Scotland · Sep 17, 2024 · 1ab7e7f · 1ab7e7f
1 parent 69ce173
commit 1ab7e7f
Show file tree

Hide file tree

Showing 28 changed files with 587 additions and 35 deletions.
diff --git a/R/check_year_valid.R b/R/check_year_valid.R
@@ -17,6 +17,7 @@ check_year_valid <- function(
       "ch",
       "client",
       "cmh",
+      "cost_dna",
       "dd",
       "deaths",
       "dn",
@@ -34,9 +35,9 @@ check_year_valid <- function(
     )) {
   if (year <= "1415" && type %in% c("dn", "sparra")) {
     return(FALSE)
-  } else if (year <= "1516" && type %in% c("cmh", "homelessness")) {
+  } else if (year <= "1516" && type %in% c("cmh", "homelessness", "dd")) {
     return(FALSE)
-  } else if (year <= "1617" && type %in% c("ch", "hc", "sds", "at")) {
+  } else if (year <= "1617" && type %in% c("ch", "hc", "sds", "at", "client", "cost_dna")) {
     return(FALSE)
   } else if (year <= "1718" && type %in% "hhg") {
     return(FALSE)

diff --git a/R/create_episode_file.R b/R/create_episode_file.R
@@ -175,7 +175,65 @@ create_episode_file <- function(
         sc_social_worker = NA,
         sc_type_of_housing = NA,
         sc_meals = NA,
-        sc_day_care = NA
+        sc_day_care = NA,
+        social_care_id = NA,
+        sc_dementia = NA,
+        sc_learning_disability = NA,
+        sc_mental_health_disorders = NA,
+        sc_physical_and_sensory_disability = NA,
+        sc_drugs = NA,
+        sc_alcohol = NA,
+        sc_palliative_care = NA,
+        sc_carer = NA,
+        sc_elderly_frail = NA,
+        sc_neurological_condition = NA,
+        sc_autism = NA,
+        sc_other_vulnerable_groups = NA,
+        ch_provider_description = NA
+      )
+  }
+
+  if (!check_year_valid(year, type = "homelessness")) {
+    episode_file <- episode_file %>%
+      dplyr::mutate(
+        hl1_12_months_post_app = NA,
+        hl1_12_months_pre_app = NA,
+        hl1_6after_ep = NA,
+        hl1_6before_ep = NA,
+        hl1_application_ref = NA,
+        hl1_completeness = NA,
+        hl1_during_ep = NA,
+        hl1_in_fy = NA,
+        hl1_property_type = NA,
+        hl1_reason_ftm = NA,
+        hl1_sending_lca = NA
+      )
+  }
+
+  if (!check_year_valid(year, type = "dd")) {
+    episode_file <- episode_file %>%
+      dplyr::mutate(
+        cij_delay = NA,
+        dd_quality = NA,
+        dd_responsible_lca = NA,
+        delay_end_reason = NA,
+        primary_delay_reason = NA,
+        secondary_delay_reason = NA,
+      )
+  }
+
+  if (!check_year_valid(year, type = "dn")) {
+    episode_file <- episode_file %>%
+      dplyr::mutate(
+        ccm = NA,
+        total_no_dn_contacts = NA
+      )
+  }
+
+  if (!check_year_valid(year, type = "cost_dna")) {
+    episode_file <- episode_file %>%
+      dplyr::mutate(
+        cost_total_net_inc_dnas = NA
       )
   }
 
@@ -471,6 +529,11 @@ join_sc_client <- function(data,
                            file_type = c("episode", "individual")) {
   cli::cli_alert_info("Join social care client function started at {Sys.time()}")
 
+  if (!check_year_valid(year, type = "client")) {
+    data_file <- data
+    return(data_file)
+  }
+
   if (file_type == "episode") {
     # Match on client variables by chi
     data_file <- data %>%

diff --git a/R/create_individual_file.R b/R/create_individual_file.R
@@ -115,6 +115,9 @@ create_individual_file <- function(
         hc_personal_hours = NA,
         hc_non_personal_hours = NA,
         hc_reablement_hours = NA,
+        hc_non_personal_hours_cost = NA,
+        hc_personal_hours_cost = NA,
+        hc_reablement_hours_cost = NA,
         at_alarms = NA,
         at_telecare = NA,
         sds_option_1 = NA,
@@ -125,10 +128,33 @@ create_individual_file <- function(
         sc_support_from_unpaid_carer = NA,
         sc_social_worker = NA,
         sc_meals = NA,
-        sc_day_care = NA
+        sc_day_care = NA,
+        sc_type_of_housing = NA,
+        count_not_known = NA,
+        sc_latest_submission = NA,
+        social_care_id = NA,
+        person_id = NA,
+        sc_alcohol = NA,
+        sc_autism = NA,
+        sc_carer = NA,
+        sc_dementia = NA,
+        sc_drugs = NA,
+        sc_elderly_frail = NA,
+        sc_learning_disability = NA,
+        sc_mental_health_disorders = NA,
+        sc_neurological_condition = NA,
+        sc_other_vulnerable_groups = NA,
+        sc_palliative_care = NA,
+        sc_physical_and_sensory_disability = NA
       )
   }
 
+  if (!check_year_valid(year, type = "homelessness")) {
+    individual_file <- individual_file %>%
+      dplyr::mutate(hl1_in_fy = NA)
+  }
+
+
   if (anon_chi_out) {
     individual_file <- individual_file %>%
       tidyr::replace_na(list(chi = "")) %>%

diff --git a/R/get_boxi_extract_path.R b/R/get_boxi_extract_path.R
@@ -86,9 +86,11 @@ get_boxi_extract_path <- function(
 #'
 #' @return an [fs::path()] to a dummy file which can be used with targets.
 get_dummy_boxi_extract_path <- function() {
-  get_file_path(
+  dummy_path <- get_file_path(
     directory = get_dev_dir(),
     file_name = ".dummy",
     create = TRUE
   )
+
+  return(dummy_path)
 }
diff --git a/R/get_sc_lookup_paths.R b/R/get_sc_lookup_paths.R
@@ -38,11 +38,18 @@ get_sc_demog_lookup_path <- function(update = latest_update(), ...) {
 #' @family social care lookup file paths
 #' @seealso [get_file_path()] for the generic function.
 get_sc_client_lookup_path <- function(year, update = latest_update(), ...) {
-  sc_client_lookup_path <- get_file_path(
-    directory = fs::path(get_slf_dir(), "Social_care", "processed_sc_client_lookup"),
-    file_name = stringr::str_glue("anon-sc_client_lookup_{year}_{update}.parquet"),
-    ...
-  )
-
-  return(sc_client_lookup_path)
+  if (!check_year_valid(year, type = "client")) {
+    return(get_dummy_boxi_extract_path())
+  } else {
+    sc_client_lookup_path <- get_file_path(
+      directory = fs::path(
+        get_slf_dir(),
+        "Social_care",
+        "processed_sc_client_lookup"
+      ),
+      file_name = stringr::str_glue("anon-sc_client_lookup_{year}_{update}.parquet"),
+      ...
+    )
+    return(sc_client_lookup_path)
+  }
 }
diff --git a/R/link_delayed_discharge_eps.R b/R/link_delayed_discharge_eps.R
@@ -14,8 +14,12 @@ link_delayed_discharge_eps <- function(
     dd_data = read_file(get_source_extract_path(year, "dd")) %>% slfhelper::get_chi()) {
   cli::cli_alert_info("Link delayed discharge to episode file function started at {Sys.time()}")
 
-  names_ep <- names(episode_file)
+  if (!check_year_valid(year, type = "dd")) {
+    episode_file <- episode_file
+    return(episode_file)
+  }
 
+  names_ep <- names(episode_file)
   episode_file <- episode_file %>%
     dplyr::mutate(
       # remember to revoke the cij_end_date with dummy_cij_end
@@ -370,7 +374,7 @@ link_delayed_discharge_eps <- function(
       delay_dd,
       cij_delay
     )) %>%
-    dplyr::select(-c("has_dd", "delay_dd"))
+    dplyr::select(-c("has_dd", "delay_dd", "original_admission_date", "amended_dates"))
 
   return(linked_data)
 }
diff --git a/R/process_lookup_homelessness.R b/R/process_lookup_homelessness.R
@@ -15,6 +15,10 @@ create_homelessness_lookup <- function(
     homelessness_data = read_file(get_source_extract_path(year, "homelessness")) %>% slfhelper::get_chi()) {
   cli::cli_alert_info("Create homelessness lookup function started at {Sys.time()}")
 
+  # Specify years available for running
+  if (year < "1617") {
+    return(NULL)
+  }
   homelessness_lookup <- homelessness_data %>%
     dplyr::distinct(.data$chi, .data$record_keydate1, .data$record_keydate2) %>%
     tidyr::drop_na(.data$chi) %>%
@@ -39,6 +43,11 @@ add_homelessness_flag <- function(data, year,
                                   lookup = create_homelessness_lookup(year)) {
   cli::cli_alert_info("Add homelessness flag function started at {Sys.time()}")
 
+  if (!check_year_valid(year, type = "homelessness")) {
+    data <- data
+    return(data)
+  }
+
   data <- data %>%
     dplyr::left_join(
       lookup %>%
@@ -65,6 +74,11 @@ add_homelessness_flag <- function(data, year,
 add_homelessness_date_flags <- function(data, year, lookup = create_homelessness_lookup(year)) {
   cli::cli_alert_info("Add homelessness date flags function started at {Sys.time()}")
 
+  if (!check_year_valid(year, type = "homelessness")) {
+    data <- data
+    return(data)
+  }
+
   lookup <- lookup %>%
     dplyr::filter(!(is.na(.data$record_keydate2))) %>%
     dplyr::rename(

diff --git a/R/process_lookup_sc_client.R b/R/process_lookup_sc_client.R
@@ -20,6 +20,11 @@ process_lookup_sc_client <-
              slfhelper::get_chi() %>%
              dplyr::select(c("sending_location", "social_care_id", "chi", "latest_flag")),
            write_to_disk = TRUE) {
+    # Specify years available for running
+    if (year < "1718") {
+      return(NULL)
+    }
+
     # Match to demographics lookup to get CHI
     sc_client_demographics <- data %>%
       dplyr::right_join(

diff --git a/R/read_file.R b/R/read_file.R
@@ -24,7 +24,7 @@ read_file <- function(path, col_select = NULL, as_data_frame = TRUE, ...) {
 
   # Return an empty tibble if trying to read the dummy path
   if (path == get_dummy_boxi_extract_path()) {
-    return(tibble::tibble())
+    return(tibble::tibble(anon_chi = NA_character_))
   }
 
   ext <- fs::path_ext(path)

diff --git a/R/replace_sc_id_with_latest.R b/R/replace_sc_id_with_latest.R
@@ -7,40 +7,31 @@ replace_sc_id_with_latest <- function(data) {
   # Check for required variables
   check_variables_exist(
     data,
-    c("sending_location", "social_care_id", "chi", "period")
+    c("sending_location", "social_care_id", "chi", "latest_flag")
   )
 
   # select variables we need
   filter_data <- data %>%
     dplyr::select(
-      "sending_location", "social_care_id", "chi", "period"
+      "sending_location", "social_care_id", "chi", "latest_flag"
     ) %>%
-    dplyr::filter(!(is.na(.data$chi)))
+    dplyr::filter(!(is.na(.data$chi))) %>%
+    dplyr::distinct()
 
   change_sc_id <- filter_data %>%
-    # Sort (by sending_location, chi and period) for unique chi/sending location
-    dplyr::arrange(
-      .data$sending_location,
-      .data$chi,
-      dplyr::desc(.data$period)
-    ) %>%
-    # Find the latest sc_id for each chi/sending location by keeping latest period
-    dplyr::distinct(
-      .data$sending_location,
-      .data$chi,
-      .keep_all = TRUE
-    ) %>%
+    dplyr::filter(latest_flag == 1) %>%
     # Rename for latest sc id
     dplyr::rename(latest_sc_id = "social_care_id") %>%
-    # drop period for matching
-    dplyr::select(-"period")
+    # drop latest_flag for matching
+    dplyr::select(-"latest_flag")
 
   return_data <- change_sc_id %>%
     # Match back onto data
     dplyr::right_join(data,
       by = c("sending_location", "chi"),
       multiple = "all"
     ) %>%
+    dplyr::filter(!(is.na(period))) %>%
     # Overwrite sc id with the latest
     dplyr::mutate(
       social_care_id = dplyr::if_else(

diff --git a/R/write_tests_xlsx.R b/R/write_tests_xlsx.R
@@ -121,6 +121,14 @@ write_tests_xlsx <- function(comparison_data,
 
   date_today <- stringr::str_to_lower(date_today)
 
+  sheet_name_dated <- ifelse(
+    is.null(year),
+    stringr::str_glue("{sheet_name}_{date_today}"),
+    stringr::str_glue("{year}_{sheet_name}_{date_today}")
+  )
+
+  date_today <- stringr::str_to_lower(date_today)
+
   if (is.null(year)) {
     sheet_name_dated <- stringr::str_glue("{sheet_name}_{date_today}")
   } else {