Implement parallelization #159

sbamin · 2024-05-08T14:11:57Z

This PR suggests an improvement: parallelizing the scaling step with the furrr::future_map function. On my end, the resulting scaled data frame is identical with and without parallel mode.
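
To illustrate the idea, here is a minimal sketch of mapping a scaling function over strata with furrr::future_map and checking it against a sequential purrr::map run. The toy data, the `scale_stratum()` helper, and the median/MAD scaling are assumptions made for this example, not the code in this PR.

```r
library(dplyr)
library(furrr)

# Hypothetical toy data: two strata, two numeric variables.
data <- tibble::tibble(
  stratum = rep(c("a", "b"), each = 4),
  x = c(1, 2, 3, 4, 10, 20, 30, 40),
  y = c(5, 5, 6, 8, 1, 3, 5, 7)
)
variables <- c("x", "y")

# Hypothetical per-stratum scaler: centre by the median, divide by the MAD.
# `variables` is passed explicitly so workers do not rely on global lookup.
scale_stratum <- function(df, variables) {
  dplyr::mutate(
    df,
    dplyr::across(dplyr::all_of(variables), ~ (.x - median(.x)) / mad(.x))
  )
}

strata <- split(data, data$stratum)

# Sequential reference result.
scaled_seq <- dplyr::bind_rows(
  purrr::map(strata, scale_stratum, variables = variables)
)

# Parallel result: same call shape, run on two background R sessions.
future::plan(future::multisession, workers = 2)
scaled_par <- dplyr::bind_rows(
  furrr::future_map(strata, scale_stratum, variables = variables)
)
future::plan(future::sequential)  # restore the default plan

# Both modes should give the same scaled data frame.
stopifnot(isTRUE(all.equal(scaled_seq, scaled_par)))
```

Because future_map returns results in input order, binding the per-stratum pieces gives the same row order in both modes.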

Also, zero dispersion values are now replaced with 0.000001 to avoid NaN results from division by zero:

        dispersion <-
          stratum %>%
          dplyr::summarise_at(.funs = dispersion, .vars = variables) %>%
          # Replace zero dispersion with a small constant so later division is safe
          dplyr::mutate(dplyr::across(dplyr::everything(), ~ dplyr::if_else(. == 0, 0.000001, .))) %>%
          dplyr::collect()
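
A small sketch of why the guard matters, assuming the median and MAD as the centre and dispersion functions (an assumption for this example; the toy `stratum` tibble is also hypothetical). A constant column has a MAD of zero, so scaling it without the guard divides 0 by 0.

```r
library(dplyr)

# Hypothetical stratum: `x` varies, `y` is constant (so its MAD is 0).
stratum <- tibble::tibble(
  x = c(1, 2, 3, 4),
  y = c(5, 5, 5, 5)
)
variables <- c("x", "y")

# Compute the dispersion per variable, then replace exact zeros.
dispersion <- stratum %>%
  dplyr::summarise(dplyr::across(dplyr::all_of(variables), mad)) %>%
  dplyr::mutate(
    dplyr::across(dplyr::everything(), ~ dplyr::if_else(.x == 0, 0.000001, .x))
  )

# Without the guard, the constant column scales to 0 / 0, which is NaN.
unguarded <- (stratum$y - median(stratum$y)) / mad(stratum$y)

# With the guard, it scales to 0 / 0.000001, which is 0.
guarded <- (stratum$y - median(stratum$y)) / dispersion$y
```

One caveat: if the dispersion is zero but some values still differ from the centre, dividing by 0.000001 produces very large scaled values rather than NaN, so the choice of constant may deserve discussion.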

Minor: for one of the select statements I used data %>% dplyr::select(! any_of(variables)) instead of data %>% dplyr::select(-variables), but I think it should be data %>% dplyr::select(- all_of(variables)) for a stricter selection. all_of raises an error if any name in variables is missing from the data, whereas any_of silently ignores missing names, with no warning or error.
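
The difference can be seen in a short sketch. The toy `data` tibble and the deliberately missing column name "z" are assumptions made for this example.

```r
library(dplyr)

# Hypothetical data; note there is no column named "z".
data <- tibble::tibble(id = 1:3, x = c(1, 2, 3), y = c(4, 5, 6))
variables <- c("x", "z")

# any_of() drops the columns that exist and silently skips "z".
lenient <- data %>% dplyr::select(!dplyr::any_of(variables))

# all_of() refuses to continue because "z" is not in the data.
strict <- tryCatch(
  data %>% dplyr::select(-dplyr::all_of(variables)),
  error = function(e) "error"
)
```

Here `lenient` keeps the columns id and y, while the all_of() call fails, which surfaces a misspelled or missing variable name early.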

