[help] package loading with crew 1.0.0 #1437

dakvid · 2025-02-26T02:57:19Z

Help

I understand and agree to https://books.ropensci.org/targets/help.html.

Description

Kia ora koutou (hello, everyone). My large pipeline started failing when I upgraded to crew 1.0.0 (and targets 1.10.1). I've worked out that it has something to do with how packages are loaded, but my attempts to build a minimal example haven't triggered the error yet.

So, for now, my plea for help is for advice or suggestions on how to build a (non-)working example...

For context, this is mostly an ETL pipeline with just under 5,000 targets. Everything ran fine with crew 0.10.2, and it still runs fine without crew, but since upgrading to crew 1.0.0 I'm getting weird errors.

The packages are all loaded with library() in packages.R, which is sourced early on in _targets.R. One of those packages is conflicted, and conflicted::conflicts_prefer is called in .Rprofile.
The conflicts_prefer call used to be in packages.R, but I moved it out when we started using crew about 14 months ago. I think I also tried sourcing packages.R from .Rprofile at the time, ran into issues (I didn't record the specifics), and switched back.

I've got three local controllers, primarily so I can limit the number of concurrent connections to each of the two databases:

```r
workers_processing <-
  crew_controller_local(name = "processing",
                        workers = Sys.getenv("CREW_WORKERS_PROCESSING") |> as.numeric())
workers_database <-
  crew_controller_local(name = "database",
                        workers = Sys.getenv("CREW_WORKERS_DATABASE") |> as.numeric())
workers_rweb <-
  crew_controller_local(name = "legacy_db",
                        workers = Sys.getenv("CREW_WORKERS_RWEB") |> as.numeric())
tar_option_set(controller = crew_controller_group(workers_processing,
                                                  workers_database,
                                                  workers_rweb),
               storage = "worker",
               retrieval = "worker",
               resources = tar_resources(
                 crew = tar_resources_crew(controller = "processing")
               ))
```

I removed the _targets folder to start from a blank slate, and got an error from the first target that needs to load another target as a dependency:

```
could not load dependency my_feather_target of target my_database_load_target. could not find packages curl, data.table, purrr, lubridate, dplyr, readr, magrittr, arrow in library paths
```

My assumption was that the packages aren't being passed on from the main environment to the worker processes. I managed to replicate this with a minimal example.
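A quick way to test that assumption using only base R is to compare .libPaths() in the main session with the library paths of a brand-new R process started from the same project directory. This treats the fresh process as a rough stand-in for a crew local worker (an assumption: it reads .Rprofile and environment variables such as R_LIBS, but inherits nothing set up interactively in the main session); it does not reproduce crew's own launch mechanism.

```r
# Library paths seen by the current (main) R session.
main_paths <- .libPaths()

# Library paths seen by a brand-new R process started in the same working
# directory. Assumption: like a local worker, it reads the project .Rprofile
# and R_LIBS-style environment variables, but nothing set up interactively.
rscript <- file.path(R.home("bin"), "Rscript")
fresh_paths <- system2(rscript, c("-e", shQuote("writeLines(.libPaths())")),
                       stdout = TRUE)

# Paths the main session can see but the fresh process cannot.
missing_in_fresh <- setdiff(main_paths, fresh_paths)
print(missing_in_fresh)
```

Any path printed at the end is a library the main session can see but a fresh process cannot, which would be consistent with a "could not find packages ... in library paths" error on a worker.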

My solution was to source("packages.R") from .Rprofile, and that did seem to solve the issue. However, I later ran into errors like the following:

```
object 'data_series_code' not found
```

which comes from a call to filter(data_series_code == "xxx") that is being interpreted as stats::filter instead of dplyr::filter, the preference set with conflicted::conflicts_prefer. 🤔
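A base-R sketch of why this shows up as "object not found" rather than as a conflict error: when the bare name filter resolves to stats::filter, the condition is passed as an ordinary argument and evaluated in the calling environment, so R looks for a variable called data_series_code instead of a column. The data frame my_data is a made-up stand-in.

```r
# Hypothetical data with a data_series_code column.
my_data <- data.frame(data_series_code = c("xxx", "yyy"), value = c(1, 2))

# stats::filter() treats its second argument as filter coefficients, so the
# condition is evaluated as a normal expression, outside my_data.
msg <- tryCatch(
  my_data |> stats::filter(data_series_code == "xxx"),
  error = conditionMessage
)
print(msg)
```

dplyr::filter() would instead evaluate data_series_code == "xxx" inside my_data and return the matching row, which is why the preference matters.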

After trying several different things, I can reproducibly get past these errors each time by:

1. commenting out the source and conflicts_prefer lines in .Rprofile
2. running tar_make(my_failed_target) and getting an error from conflicted about the ambiguous use of filter
3. uncommenting the conflicts_prefer line, but not the source line, in .Rprofile
4. running tar_make(my_failed_target) again, this time successfully

This only seems to work if I pass tar_make the specific target rather than running the wider pipeline. And of course, running tar_make() again hits the "could not find packages" error above, and uncommenting the source line later leads to another confusing error because the conflicted preferences are dropped or ignored.

At first I thought I just needed to configure things differently, but the fact that I can get every target to run, just not all at once and not with the same configuration, suggests that something is going wrong in crew or further down.

Obviously this isn't that helpful without a reproducible example. I've created a small example that I thought would be sufficient (three local controllers in a group, plus conflicted), but it hasn't reproduced the same errors.

Answered by uhkeller · Feb 26, 2025

dakvid (Author) · 2025-02-26T03:34:49Z

Ah, looking back at my minimal example, I realised that it did actually do something odd, so it may be of some use after all.

I've got five data targets and 10 model targets (adapted from use_targets()). If I run it with the packages loaded from _targets.R rather than .Rprofile, it fails with the "could not find packages" error mentioned above, but not immediately. The five data targets all use the same function, so I would have expected them to either all fail or all succeed, not a mix:

```
> tar_destroy()
> tar_make()
[conflicted] Will prefer magrittr::extract over any other package.
▶ dispatched target data_b
▶ dispatched target data_c
● completed target data_b [2.226 seconds, 1.749 kilobytes]
▶ dispatched target model1b
▶ dispatched target model2b
▶ dispatched target data_d
▶ dispatched target data_e
● completed target data_c [2.218 seconds, 1.746 kilobytes]
▶ dispatched target model1c
▶ dispatched target model2c
▶ recorded workspace data_d
✖ errored target data_d
✖ errored pipeline [6.211 seconds]
Warning message:
1 targets produced warnings. Run targets::tar_meta(fields = warnings, complete_only = TRUE) for the messages. 
Error:
! targets::tar_make() error

── Debugging ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
    • tar_errored()
    • tar_meta(fields = any_of("error"), complete_only = TRUE)
    • tar_workspace()
    • tar_workspaces()

── How to ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
    • Debug: https://books.ropensci.org/targets/debugging.html
    • Help: https://books.ropensci.org/targets/help.html

── Last error message ───────────────────────────────────────────────────────────────────────────────────────────────────────
    could not find packages magrittr, arrow in library paths 

── Last error traceback ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
    base::tryCatch(base::withCallingHandlers({ NULL base::saveRDS(base::do.c...
    tryCatchList(expr, classes, parentenv, handlers)
    tryCatchOne(tryCatchList(expr, names[-nh], parentenv, handlers[-nh]), na...
    doTryCatch(return(expr), name, parentenv, handler)
    tryCatchList(expr, names[-nh], parentenv, handlers[-nh])
    tryCatchOne(expr, names, parentenv, handlers[[1L]])
    doTryCatch(return(expr), name, parentenv, handler)
    base::withCallingHandlers({ NULL base::saveRDS(base::do.call(base::do.ca...
    base::saveRDS(base::do.call(base::do.call, base::c(base::readRDS("/Users...
    base::do.call(base::do.call, base::c(base::readRDS("/Users/davidf/temp/R...
    (function (what, args, quote = FALSE, envir = parent.frame()) { if (!is....
    (function (targets_function, targets_arguments, options, envir = NULL, s...
    tryCatch(out <- withCallingHandlers(targets::tar_callr_inner_try(targets...
    tryCatchList(expr, classes, parentenv, handlers)
    tryCatchOne(expr, names, parentenv, handlers[[1L]])
    doTryCatch(return(expr), name, parentenv, handler)
    withCallingHandlers(targets::tar_callr_inner_try(targets_function = targ...
    targets::tar_callr_inner_try(targets_function = targets_function, target...
    do.call(targets_function, targets_arguments)
    (function (pipeline, path_store, names_quosure, shortcut, reporter, seco...
    crew_init(pipeline = pipeline, meta = meta_init(path_store = path_store)...
    self$run_crew()
    self$iterate()
    self$conclude_worker_task()
    target_conclude(target, self$pipeline, self$scheduler, self$meta)
    target_conclude.tar_builder(target, self$pipeline, self$scheduler, self$...
    builder_error(target, pipeline, scheduler, meta)
    builder_handle_error(target, pipeline, scheduler, meta)
    builder_error_exit(target, pipeline, scheduler, meta)
    tar_throw_run(target$metrics$error, class = target$metrics$error_class)
    tar_error(message = paste0(...), class = base::union(custom_error_classe...
    rlang::abort(message = message, class = class, call = tar_envir_base)
    signal_abort(cnd, .file)
```

These are the files:

```r
# packages.R
library(conflicted)   # 1.2.0
library(targets)   # 1.10.1
library(crew)   # 1.0.0
library(arrow)   # 18.1.0.1
library(tibble)   # 3.2.1
library(magrittr)   # 2.0.3
library(tidyr)   # 1.3.1
```

```r
# .Rprofile

# source("packages.R")
conflicted::conflicts_prefer(magrittr::extract)
```

```r
# _targets.R

source("packages.R")

workers_test1 <- crew_controller_local(name = "test1", workers = 2)
workers_test2 <- crew_controller_local(name = "test2", workers = 2)
workers_test3 <- crew_controller_local(name = "test3", workers = 2)

tar_option_set(
  controller = crew_controller_group(workers_test1,
                                     workers_test2,
                                     workers_test3),
  resources = tar_resources(crew = tar_resources_crew(controller = "test1"), aws = NULL),
  storage = "worker",
  retrieval = "worker"
)

tar_source()

list(
  tar_target(
    name = data_a,
    command = do_make_data(),
    format = "feather"
  ),
  tar_target(
    name = data_b,
    command = do_make_data(),
    format = "feather"
  ),
  tar_target(
    name = data_c,
    command = do_make_data(),
    format = "feather"
  ),
  tar_target(
    name = data_d,
    command = do_make_data(),
    format = "feather"
  ),
  tar_target(
    name = data_e,
    command = do_make_data(),
    format = "feather"
  ),
  tar_target(
    name = model1a,
    command = do_model1(data_a),
    resources = tar_resources(crew = tar_resources_crew(controller = "test2"))
  ),
  tar_target(
    name = model1b,
    command = do_model1(data_b),
    resources = tar_resources(crew = tar_resources_crew(controller = "test2"))
  ),
  tar_target(
    name = model1c,
    command = do_model1(data_c),
    resources = tar_resources(crew = tar_resources_crew(controller = "test2"))
  ),
  tar_target(
    name = model1d,
    command = do_model1(data_d),
    resources = tar_resources(crew = tar_resources_crew(controller = "test2"))
  ),
  tar_target(
    name = model1e,
    command = do_model1(data_e),
    resources = tar_resources(crew = tar_resources_crew(controller = "test2"))
  ),
  tar_target(
    name = model2a,
    command = do_model2(data_a),
    resources = tar_resources(crew = tar_resources_crew(controller = "test3"))
  ),
  tar_target(
    name = model2b,
    command = do_model2(data_b),
    resources = tar_resources(crew = tar_resources_crew(controller = "test3"))
  ),
  tar_target(
    name = model2c,
    command = do_model2(data_c),
    resources = tar_resources(crew = tar_resources_crew(controller = "test3"))
  ),
  tar_target(
    name = model2d,
    command = do_model2(data_d),
    resources = tar_resources(crew = tar_resources_crew(controller = "test3"))
  ),
  tar_target(
    name = model2e,
    command = do_model2(data_e),
    resources = tar_resources(crew = tar_resources_crew(controller = "test3"))
  )
)
```

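```r
# R/test_functions.R

# Simulated data: a small tibble of random x and y values.
do_make_data <- function() {
  Sys.sleep(2)
  tibble(x = rnorm(100), y = rnorm(100))
}

# extract() must resolve to magrittr::extract here: element 1 is the intercept.
do_model1 <- function(data) {
  Sys.sleep(3)
  coefficients(lm(y ~ x, data = data)) |> extract(1)
}

# Same, but element 2 is the slope on x.
do_model2 <- function(data) {
  Sys.sleep(3)
  coefficients(lm(y ~ x, data = data)) |> extract(2)
}
```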


dakvid (Author) · Feb 26, 2025

Hmm, with more playing around I've been able to generate the error I was expecting by swapping the load order of magrittr and tidyr:

```
no applicable method for 'extract' applied to an object of class "c('double', 'numeric')"
```

which is what you get if tidyr::extract is used instead of magrittr::extract.
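A base-R sketch of why the load order matters: library() attaches each new package at position 2 of the search path by default, so whichever package is attached last masks same-named functions from packages attached before it. conflicted exists to replace that silent, order-dependent choice with an error or an explicit preference. The two environments below are hypothetical stand-ins for magrittr and tidyr, attached with attach() under made-up names such as "fake:tidyr".

```r
# Hypothetical stand-ins for two packages that both export extract().
magrittr_like <- new.env()
magrittr_like$extract <- function(x, ...) x[...]  # behaves like magrittr::extract
tidyr_like <- new.env()
tidyr_like$extract <- function(data, ...) stop("called the tidyr-style extract()")

# Report the first search-path entry that defines `name` (the one R will use).
first_definer <- function(name) {
  hits <- Filter(function(s) exists(name, where = s, inherits = FALSE), search())
  hits[1]
}

# Order 1: magrittr-like attached first, tidyr-like attached last.
attach(magrittr_like, name = "fake:magrittr")
attach(tidyr_like, name = "fake:tidyr")
winner_1 <- first_definer("extract")
result_1 <- tryCatch(extract(c(10, 20), 1), error = conditionMessage)
detach("fake:tidyr")
detach("fake:magrittr")

# Order 2: the same two environments, attached in the opposite order.
attach(tidyr_like, name = "fake:tidyr")
attach(magrittr_like, name = "fake:magrittr")
winner_2 <- first_definer("extract")
result_2 <- tryCatch(extract(c(10, 20), 1), error = conditionMessage)
detach("fake:magrittr")
detach("fake:tidyr")

print(list(winner_1 = winner_1, result_1 = result_1,
           winner_2 = winner_2, result_2 = result_2))
```

Swapping the two attach() calls is enough to flip which extract() a bare call reaches, mirroring the magrittr/tidyr swap described above.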

Furthermore, I've managed to get things working, including my original pipeline, by moving library(conflicted) down to the bottom of .Rprofile!

I had a look over at the conflicted repo (I didn't expect it to be the source of the issue, since it hasn't been updated in two years), and there is an open issue about things not always working when conflicted is called from .Rprofile. So that looks like it might be the root cause...

uhkeller · 2025-02-26T05:42:41Z

Have you tried using hooks for this? It's what the documentation recommends and I've never had an issue with it.
https://books.ropensci.org/targets/static.html#hooks
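For reference, a minimal sketch of the hook approach, assuming tarchetypes is installed and reusing the target and function names from the minimal example above. The hook body is an illustration, not the exact code from the targets manual.

```r
# _targets.R (sketch): insert the conflicted preference at the start of every
# target's command, so it runs on whichever worker builds the target.
library(targets)
library(tarchetypes)
source("packages.R")
tar_source()

pipeline <- list(
  tar_target(data_a, do_make_data(), format = "feather"),
  tar_target(model1a, do_model1(data_a))
)

tar_hook_before(
  targets = pipeline,
  hook = conflicted::conflicts_prefer(magrittr::extract),
  names = everything()  # tidyselect helper: apply the hook to all targets
)
```

tar_hook_before() returns the modified target list, so it needs to be the last expression in _targets.R; the crew controller setup from the minimal example would go above it unchanged.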


dakvid (Author) · Feb 27, 2025

Ah, thanks for this. It's been a little while since I read that chapter, and I'd forgotten it used {conflicted} as an example. I wouldn't say that the documentation recommends using hooks for {conflicted}, just that hooks can be used to avoid relying on .Rprofile.

I was a bit concerned that using a hook to call conflicts_prefer before all 5,000 targets would add non-trivial overhead, since the nine functions (so far) from {dplyr}, {lubridate}, {magrittr} and {purrr} are used all over the place. I ran a slice of my pipeline with ~650 targets: the hook approach added 9% on the first run (7.3 min vs 6.7 min), but very little when all targets were up to date (8.2 s vs 8.1 s).
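For scale, the first-run figures work out roughly as follows, assuming all ~650 targets in the slice ran and the extra time is spread evenly across them:

$$
\frac{7.3 - 6.7}{6.7} \approx 0.090 \;(\approx 9\%), \qquad
\frac{(7.3 - 6.7) \times 60\ \text{s}}{650\ \text{targets}} = \frac{36\ \text{s}}{650\ \text{targets}} \approx 0.055\ \text{s per target}
$$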

Loading {conflicted} last seems to make the .Rprofile approach work (for now), so I'll stick with that whilst I can, but it's good to know that {targets} offers a workaround if I need it in future.
