-
Help
Kia ora koutou. My large pipeline started failing when I upgraded to crew 1.0.0 (and targets 1.10.1). I've figured out that it's got something to do with how packages are loaded, but my attempts at trying to build a minimal example haven't managed to trigger the error yet. So initially I guess my plea for help is for advice or suggestions on trying to build a (non)working example...

For context, this is a mostly ETL pipeline with just under 5,000 targets. Everything ran fine with crew 0.10.2, and runs fine without crew, but after upgrading to crew 1.0.0 I'm getting weird errors. The packages are all loaded with

I've got three local controllers, primarily so I can limit the number of concurrent connections to each of the two databases:

```r
workers_processing <-
  crew_controller_local(name = "processing",
                        workers = Sys.getenv("CREW_WORKERS_PROCESSING") |> as.numeric())
workers_database <-
  crew_controller_local(name = "database",
                        workers = Sys.getenv("CREW_WORKERS_DATABASE") |> as.numeric())
workers_rweb <-
  crew_controller_local(name = "legacy_db",
                        workers = Sys.getenv("CREW_WORKERS_RWEB") |> as.numeric())
tar_option_set(controller = crew_controller_group(workers_processing,
                                                  workers_database,
                                                  workers_rweb),
               storage = "worker",
               retrieval = "worker",
               resources = tar_resources(
                 crew = tar_resources_crew(controller = "processing")
               ))
```

I removed the
My assumption was that the packages aren't being passed on from the main environment to the worker processes. I managed to replicate this with a minimal example. My solution was to
which is from a use of

After playing around with several different things I seem to be able to reproducibly get past these steps each time by:
At first I was thinking that I just needed to configure things differently, but the fact that I seem to be able to get all targets to run, but not all at once and not with the same configuration, seems to suggest something's going wrong in crew or further down. Obviously it's not that helpful without a reproducible example. I've created a small example that I thought would be sufficient - with three different local groups and use of conflicted - but it hasn't managed to throw up the same errors.
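For anyone testing the "packages aren't being passed on to the workers" assumption, one thing to try is declaring the packages explicitly through the `packages` argument of `tar_option_set()`, so that targets itself loads them before each target runs rather than relying on what happens to be attached in the main session. This is only a sketch: the package names are placeholders borrowed from the minimal example later in this thread, not the real pipeline's list, and it does not set any conflicted preferences on the workers.

```r
# _targets.R (fragment, sketch only)
library(targets)

# Placeholder package list, taken from the minimal example in this thread.
# targets loads these right before each target runs, including on workers.
tar_option_set(
  packages = c("conflicted", "arrow", "tibble", "magrittr", "tidyr")
)
```

Whether this changes the behaviour seen with crew 1.0.0 is untested here; it mainly removes one variable when comparing runs.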
-
Ah, when looking back at my minimal example I realised that it did actually do something odd, so it may be of some use after all. I've got five data targets and 10 model targets (adapted from
These are the files:

```r
# packages.R
library(conflicted) # 1.2.0
library(targets) # 1.10.1
library(crew) # 1.0.0
library(arrow) # 18.1.0.1
library(tibble) # 3.2.1
library(magrittr) # 2.0.3
library(tidyr) # 1.3.1
```

```r
# .Rprofile
# source("packages.R")
conflicted::conflicts_prefer(magrittr::extract)
```

```r
# _targets.R
source("packages.R")
workers_test1 <- crew_controller_local(name = "test1", workers = 2)
workers_test2 <- crew_controller_local(name = "test2", workers = 2)
workers_test3 <- crew_controller_local(name = "test3", workers = 2)
tar_option_set(
  controller = crew_controller_group(workers_test1,
                                     workers_test2,
                                     workers_test3),
  resources = tar_resources(crew = tar_resources_crew(controller = "test1"), aws = NULL),
  storage = "worker",
  retrieval = "worker"
)
tar_source()
list(
  tar_target(
    name = data_a,
    command = do_make_data(),
    format = "feather"
  ),
  tar_target(
    name = data_b,
    command = do_make_data(),
    format = "feather"
  ),
  tar_target(
    name = data_c,
    command = do_make_data(),
    format = "feather"
  ),
  tar_target(
    name = data_d,
    command = do_make_data(),
    format = "feather"
  ),
  tar_target(
    name = data_e,
    command = do_make_data(),
    format = "feather"
  ),
  tar_target(
    name = model1a,
    command = do_model1(data_a),
    resources = tar_resources(crew = tar_resources_crew(controller = "test2"))
  ),
  tar_target(
    name = model1b,
    command = do_model1(data_b),
    resources = tar_resources(crew = tar_resources_crew(controller = "test2"))
  ),
  tar_target(
    name = model1c,
    command = do_model1(data_c),
    resources = tar_resources(crew = tar_resources_crew(controller = "test2"))
  ),
  tar_target(
    name = model1d,
    command = do_model1(data_d),
    resources = tar_resources(crew = tar_resources_crew(controller = "test2"))
  ),
  tar_target(
    name = model1e,
    command = do_model1(data_e),
    resources = tar_resources(crew = tar_resources_crew(controller = "test2"))
  ),
  tar_target(
    name = model2a,
    command = do_model2(data_a),
    resources = tar_resources(crew = tar_resources_crew(controller = "test3"))
  ),
  tar_target(
    name = model2b,
    command = do_model2(data_b),
    resources = tar_resources(crew = tar_resources_crew(controller = "test3"))
  ),
  tar_target(
    name = model2c,
    command = do_model2(data_c),
    resources = tar_resources(crew = tar_resources_crew(controller = "test3"))
  ),
  tar_target(
    name = model2d,
    command = do_model2(data_d),
    resources = tar_resources(crew = tar_resources_crew(controller = "test3"))
  ),
  tar_target(
    name = model2e,
    command = do_model2(data_e),
    resources = tar_resources(crew = tar_resources_crew(controller = "test3"))
  )
)
```

```r
# R/test_functions.R
do_make_data <- function() {
  Sys.sleep(2)
  tibble(x = rnorm(100), y = rnorm(100))
}
do_model1 <- function(data) {
  Sys.sleep(3)
  coefficients(lm(y ~ x, data = data)) |> extract(1)
}
do_model2 <- function(data) {
  Sys.sleep(3)
  coefficients(lm(y ~ x, data = data)) |> extract(2)
}
```
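To narrow down what differs between controllers, it might help to add a small diagnostic function to `R/test_functions.R` and run it as one target per controller. The sketch below uses only base R; `do_report_session()` and the field names are hypothetical and not part of the original example. It records the worker's search path, its loaded namespaces, and whether an unqualified lookup of `extract` succeeds or raises an error (for instance a conflict error from conflicted).

```r
# Hypothetical diagnostic helper (not in the original example): summarise
# what the current R process can see, so results from different
# controllers can be compared side by side.
do_report_session <- function() {
  list(
    pid = Sys.getpid(),                 # which process ran the target
    attached = search(),                # attached packages and environments
    loaded = sort(loadedNamespaces()),  # namespaces loaded but maybe not attached
    # Try an unqualified lookup of `extract`; keep the error message if it fails.
    extract_lookup = tryCatch(
      {
        get("extract", mode = "function")
        "ok"
      },
      error = function(e) conditionMessage(e)
    )
  )
}

# Quick check in the current session.
str(do_report_session())
```

In `_targets.R` this could be wired in with something like `tar_target(session_test2, do_report_session(), resources = tar_resources(crew = tar_resources_crew(controller = "test2")))`, repeated for each controller name, and the results compared with `tar_read()`.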
-
Have you tried using hooks for this? It's what the documentation recommends and I've never had an issue with it.
https://books.ropensci.org/targets/static.html#hooks
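As a rough illustration of the hooks approach, here is a minimal sketch using `tar_hook_before()` from the tarchetypes package, which prepends an expression to the command of each target in a list. It reuses `do_make_data()` and `do_model1()` from the minimal example above and assumes tarchetypes is installed; the choice of hook (setting the conflicted preference before every command) is an assumption about what the workers are missing, not a confirmed fix for the crew 1.0.0 errors.

```r
# _targets.R (sketch; assumes tarchetypes is installed)
library(targets)
library(tarchetypes)

tar_source()

# A shortened target list for illustration.
targets_list <- list(
  tar_target(data_a, do_make_data(), format = "feather"),
  tar_target(model1a, do_model1(data_a))
)

# Prepend the hook to every target's command, so the preference is
# registered in whichever process runs the target. The modified list is
# the last value of the script, so targets uses it as the pipeline.
tar_hook_before(
  targets = targets_list,
  hook = conflicted::conflicts_prefer(magrittr::extract)
)
```

The `names` argument of `tar_hook_before()` can restrict the hook to a subset of targets if only some of them need it.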