Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mrc-5483: Fix issue when pulling multiple packets with multiple files #145

Merged
merged 2 commits into from
Jul 4, 2024

Conversation

r-ash
Copy link
Contributor

@r-ash r-ash commented Jun 24, 2024

Issue spotted by Jeff when pulling packets

> orderly2::orderly_location_pull_packet(name = "split_model_results")
Error in data.frame(..., stringsAsFactors = FALSE, check.names = FALSE) : 
  arguments imply differing number of rows: 537, 31

> traceback()
6: stop(gettextf("arguments imply differing number of rows: %s", 
       paste(unique(nrows), collapse = ", ")), domain = NA)
5: data.frame(..., stringsAsFactors = FALSE, check.names = FALSE)
4: data_frame(hash = unlist(lapply(meta, function(x) x$files$hash), 
       FALSE, FALSE), size = unlist(lapply(meta, function(x) x$files$size), 
       FALSE, FALSE), location = location_use)
3: location_build_pull_plan_files(packets$fetch, location, root, 
       call)
2: location_build_pull_plan(ids, options$locations, recursive, root, 
       call = environment())
1: orderly2::orderly_location_pull_packet(name = "split_model_results")

This is happening because if we pull with a query which finds multiple packets which are at more than 1 location then we're not matching up the location to the file information correctly. In particular, if the packets have a total number of files which is not a multiple of the number of locations then this errors 🙈 Expect we've not seen this in our other tests because we're always creating 1 file in the test data. This PR adds a test to recreate the issue and fixes it.

I tidied up the docs here a little bit too. But Jeff was definitely a bit confused about this interface, he was expecting passing name = "..." to only pull the latest with that name. It wasn't immediately obvious to him that this would pull all of them, and I can see why. Will leave this for @M-Kusumgar to discuss

@r-ash r-ash requested review from richfitz and M-Kusumgar June 24, 2024 09:36
@richfitz
Copy link
Member

richfitz commented Jul 4, 2024

Mantra and I discussed adding a latest = helper here too, in the same way as name is a helper

@richfitz
Copy link
Member

richfitz commented Jul 4, 2024

This issue also turns up in #144 and #146 implicitly in that we might want a "pull exactly one" vs "pull all that match" interface

@@ -273,7 +273,7 @@ orderly_location_pull_metadata <- function(location = NULL, root = NULL,
##' efficient, as we keep track of files that are copied over even in
##' the case of an interrupted pull.
##'
##' @title Pull a single packet from a location
##' @title Pull one or more packets from a location
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clearly our idea and implementation have drifted apart at some point

@richfitz richfitz merged commit 07c1d31 into main Jul 4, 2024
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants