Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor tidy_unzip() #2028

Merged
merged 2 commits into from
Jul 27, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,7 @@
# usethis (development version)

* `use_zip()` and `use_course()` are equipped to handle a ZIP where the parent folder is implicit (@burnsal, #1961).

* `use_test_helper()` is a new function to create a test helper file (@olivroy, #1822).

* `use_cpp11()` makes it easier to update `NAMESPACE` (@pachadotdev, #1921).
Expand Down
34 changes: 18 additions & 16 deletions R/course.R
Original file line number Diff line number Diff line change
Expand Up @@ -360,15 +360,15 @@ tidy_unzip <- function(zipfile, cleanup = FALSE) {
## DropBox ZIP files often include lots of hidden R, RStudio, and Git files
filenames <- filenames[keep_lgl(filenames)]

td <- top_directory(filenames)
loose_parts <- is.na(td)

if (loose_parts) {
parents <- path_before_slash(filenames)
unique_parents <- unique(parents)
if (length(unique_parents) == 1 && unique_parents != "") {
target <- path(base_path, unique_parents)
utils::unzip(zipfile, files = filenames, exdir = base_path)
} else {
# there is no parent; archive contains loose parts
target <- path_ext_remove(zipfile)
utils::unzip(zipfile, files = filenames, exdir = target)
} else {
target <- path(base_path, td)
utils::unzip(zipfile, files = filenames, exdir = base_path)
}
ui_bullets(c(
"v" = "Unpacking ZIP file into {.path {pth(target, base_path)}}
Expand Down Expand Up @@ -398,7 +398,7 @@ tidy_unzip <- function(zipfile, cleanup = FALSE) {
}
}

invisible(target)
invisible(unclass(target))
}

#' @rdname use_course_details
Expand Down Expand Up @@ -525,15 +525,17 @@ keep_lgl <- function(file,
!grepl(ignores, file, perl = TRUE)
}

top_directory <- function(filenames) {
in_top <- path_dir(filenames) == "."
unique_top <- unique(filenames[in_top])
is_directory <- grepl("/$", unique_top)
if (length(unique_top) > 1 || !is_directory) {
NA_character_
} else {
unique_top
path_before_slash <- function(filepath) {
f <- function(x) {
parts <- strsplit(x, "/", fixed = TRUE)[[1]]
if (length(parts) > 1 || grepl("/", x)) {
parts[1]
} else {
""
}
}
purrr::map_chr(filepath, f)

}

content_type <- function(h) {
Expand Down
192 changes: 151 additions & 41 deletions tests/testthat/ref/README.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -14,66 +14,169 @@ library(fs)

## Different styles of ZIP file

Examples based on foo folder found here.
usethis has an unexported function `tidy_unzip()`, which is used under the hood in `use_course()` and `use_zip()`.
It is a wrapper around `utils::unzip()` that uses some heuristics to choose a good value for `exdir`, which is the "the directory to extract files to."
Why do we do this?
Because it's really easy to _not_ get the desired result when unpacking a ZIP archive.

Common aggravations:

* Instead of the unpacked files being corraled within a folder, they explode as "loose parts" into the current working directory. Too little nesting.
* The unpacked files are contained in a folder, but that folder itself is contained inside another folder. Too much nesting.

`tidy_unzip()` tries to get the nesting just right.

Why doesn't unzipping "just work"?
Because the people who make `.zip` files make lots of different choices when they actually create the archive and these details aren't baked in, i.e. a successful roundtrip isn't automatic.
It usually requires some peeking inside the archive and adjusting the unpack options.

This README documents specific `.zip` situations that we anticipate.

## Explicit parent folder

Consider the foo folder:

```{bash}
tree foo
```

### Not Loose Parts, a.k.a. GitHub style
Zip it up like so:

```{bash, eval = FALSE}
zip -r foo-explicit-parent.zip foo/
```

This is the type of ZIP file that we get from GitHub via links of the forms <https://github.com/r-lib/usethis/archive/main.zip> and <http://github.com/r-lib/usethis/zipball/main/>.

Inspect it in the shell:

```{bash}
unzip -Z1 foo-explicit-parent.zip
```

Or from R:

```{r}
foo_files <- unzip("foo-explicit-parent.zip", list = TRUE)
with(
foo_files,
data.frame(Name = Name, dirname = path_dir(Name), basename = path_file(Name))
)
```

Note that the folder `foo/` is explicitly included and all of the files are contained in it (in this case, just one file).

## Implicit parent folder

Consider the foo folder:

```{bash}
tree foo
```

This is the structure of ZIP files yielded by GitHub via links of the forms <https://github.com/r-lib/usethis/archive/master.zip> and <http://github.com/r-lib/usethis/zipball/master/>.
Zip it up like so:

```{bash, eval = FALSE}
zip -r foo-not-loose.zip foo/
zip -r foo-implicit-parent.zip foo/*
```

Notice that everything is packaged below one top-level directory.
Note the use of `foo/*`, as opposed to `foo` or `foo/`.
This type of ZIP file was reported in <https://github.com/r-lib/usethis/issues/1961>.
The example given there is <https://agdatacommons.nal.usda.gov/ndownloader/files/44576230>.

Inspect our small example in the shell:

```{bash}
unzip -Z1 foo-implicit-parent.zip
```

Or from R:

```{r}
foo_not_loose_files <- unzip("foo-not-loose.zip", list = TRUE)
foo_files <- unzip("foo-implicit-parent.zip", list = TRUE)
with(
foo_not_loose_files,
foo_files,
data.frame(Name = Name, dirname = path_dir(Name), basename = path_file(Name))
)
```

### Loose Parts, the Regular Way
Note that `foo/` is not included and its (original) existence is just implicit in the relative path to, e.g.,
`foo/file.txt`.

Here's a similar look at the example from issue #1961:

```bash
~/rrr/usethis/tests/testthat/ref % unzip -l ~/Downloads/Species\ v2.3.zip
Archive: /Users/jenny/Downloads/Species v2.3.zip
Length Date Time Name
--------- ---------- ----- ----
1241 04-27-2023 09:16 species_v2/label_encoder.txt
175187560 04-06-2023 15:13 species_v2/model_arch.pt
174953575 04-14-2023 12:28 species_v2/model_weights.pth
--------- -------
350142376 3 files
```

Note also that the implicit parent folder `species_v2` is not the base of the ZIP file name `Species v2.3.zip`.

This is the structure of many ZIP files I've seen, just in general.
## No parent

Consider the foo folder:

```{bash}
tree foo
```

Zip it up like so:

```{bash, eval = FALSE}
cd foo
zip ../foo-loose-regular.zip *
cd ..
(cd foo && zip -r ../foo-no-parent.zip .)
```

All the files are packaged in the ZIP archive as "loose parts", i.e. there is no explicit top-level directory.
Note that we are zipping everything in a folder from *inside* the folder.

Inspect our small example in the shell:

```{bash}
unzip -Z1 foo-no-parent.zip
```

Or from R:

```{r}
foo_loose_regular_files <- unzip("foo-loose-regular.zip", list = TRUE)
foo_files <- unzip("foo-no-parent.zip", list = TRUE)
with(
foo_loose_regular_files,
foo_files,
data.frame(Name = Name, dirname = path_dir(Name), basename = path_file(Name))
)
```

### Loose Parts, the DropBox Way
All the files are packaged in the ZIP archive as "loose parts", i.e. there is no explicit or implicit top-level directory.

## No parent, the DropBox Variation

This is the structure of ZIP files yielded by DropBox via links of this form <https://www.dropbox.com/sh/12345abcde/6789wxyz?dl=1>. I can't figure out how to even do this with zip locally, so I had to create an example on DropBox and download it. Jim Hester reports it is possible with `archive::archive_write_files()`.

<https://www.dropbox.com/sh/5qfvssimxf2ja58/AABz3zrpf-iPYgvQCgyjCVdKa?dl=1>

It's basically like the "loose parts" above, except it includes a spurious top-level directory `"/"`.
It's basically like the "no parent" example above, except it includes a spurious top-level directory `"/"`.

Inspect our small example in the shell:

```{bash}
unzip -Z1 foo-loose-dropbox.zip
```

Or from R:

```{r}
# curl::curl_download(
# "https://www.dropbox.com/sh/5qfvssimxf2ja58/AABz3zrpf-iPYgvQCgyjCVdKa?dl=1",
# destfile = "foo-loose-dropbox.zip"
# )
foo_loose_dropbox_files <- unzip("foo-loose-dropbox.zip", list = TRUE)
foo_files <- unzip("foo-loose-dropbox.zip", list = TRUE)
with(
foo_loose_dropbox_files,
foo_files,
data.frame(Name = Name, dirname = path_dir(Name), basename = path_file(Name))
)
```
Expand All @@ -87,43 +190,50 @@ mapname: conversion of failed
inflating: file.txt
```

So this is a pretty odd ZIP packing strategy. But we need to plan for it.
which indicates some tripping over the `/`.
Only `file.txt` is left behind.

## Subdirs only at top-level
This is a pretty odd ZIP packing strategy. But we need to plan for it.

Let's make sure we detect loose parts (or not) when the top-level has only directories, not files.
## Only subdirectories

Example based on the yo directory here:
For testing purposes, we also want an example where all the files are inside subdirectories.

Examples based on the yo directory here:

```{bash}
tree yo
```

```{bash, eval = FALSE}
zip -r yo-not-loose.zip yo/
```

```{r}
(yo_not_loose_files <- unzip("yo-not-loose.zip", list = TRUE))
top_directory(yo_not_loose_files$Name)
```
Zip it up, in all the usual ways:

```{bash, eval = FALSE}
cd yo
zip -r ../yo-loose-regular.zip *
cd ..
zip -r yo-explicit-parent.zip yo/
zip -r yo-implicit-parent.zip yo/*
(cd yo && zip -r ../yo-no-parent.zip .)
```

```{r}
(yo_loose_regular_files <- unzip("yo-loose-regular.zip", list = TRUE))
top_directory(yo_loose_regular_files$Name)
```
Again, I couldn't create the DropBox variant locally, so I did it by downloading from DropBox.

```{r}
```{r eval = FALSE}
# curl::curl_download(
# "https://www.dropbox.com/sh/afydxe6pkpz8v6m/AADHbMZAaW3IQ8zppH9mjNsga?dl=1",
# destfile = "yo-loose-dropbox.zip"
# )
(yo_loose_dropbox_files <- unzip("yo-loose-dropbox.zip", list = TRUE))
top_directory(yo_loose_dropbox_files$Name)
```

Inspect each in the shell:

```{bash}
unzip -Z1 yo-explicit-parent.zip
```

```{bash}
unzip -Z1 yo-implicit-parent.zip
```
```{bash}
unzip -Z1 yo-no-parent.zip
```
```{bash}
unzip -Z1 yo-loose-dropbox.zip
```
Loading
Loading