Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use the fs package to copy files. #178

Merged
merged 4 commits into from
Sep 18, 2024
Merged

Use the fs package to copy files. #178

merged 4 commits into from
Sep 18, 2024

Conversation

plietar
Copy link
Member

@plietar plietar commented Sep 10, 2024

It has more bells and whistles, and is often more performant. In particular, it makes use of CopyFileW (on Windows) and copy_file_range (on Linux) when applicable. This offers a big speedup when copying a file within a network drive, which is a common use case for orderly.

The reason why file.copy was introduced in #165 is that copying read-only files with the fs package wasn't working properly on Samba drives. It turns out this situation only occurs when overwrite = TRUE. When the flag is set to FALSE, the copy succeeds. To work around this we do an initial check before the copy to delete any files in the destination path, and copy with overwrite = FALSE. This was manually tested by running the test suite with the TMPDIR environment variable set to a path on a network drive.

@plietar plietar requested a review from richfitz September 10, 2024 18:49
fs::file_chmod(path, "a+w")
file.create(path)
fs::file_delete(path)
fs::file_create(path)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somehow, chmod isn't immediately visible when working over Samba, even after it returns.
I had to do this to work around it and get the tests passing.

touch hello.txt
chmod 444 hello.txt; chmod 666 hello.txt; ls -l hello.txt

Running the second line many times, it will sometimes print the permissions as -rwxr-xr-x, sometimes as -r-xr-xr-x.

@plietar plietar marked this pull request as draft September 10, 2024 18:55
@plietar plietar removed the request for review from richfitz September 10, 2024 18:55
@plietar plietar force-pushed the mrc-5718-copy-files branch 3 times, most recently from 8799fc2 to 99e4937 Compare September 11, 2024 12:54
cli::cli_abort(paste(
"Destination path{?s} {?is a directory/are directories}:",
"{.path {paths}}"))
}
Copy link
Member Author

@plietar plietar Sep 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fs::file_copy and file.copy both support having a single directory as the destination, but I felt this introduces unnecessary ambiguity and makes the implementation of overwrite below have to handle an extra edge case where overwrite = TRUE but the destination is a directory, in which case we do not want to delete it.

We mostly didn't exploit the old behaviour, apart from one place to copy the report script in run.R and in tests. These were easy enough to fix.

@plietar plietar force-pushed the mrc-5718-copy-files branch from 99e4937 to 492359e Compare September 11, 2024 12:56
@@ -477,8 +478,7 @@ copy_resources_implicit <- function(src, dst, resources, artefacts) {
}

copy_files(file.path(src, to_copy),
file.path(dst, to_copy),
overwrite = TRUE)
file.path(dst, to_copy))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function only ever gets called with a completely empty dst directory. There is no reason to use overwrite = TRUE.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to add make_writable = FALSE here, as even though that's the default you've emphasised this point on the previous call?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry I don't really understand the comment. The only "previous call" in this file also omits the make_writable argument. make_writable = FALSE is a no-op, it is not going to make things read-only (not that we would want that here anyway).

Maybe having copy_files do the chmod is confusing, and instead I remove the make_writable argument and move the chmod to the only two places it is actually needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

previous call was outpack_store.R:38 where you have make_writable = FALSE, but it looks like that should be TRUE :)

@plietar
Copy link
Member Author

plietar commented Sep 11, 2024

The tests all pass on Linux with a TMPDIR set to a network drive.

Depending on the libcurl version it may need the latest httr2 to fix r-lib/httr2#534
The leaked file handles prevent the test cleanup from removing some directories.

@plietar plietar marked this pull request as ready for review September 11, 2024 13:23
@plietar plietar requested a review from richfitz September 11, 2024 13:23
Comment on lines 36 to 37
# Files in the archive are read-only. It's easier on the user if we make
# them writable again.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is incorrect and duplicated from the other direction? Here we're making files read only - I think you can drop the comment at It's easier

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh my mistake here is actually setting make_writable = FALSE instead of TRUE. I'm amazed there's not a test to catch that. I'll fix that, also need to change archive to store in the comment.

But otherwise the comment is correct. This function moves files out of the store and into the user's directory (either the drafts folder if running a packet, the src folder if running interactively or anywhere if the user just called orderly_copy_files).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hah, so the tests passed because it was doing orderly_copy_files twice and whether that works. Turns out that now works even if the file is read-only because the new overwrite = TRUE behaviour will overwrite read-only files, whereas the old one did not.

I think I might actually tweak the copy_files function to not allow that anymore.

@@ -477,8 +478,7 @@ copy_resources_implicit <- function(src, dst, resources, artefacts) {
}

copy_files(file.path(src, to_copy),
file.path(dst, to_copy),
overwrite = TRUE)
file.path(dst, to_copy))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to add make_writable = FALSE here, as even though that's the default you've emphasised this point on the previous call?

R/util.R Outdated
}

fs::file_copy(src, dst, overwrite = FALSE)
if (make_writable && length(dst) > 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you want to move this length(dst)> 0 check to be an early exit at the point you check that src and dst are the same length?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly I would rather remove it entirely, but I can't because fs doesn't handle it correctly: r-lib/fs#471 Keeping it to as small of a scope is the best compromise in my opinion, until that issue is fixed.

The rest of the function works the same for n = 0, and I see no reason to introduce special cases.

It has more bells and whistles, and is often more performant. In
particular, it makes use of `CopyFileW` (on Windows) and
`copy_file_range` (on Linux) when applicable. This offers a big speedup
when copying a file within a network drive, which is a common use case
for orderly.

The reason why `file.copy` was introduced in #165 is that copying
read-only files with the `fs` package wasn't working properly on Samba
drives. It turns out this situation only occurs when `overwrite = TRUE`.
When the flag is set to FALSE, the copy succeeds. To work around this we
do an initial check before the copy to delete any files in the
destination path, and copy with `overwrite = FALSE`. This was manually
tested by running the test suite with the `TMPDIR` environment variable
set to a path on a network drive.
- Refuse to overwrite read-only files
- Remove make_writable option
- Add some outpack-store tests
@plietar plietar force-pushed the mrc-5718-copy-files branch from 1b50561 to ebe4e66 Compare September 13, 2024 14:07
@plietar
Copy link
Member Author

plietar commented Sep 13, 2024

Okay so the last commit actually removes the make_writable argument. I don't really think there's a good reason to have copy_files do all the things, and it's just a fs::file_chmod call for the caller.

Also now refuses to overwrite read-only files, to match the default fs::file_copy(overwrite = TRUE) et al behaviour. The implementation is obviously very racy.

@plietar plietar requested a review from richfitz September 18, 2024 10:51
@plietar plietar merged commit 3ccdada into main Sep 18, 2024
10 checks passed
@plietar plietar deleted the mrc-5718-copy-files branch September 18, 2024 14:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants