Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement a persist_with_retry feature, enabling more resiliency on Windows #316

Open
Coruscant11 opened this issue Nov 29, 2024 · 5 comments

Comments

@Coruscant11
Copy link

Coruscant11 commented Nov 29, 2024

Hi!

While working with uv, I encountered issues with a python dependency, httpx unable to be installed because of a os error 5 permission denied.
The error occur when we try to persist a .exe file from a temporary folder into a persistent one.
I only reproduce the issue in an enterprise Windows Jenkins Runner. In my virtual machines, I don't have any issues. So I think this is most probably coming from the system configuration. (Antivirus I think).

Here is the function doing it: https://github.com/astral-sh/uv/blob/9864d23f48cfc00417d2969ef2ab6cdfda5bc084/crates/uv-install-wheel/src/wheel.rs#L427
It is using a NamedTempFile, coming from here: https://github.com/astral-sh/uv/blob/main/crates/uv-fs/src/lib.rs#L146

While doing some research and speaking with some colleagues (hi @vmeurisse), it seems that the issue is a very recurrent one on Windows.
In the Javascript ecosystem, there is this package, created by the @isaacs, npm inventor: https://www.npmjs.com/package/graceful-fs

The improvements are meant to normalize behavior across different platforms and environments, and to make filesystem access more resilient to errors.
One of its feature is this one:
On Windows, it retries renaming a file for up to one second if EACCESS or EPERM error occurs, likely because antivirus software has locked the directory.
I think it is most probably a way to solve our issue on Windows.
As you saw, uv is using the persist function, Here is the source: https://docs.rs/crate/tempfile/latest/source/src/file/imp/windows.rs#88
This function is performing a move, which is a rename right?

So I was wondering if, instead of just implementing a workaround on application side (uv), implementing a kind of persist with retry feature, as graceful-fs is doing, directly inside this crate would be a good idea? It would help a lot of people I think !

What do you think?
I of course volunteer to work on this 🙂 I will try a draft pull-request in the meantime in order to validate my fix.

Thanks, and thanks again for this crate!

@Coruscant11
Copy link
Author

Coruscant11 commented Nov 29, 2024

I confirm we have an AV.

@Coruscant11
Copy link
Author

Coruscant11 commented Nov 30, 2024

Just found that uv developers implemented the same algo:
https://github.com/astral-sh/uv/blob/22fd9f7ff17adfbec6880c6d92c162e3acb6a41c/crates/uv-fs/src/lib.rs#L221

Was wondering if in your opinion, this kind of logic should stay outside of this crate for example?

This crate is relatively simple, maybe you don't want to add more more some specific use-cases features.

@Stebalien
Copy link
Owner

I'd be willing to add this to the default persist method behind a feature flag (disabled by default). That way applications can enable this feature if they run into issues.

I'd rather not:

  1. Add a new method that makes no sense outside of Windows.
  2. Add any sleeps by default (no way to know if we're in a sync context, an async context, etc.

Would a feature flag work for you?

@Coruscant11
Copy link
Author

Hi @Stebalien , sorry for the late answer!

First let me thank you to be that open for contributions.

Would a feature flag work for you?

This could make sense!
I was wondering if using windows-only extra dependencies with the feature-flag, helping to implement the retry would be ok?
The project as almost no dependencies at all, so I wanted to ask first.

Add any sleeps by default (no way to know if we're in a sync context, an async context, etc.

Make sense. Just wondering however if this could lead to some issues (with file locks or something similar...)

To be transparent I still don't know if this is the good place to implement this kind of mechanisms.. but I guess that a feature flag allows us to not pollute the crate too much. 🙂

FYI I already added the mechanism on uv side, and confirmed it works. (But there is a sleep).

Thanks again @Stebalien ! I will work on a draft pull-request on my side and come back to you when I have something.
I have no issue at all if we find that the result is not satisfying 🙂

Cheers

@Stebalien
Copy link
Owner

I was wondering if using windows-only extra dependencies with the feature-flag, helping to implement the retry would be ok?

I'd prefer to avoid any dependencies unless the retry logic is complicated and the dependency is small.

To be transparent I still don't know if this is the good place to implement this kind of mechanisms.. but I guess that a feature flag allows us to not pollute the crate too much. 🙂

My thinking is this: windows application developers are going to get bug reports from users and I want to give them a quick hammer to make the problem go away, but I also don't want surprising behavior by default. IMO, there isn't really a clean solution for this because, e.g., the problem might manifest in some dependency of the application that happens to use temporary files under the hood and I don't want every crate depending on tempfile to have to implement this work around.

Ideally windows AVs would be smarter and/or windows would provide AVs with the necessary hooks such that this wouldn't happen. But 🤷‍♂️.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants