-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
7280 alternative fix: wrap existing rmtree to retry longer only for windows access errors. #7363
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that something targeted like this is a better approach. Thanks for trying it out! 👍
We can clean things up a bit with the retry
wrapper's retry_on_exception
argument. We can also define this version of rmtree
(with that argument) on Windows only, which should alleviate some concerns about changing behavior on all platforms. e.g.
def rmtree_internal(...):
...
if sys.platform == "win32":
@retry(stop_max_attempt_number=3, retry_on_exception=is_windows_permission_error)
@retry(...) # same as before
def rmtree(...):
rmtree_internal(...)
else:
@retry(...) # same as before
def rmtree(...):
rmtree_internal(...)
For tests, I would add a test directly of is_win_access_error
(with a real error generated on Windows using a sub-process to hold some file open) and then a Windows and Unix-specific test that take a similar approach as before checking the duration. For right now I think we can do a straightforward update to the tests even though it increases the duration slightly (since the overall increase as a % is relatively small) and then make a note in #4497 to take a better approach in a followup (e.g. what I described in #7361 (comment)).
wow thanks for the advice. I'll try those approaches out later. |
@chrahunt i'm trying out the changes you suggested and it works. however the code feels a bit redundant. also there isn't a good place to log a warning that the rmtree was unable to delete files. if sys.platform == "win32":
def is_win_access_error(e):
"""returns True if the OSError is a Windows access error which indicates
that another process is preventing the object from being deleted."""
return (
isinstance(e, OSError) and
e.errno == errno.EACCES and
e.winerror == 5
)
@retry(stop_max_attempt_number=5, retry_on_exception=is_win_access_error)
@retry(stop_max_delay=3000, wait_fixed=500)
def rmtree(dir, ignore_errors=False):
rmtree_internal(dir, ignore_errors)
else:
@retry(stop_max_delay=3000, wait_fixed=500)
def rmtree(dir, ignore_errors=False):
rmtree_internal(dir, ignore_errors) |
For the code duplication, you can do something like @retry(stop_max_delay=3000, wait_fixed=500)
def rmtree_internal(dir, ignore_errors=False):
... Then
but I would not be surprised if that causes mypy to complain. Better to have the very minor duplication like you showed than a bunch of For tracing the exception, it isn't pretty but you could wrap |
@chrahunt based on the python27 error logs in the CI checks, it looks like double-decorating a function doesn't work in python27, or seems to be ignored. |
I installed python 2.7 and was able to reproduce the error with tox. |
I don't see anything obvious that would cause that. I would make sure the function is getting called and validate the argument and return value. I didn't see any open issues in the retrying project to explain it. |
speculation: retry might not support cascading decorators or we’re hitting an edge case? |
Not to jump ahead but if it does turn out to be a retrying bug then we may want to switch to tenacity per rholder/retrying#79. |
update the code to handle the different errors for locked files on windows between python 2.7 and python 3.
@chrahunt @pradyunsg the problem was that on python 2.7 the error raised is different, and also more difficult to fake in tests. no problems with the retry decoration. |
@chrahunt feel free to ask for other changes. like naming and what not. the problem i encountered would have been more obvious if I had followed your advice and created a test that only checks whether we can detect the error caused by file handle locking. But I am not quite sure how i'd coordinate with the other process. i guess i can try just forking and sleeping. |
Maybe something like:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added specific comments to the code, and have 2 general comments:
- (minor) for the docstrings I would capitalize the first letter for consistency with the rest of the module
- we should have a test like described above to make sure we're catching the actual error that would be raised
Also, do we know whether this is the only circumstance under which this error is raised, or is it also raised if we simply don't have permission? I don't think it's a deal-breaker but it would be good to know so we could potentially refine it in a followup.
@chrahunt I attempted to implement something like this but it didn't make the same errors that the virus scanners did. I think the errors are different if the competing process is privileged. when I have a child process hold a handle to the file, I get a different error which specifically points out that another process is using the file. When a virus scanner is holding it, I get an "Access denied" error. So I wasn't able to get the test to simulate a virus scanner. I didn't notice your other comments until just now. So I'll have to handle them later. |
Hey @choyrim thanks for looking at this super frustrating issue, from experience it is not fun to tackle. I know I have asked this question before but I can't remember the answer and whether I just never got around to implementing the fix -- was there a reason that you were aware why the TempDir implementation in pip doesn't take advantage of |
wow this weakref thing sounds like it could be more elegant. i will have to
read up on it and try it out. do you have an example i could imitate?
…On Sat, Nov 23, 2019 at 03:49 Dan Ryan ***@***.***> wrote:
Hey @choyrim <https://github.com/choyrim> thanks for looking at this
super frustrating issue, from experience it is not fun to tackle.
I know I have asked this question before but I can't remember the answer
and whether I just never got around to implementing the fix -- was there a
reason that you were aware why the TempDir implementation in pip doesn't
take advantage of weakref.finalize to help clean it up? I've run into
this issue a number of times and seen a number of cases of it coming up --
@pradyunsg <https://github.com/pradyunsg> @chrahunt
<https://github.com/chrahunt>
/cc @pfmoore <https://github.com/pfmoore> @zooba
<https://github.com/zooba>
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#7363?email_source=notifications&email_token=AAC42ZMUJU54TFZ26YXXX63QVDOA3A5CNFSM4JOEPB32YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEE7P22Y#issuecomment-557776235>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAC42ZOCCXBXUIAXAGSSCM3QVDOA3ANCNFSM4JOEPB3Q>
.
|
@techalchemy is that related to this fix? Sorry if I'm missing something. It would definitely be useful in #6982. |
Dan,
I read up on weakref.finalize but struggled to figure out how to use it.
are you suggesting that we use finalize instead of the exit handler to do
the rmtree?
…On Sat, Nov 23, 2019 at 02:49 Dan Ryan ***@***.***> wrote:
Hey @choyrim <https://github.com/choyrim> thanks for looking at this
super frustrating issue, from experience it is not fun to tackle.
I know I have asked this question before but I can't remember the answer
and whether I just never got around to implementing the fix -- was there a
reason that you were aware why the TempDir implementation in pip doesn't
take advantage of weakref.finalize to help clean it up? I've run into
this issue a number of times and seen a number of cases of it coming up --
@pradyunsg <https://github.com/pradyunsg> @chrahunt
<https://github.com/chrahunt>
/cc @pfmoore <https://github.com/pfmoore> @zooba
<https://github.com/zooba>
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#7363?email_source=notifications&email_token=AAC42ZMUJU54TFZ26YXXX63QVDOA3A5CNFSM4JOEPB32YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEE7P22Y#issuecomment-557776235>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAC42ZOCCXBXUIAXAGSSCM3QVDOA3ANCNFSM4JOEPB3Q>
.
|
I agree, I'm not clear how I guess what we could do is, after we do the normal deletion-with-retry (without the longer delay this PR introduces) if we still haven't managed to delete the directory after the retries, we could register a final attempt to delete the directory when pip exits ( The problem is that pip isn't that long-running, so the |
I would suggest that you also wait for Environment Error, OSError number 32, because I can easily reproduce that one in the real world. |
@chrahunt, in the spirit of making pull requests small. I volunteer to implement an integration test along the lines you indicate above. My experience suggests that will cause WinError 32, and then I'd add that only if that's what I got. |
@choyrim I'm going to continue to pursue a more localized change to In the meantime, this gist is intended to do the multiprocessing as suggested by @chrahunt, with a small difference. Instead of immediately opening the file, the I've also attempted to worry about the following errors that could occur:
FileOpener supports two additional actions, only one of which is implemented:
So far, I still get windows error 32 no matter what I do. |
since #7479 will also make this issue no longer a problem, perhaps I close this now? |
@choyrim Wouldn’t the problem still occur if the anti virus is still scanning when pip exits (and tries to delete the folder)? Although indeed it’d reduce the probability this happens. |
well i can't find the code that takes care of deleting the temp files. if pip will fail the install when deleting the folder, then the answer is yes. however, since the cylance scan seems to occur soon after the download starts, the probability of a conflict will be greatly reduced with the deletion running after the install. |
@choyrim Makes sense. pip will fail if it fails to delete the temp directory after installation, but indeed the possibility would be reduced. I guess we can always fix it when someone complains. |
Another approach to fixing #7280 - this variation wraps the existing
rmtree
with retries. However this wrapper will only retry if the error raised byshutils.rmtree
is a windows access error.