Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filename character sequence with tilde sometimes causes false error on Linux #152

Open
akubot opened this issue Apr 23, 2021 · 6 comments
Open

Comments

@akubot
Copy link

akubot commented Apr 23, 2021

Observed on Linux, a failure at line 931 of bagit.py in git commit hash code bff20d2

RHEL 7.5 (Maipo), Linux 3.10.0-1160.15.2.el7.x86_64 GNU/Linux

Python 3.7.4 using miniconda3

File names containing this 4-character sequence cause an "is unsafe" false error due to expansion of the tilde:

~$_-

To reproduce, create a file name such as

data/blah/~$_-expect_fail.doc

In Python v3, do this

import os, sys
f = "data/blah/~$_-expect_fail.doc"

# Note that Linux home directory is inserted in expansion
os.path.expandvars(f) 
'data/blah/~/home/users/jdoe-expect_fail.doc'

A similar test with filename that does not have the hyphen after the underscore

data/blah/~$_expect_success.doc

works OK, i.e. without unwanted expansion of the tilde.

@acdha
Copy link
Member

acdha commented Apr 23, 2021

That's certainly quite odd — does it reproduce if you simply run python -c 'import os.path; os.path.expanduser("data/blah/~$_-expect_fail.doc")'? Looking at the stdlib implementation, my first reaction is surprise that this isn't failing the getpwnam() lookup:

https://github.com/python/cpython/blob/3.8/Lib/posixpath.py#L228-L271

@akubot
Copy link
Author

akubot commented Apr 23, 2021

I modified your test slightly, it confirms the problem.

I changed it to expandvars, which is the one failing.
and also wrapped it with a print()

The original command you posted does not throw an error but also doesn't print anything.

$ python3 -c 'import os.path; print(os.path.expandvars("data/blah/~$_-expect_fail.doc"))'
data/blah/~/home/users/jdoe-expect_fail.doc

I don't have an explanation for the issue, but somehow these 4 characters together expand on RHEL anyway.

@acdha
Copy link
Member

acdha commented Apr 23, 2021

Ah, I missed that and it explains the problem: $_ is a shell variable (last argument of the previous command, if memory serves), and that would explain why you're seeing this depending on the next character since it has to be something which stops parsing at the next character.

I'm wondering whether we want to remove that part of #88 on the theory that this is being overzealous. I do not want to be in the business of having to try to maintain a separate implementation of expandvars.

@akubot
Copy link
Author

akubot commented Apr 23, 2021

It looks like it's slightly more subtle -- the sequence "$_-" also causes the problem

$ python3 -c 'import os.path; print(os.path.expandvars("data/blah/$_-expect_fail.doc"))'
data/blah/~/home/users/jdoe-expect_fail.doc

while just "$_" does not

$ python3 -c 'import os.path; print(os.path.expandvars("data/blah/$_expect_success.doc"))'
data/blah/$_expect_success.doc

and similarly for just "~$_"

python3 -c 'import os.path; print(os.path.expandvars("data/blah/~$_expect_success.doc"))'
data/blah/~$_expect_success.doc

So it appears the hyphen helps trigger this. But that makes me think there are other cases lurking.

Regarding the bagit.py source code, I'm not sure what the correct choice would be. I suppose you could put some guard around the expandvars code, to check for the character sequences that mess it up on Linux?

But I'm not sure specifically what you would do, because you are currently matching a possibly expanded filename against the real one.

Probably better to discuss with regular contributors, and ask whether to back out the expandvars snippet completely? Then it should always pass on Linux.

BTW, a Windows user verified this issue does not occur there, which makes sense because this is a Linux expansion.

@acdha
Copy link
Member

acdha commented Apr 23, 2021

I was thinking that it might be good to discuss this with @runderwood to see whether we have a compelling reason to keep expandvars in here.

This variable is set by at least the popular Bourne shells and ones like Fish which try to be compatible so I suppose we could purge $_ at startup but that seems like it's just asking for some unexpected dependency to show up later.

@runderwood
Copy link
Contributor

I don't really recall why the expandvars call is there, to be honest.

While I can see the reasoning that it's safer to disallow potentially dangerous variable expansion, I believe it's system-dependent so you don't really enforce a meaningful universal policy doing it this way.

I don't have a compelling argument for keeping it there if it's causing problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants