Path::is_trivial #487

Open
lolbinarycat opened this issue Nov 15, 2024 · 8 comments
Labels
api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api

Comments

@lolbinarycat

Proposal

Problem statement

The standard library currently contains functions that distinguish between relative and absolute paths, but sometimes it is desirable to only accept an even more restricted subset of paths.

Motivating examples or use cases

Programs that do filesystem traversal within an inner loop may want to reuse a buffer to store paths in, but this can lead to logic errors if those paths are complex, such as C:something or ../foo.
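
As a rough illustration of the buffer-reuse hazard, the sketch below pushes an entry onto a reused `PathBuf` and then pops once, the way a traversal loop might undo a push. The helper name `push_then_pop` is made up for this example and is not part of the proposal.

```rust
use std::path::PathBuf;

/// Hypothetical helper: pushes `entry` onto a buffer starting at `base`,
/// then pops once. Returns the buffer state after the push and after the pop.
fn push_then_pop(base: &str, entry: &str) -> (PathBuf, PathBuf) {
    let mut buf = PathBuf::from(base);
    // `push` appends for plain relative entries, but replaces the buffer
    // when `entry` is absolute.
    buf.push(entry);
    let after_push = buf.clone();
    // `pop` removes only the last component, which may not undo the push.
    buf.pop();
    (after_push, buf)
}
```

With `base/dir` as the base, pushing `file` and popping restores `base/dir`. Pushing `../foo` and popping leaves `base/dir/..`, and pushing `/etc/passwd` discards the base entirely.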

This could also be useful for programs that want to validate a configuration file or path-like argument, for example git would want to reject branch names containing .., or that would be stored on another drive.

Solution sketch

impl Path {
  /// Tests if joining this path with another path will simply append the two.
  ///
  /// Some things that would prevent a path from being counted as trivial:
  /// * being an absolute path
  /// * windows drive letters
  /// * any `..` section
  /// * windows verbatim paths
  pub fn is_trivial(&self) -> bool;
}
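
One possible reading of the sketch above, written as a free function over the existing `Path::components` API. This is only an assumption about the intended semantics: it treats `.` and the empty path as trivial, which the proposal does not specify.

```rust
use std::path::{Component, Path};

/// Hypothetical free-function version of the proposed `Path::is_trivial`.
/// Accepts a path only if every component is a plain name or `.`.
fn is_trivial(path: &Path) -> bool {
    path.components()
        // Prefix (drive letter, verbatim), RootDir and ParentDir all fail.
        .all(|c| matches!(c, Component::Normal(_) | Component::CurDir))
}
```

Under this reading, `foo/bar` is trivial, while `../foo`, `foo/../bar` and `/abs` are not. The check is lexical and uses the host platform's path parsing.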

Alternatives

  • add a plethora of fine-grained methods for various properties, such as has_drive_letter and is_verbatim.

Links and related work

What happens now?

This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.

Possible responses

The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):

  • We think this problem seems worth solving, and the standard library might be the right place to solve it.
  • We think that this probably doesn't belong in the standard library.

Second, if there's a concrete solution:

  • We think this specific solution looks roughly right, approved, you or someone else should implement this. (Further review will still happen on the subsequent implementation PR.)
  • We're not sure this is the right solution, and the alternatives or other materials don't give us enough information to be sure about that. Here are some questions we have that aren't answered, or rough ideas about alternatives we'd want to see discussed.
@lolbinarycat lolbinarycat added api-change-proposal A proposal to add or alter unstable APIs in the standard libraries T-libs-api labels Nov 15, 2024
@kennytm
Member

kennytm commented Nov 16, 2024

Is this definition of "trivial" used anywhere else? I think it may be better to break it down into two functions that have clearer definitions:

  1. The path is not absolute, has no drive letter, etc (exists today already as !path.has_root())
  2. The path contains no interior .. component (based on "Add normalize_lexically to Path" #396, I'd call it is_normalized_lexically()):
fn is_trivial(path: &Path) -> bool {
    !path.has_root() && path.is_normalized_lexically()
}

Questions: are these considered trivial / normalized lexically?

  1. dir/
  2. ./file
  3. dir//////file
  4. dir/././././file
  5. .
  6. the empty path ""
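
For reference, here is how the existing `Path::components` iterator already treats these six inputs. The helper `component_labels` is a made-up name for this illustration, and the sketch says nothing about how a new method should classify them.

```rust
use std::path::{Component, Path};

/// Hypothetical helper: renders each component of `path` as a short label,
/// to show what `Path::components` keeps and what it normalizes away.
fn component_labels(path: &str) -> Vec<String> {
    Path::new(path)
        .components()
        .map(|c| match c {
            Component::Prefix(p) => format!("Prefix({})", p.as_os_str().to_string_lossy()),
            Component::RootDir => "RootDir".to_string(),
            Component::CurDir => "CurDir".to_string(),
            Component::ParentDir => "ParentDir".to_string(),
            Component::Normal(s) => format!("Normal({})", s.to_string_lossy()),
        })
        .collect()
}
```

The iterator drops trailing separators, repeated separators, and interior `.` components, but keeps a leading `.`. So `dir/` yields one `Normal`, `dir//////file` and `dir/././././file` both yield two `Normal`s, `./file` and `.` start with `CurDir`, and the empty path yields no components at all.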

for example git would want to reject branch names containing .., or that would be stored on another drive.

For this motivation, the proposed function based on Path may be inappropriate when portability is considered. For instance, on Unix a path x\..\y is definitely "trivial", but a git repo with such a branch name may spell trouble for programs relying on is_trivial() when run on Windows.

@ChrisDenton
Member

The path is not absolute, has no drive letter, etc (exists today already as !path.has_root())

That will return true for both file.txt and for C:file.txt paths. I think a proper function would be more like:

fn has_windows_legacy_prefix(path: &Path) -> bool {
    !path.is_absolute() && matches!(path.components().next(), Some(Component::Prefix(_)))
}

Then !has_windows_legacy_prefix would work.
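
A self-contained version of the check above, paired with a made-up helper `root_and_legacy_prefix` that reports both tests for one input. The result for `C:file.txt` depends on the host platform, since `Path` only parses drive prefixes on Windows.

```rust
use std::path::{Component, Path};

/// The check from the comment above: a drive-relative path such as
/// `C:file.txt` has a prefix but is not absolute.
fn has_windows_legacy_prefix(path: &Path) -> bool {
    !path.is_absolute() && matches!(path.components().next(), Some(Component::Prefix(_)))
}

/// Hypothetical helper: returns `(has_root, has_windows_legacy_prefix)`.
fn root_and_legacy_prefix(path: &str) -> (bool, bool) {
    let path = Path::new(path);
    (path.has_root(), has_windows_legacy_prefix(path))
}
```

Both `file.txt` and `C:file.txt` report `has_root() == false` on every platform, which is why `!path.has_root()` alone cannot tell them apart. The second value is `true` for `C:file.txt` only when compiled for Windows.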

@kennytm
Member

kennytm commented Nov 16, 2024

@ChrisDenton Ok :(

For the first test, I think what we are interested in is whether the path is "completely relative". In Python terms, that means the path has no "anchor".

impl Path {
    pub fn is_anchored(&self) -> bool {
        matches!(self.components().next(), Some(Component::Prefix(_) | Component::RootDir))
    }
}

@ChrisDenton
Member

I would propose a RelativePath type for that rather than relying on checking. We could have a path that works the same cross-platform (give or take path separators perhaps) and not have any of the funky platform specific functions (or if we do then only as platform-specific extensions).

@lolbinarycat
Author

I would propose a RelativePath type for that rather than relying on checking.

you still need to have the checking if you ever want to take it from user input.

also, c:foo is a trivial path on linux but not on windows.

@ChrisDenton
Member

you still need to have the checking if you ever want to take it from user input.

Yes, you'd check once and be done rather than every function dealing with paths having to act defensively (and not forget!).

also, c:foo is a trivial path on linux but not on windows.

If a RelativePath cannot be resolved relatively then it's up to filesystem functions to handle platform-specific issues (e.g. c:foo could be converted to .\c:foo on Windows). Alternatively a conversion to Path could simply error.
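
A minimal sketch of the "check once" idea, assuming a newtype wrapper. `RelativePath`, `NotRelative`, `new` and `join_onto` are all hypothetical names. The sketch validates with the host platform's `Path` parsing, so it does not address the cross-platform behaviour discussed here (`c:foo` would be accepted on Linux and rejected on Windows).

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical type: a path known to contain only plain names and `.`.
#[derive(Debug, Clone, PartialEq, Eq)]
struct RelativePath(PathBuf);

/// Hypothetical error returned when validation fails.
#[derive(Debug, PartialEq, Eq)]
struct NotRelative;

impl RelativePath {
    /// Validates once, up front, so later code need not re-check.
    fn new(path: impl AsRef<Path>) -> Result<Self, NotRelative> {
        let path = path.as_ref();
        let ok = path
            .components()
            .all(|c| matches!(c, Component::Normal(_) | Component::CurDir));
        if ok {
            Ok(RelativePath(path.to_path_buf()))
        } else {
            Err(NotRelative)
        }
    }

    /// Joining is a plain append because the invariant was checked in `new`.
    fn join_onto(&self, base: &Path) -> PathBuf {
        base.join(&self.0)
    }
}
```

User input would go through `RelativePath::new` at the boundary, and functions that accept a `RelativePath` can then call `join_onto` without defensive checks.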

@joshtriplett
Member

joshtriplett commented Nov 19, 2024

We discussed this in today's @rust-lang/libs-api meeting. We all agreed that we want a method with this behavior. We spent a while attempting to bikeshed naming.

Rough summary of the bikeshedding:

  • Oh no. Paths.
  • We didn't want something like "trivial" or "simple" that wasn't self-explanatory.
  • We considered something like can_append_beneath (and a function append_beneath that actually does the append and errors out if it would ascend). One problem with names like can_append_beneath is that they sound like they take an argument ("append beneath what?").
  • We considered something like is_descending, or is_descending_or_self or is_nonascending if we really want to nitpick that . isn't "descending".
  • We observed that technically paths like c:filename aren't ascending or descending, they're lateral moves.
  • Oh no. Paths.
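
To make the `can_append_beneath` / `append_beneath` pairing from the list above concrete, here is one way it could look as free functions. The signatures and the `WouldNotAppend` error type are assumptions made for illustration, since the meeting did not settle on names or shapes.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical error for a path that would not simply extend the buffer.
#[derive(Debug, PartialEq, Eq)]
struct WouldNotAppend;

/// Hypothetical check: true if every component is a plain name or `.`.
fn can_append_beneath(path: &Path) -> bool {
    path.components()
        .all(|c| matches!(c, Component::Normal(_) | Component::CurDir))
}

/// Hypothetical checked push: extends `buf` in place on success and
/// leaves it untouched on failure.
fn append_beneath(buf: &mut PathBuf, path: &Path) -> Result<(), WouldNotAppend> {
    if can_append_beneath(path) {
        buf.push(path);
        Ok(())
    } else {
        Err(WouldNotAppend)
    }
}
```

In this shape the checked operation takes the base explicitly, while the predicate takes only the path being appended, which is where the "append beneath what?" naming concern comes from.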

We also talked about whether this should allow things like a/../b, and discussed whether any system might potentially turn that into a traversal if for instance a is a symlink. Then again, if a is a symlink, a/b is also a traversal.
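
If paths like a/../b were to be allowed, a lexical check could track depth instead of rejecting every `..`. The sketch below is one such variant. The name `stays_beneath_lexically` is hypothetical, and the check knows nothing about symlinks, so it cannot rule out the traversal cases mentioned above.

```rust
use std::path::{Component, Path};

/// Hypothetical check: true if `path` never lexically steps above the
/// directory it starts from.
fn stays_beneath_lexically(path: &Path) -> bool {
    let mut depth: usize = 0;
    for component in path.components() {
        match component {
            // Anchored paths leave the starting directory immediately.
            Component::Prefix(_) | Component::RootDir => return false,
            Component::CurDir => {}
            Component::Normal(_) => depth += 1,
            // `..` is fine only if there is a component to cancel out.
            Component::ParentDir => match depth.checked_sub(1) {
                Some(d) => depth = d,
                None => return false,
            },
        }
    }
    true
}
```

This accepts `a/../b` and `a/b/../../c` but rejects `../a` and `a/../../b`.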

Ultimately, something like Linux's openat2 with RESOLVE_BENEATH is needed to handle this perfectly. But there's still value in a lexical check.

We didn't come to any consensus, and we'd welcome input.

@traviscross

Another possibility that came to mind: extends_only (or only_extends).

5 participants