Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

std.fs: Iterator.next doesn't handle ENOENT, but it can happen in valid scenarios #12211

Closed
squeek502 opened this issue Jul 24, 2022 · 7 comments · Fixed by #12226
Closed

std.fs: Iterator.next doesn't handle ENOENT, but it can happen in valid scenarios #12211

squeek502 opened this issue Jul 24, 2022 · 7 comments · Fixed by #12226
Labels
bug Observed behavior contradicts documented or intended behavior standard library This issue involves writing Zig code for the standard library.
Milestone

Comments

@squeek502
Copy link
Collaborator

squeek502 commented Jul 24, 2022

Zig Version

master

Steps to Reproduce

getdents can return ENOENT if the directory referred to by the fd is deleted before the next getdents call (but after the fd is opened). Currently this causes an unexpected error to be returned from IterableDir.Iterator.next.

Relevant code for the Linux implementation of Iterator.next:

zig/lib/std/fs.zig

Lines 604 to 612 in a2ab9e3

const rc = linux.getdents64(self.dir.fd, &self.buf, self.buf.len);
switch (linux.getErrno(rc)) {
.SUCCESS => {},
.BADF => unreachable, // Dir is invalid or was opened without iteration ability
.FAULT => unreachable,
.NOTDIR => unreachable,
.INVAL => return error.Unexpected, // Linux may in some cases return EINVAL when reading /proc/$PID/net.
else => |err| return os.unexpectedErrno(err),
}

Contrived test case (note that I've run into this in a non-contrived use case, though):

const std = @import("std");

test "iterator ENOENT" {
    var tmp = std.testing.tmpDir(.{});
    defer tmp.cleanup();

    var iterable_subdir = try tmp.dir.makeOpenPathIterable("subdir", .{});

    var iterator = iterable_subdir.iterate();

    // Contrived reproduction, but this can happen outside of the program, in another thread, etc
    try tmp.dir.deleteTree("subdir");

    while (try iterator.next()) |_| {}
}

Expected Behavior

ENOENT to be handled in some way by Iterator.next (unsure in what way exactly, but it shouldn't be unreachable or an unexpected error).

Note: I've only tested this on Linux, unsure how this manifests on other platforms.

Actual Behavior

Test [1/1] test "iterator ENOENT"... unexpected errno: 2
/home/ryan/Programming/zig/zig/lib/std/debug.zig:560:19: 0x20ee18 in std.debug.writeCurrentStackTrace (test)
    while (it.next()) |return_address| {
                  ^
/home/ryan/Programming/zig/zig/lib/std/debug.zig:158:31: 0x20bbba in std.debug.dumpCurrentStackTrace (test)
        writeCurrentStackTrace(stderr, debug_info, detectTTYConfig(), start_addr) catch |err| {
                              ^
/home/ryan/Programming/zig/zig/lib/std/os.zig:5462:40: 0x20d7e2 in std.os.unexpectedErrno (test)
        std.debug.dumpCurrentStackTrace(null);
                                       ^
/home/ryan/Programming/zig/zig/lib/std/fs.zig:611:68: 0x20abbb in Iterator.next (test)
                            else => |err| return os.unexpectedErrno(err),
                                                                   ^
/home/ryan/Programming/zig/tmp/walkerdelete.zig:14:29: 0x2093bd in test "iterator ENOENT" (test)
    while (try iterator.next()) |_| {}
                            ^
/home/ryan/Programming/zig/zig/lib/test_runner.zig:79:28: 0x22ee1c in (root).main (test)
        } else test_fn.func();
                           ^
/home/ryan/Programming/zig/zig/lib/std/start.zig:570:22: 0x22830c in std.start.callMain (test)
            root.main();
                     ^
/home/ryan/Programming/zig/zig/lib/std/start.zig:514:12: 0x20defe in std.start.callMainWithArgs (test)
    return @call(.{ .modifier = .always_inline }, callMain, .{});
           ^
/home/ryan/Programming/zig/zig/lib/std/start.zig:424:17: 0x20b286 in std.start.posixCallMainAndExit (test)
    std.os.exit(@call(.{ .modifier = .always_inline }, callMainWithArgs, .{ argc, argv, envp }));
                ^
/home/ryan/Programming/zig/zig/lib/std/start.zig:338:5: 0x20b092 in std.start._start (test)
    @call(.{ .modifier = .never_inline }, posixCallMainAndExit, .{});
    ^
Test [1/1] test "iterator ENOENT"... FAIL (Unexpected)
/home/ryan/Programming/zig/zig/lib/std/os.zig:2429:19: 0x228074 in std.os.unlinkatZ (test)
        .ISDIR => return error.IsDir,
                  ^
/home/ryan/Programming/zig/zig/lib/std/fs.zig:1750:25: 0x20fdf3 in std.fs.Dir.deleteFileZ (test)
            else => |e| return e,
                        ^
/home/ryan/Programming/zig/zig/lib/std/fs.zig:1731:13: 0x20d397 in std.fs.Dir.deleteFile (test)
            return self.deleteFileZ(&sub_path_c);
            ^
/home/ryan/Programming/zig/zig/lib/std/os.zig:5464:5: 0x20d7f1 in std.os.unexpectedErrno (test)
    return error.Unexpected;
    ^
/home/ryan/Programming/zig/zig/lib/std/fs.zig:611:43: 0x20abd2 in Iterator.next (test)
                            else => |err| return os.unexpectedErrno(err),
                                          ^
/home/ryan/Programming/zig/tmp/walkerdelete.zig:14:12: 0x2093ee in test "iterator ENOENT" (test)
    while (try iterator.next()) |_| {}
           ^
0 passed; 0 skipped; 1 failed.
@squeek502 squeek502 added the bug Observed behavior contradicts documented or intended behavior label Jul 24, 2022
@squeek502 squeek502 changed the title std.fs: Walker.next treats ENOENT as unexpected, but it can happen in valid scenarios std.fs: Iterator.next treats ENOENT as unexpected, but it can happen in valid scenarios Jul 24, 2022
@squeek502 squeek502 changed the title std.fs: Iterator.next treats ENOENT as unexpected, but it can happen in valid scenarios std.fs: Iterator.next doesn't handle ENOENT, but it can happen in valid scenarios Jul 24, 2022
@Vexu Vexu added the standard library This issue involves writing Zig code for the standard library. label Jul 24, 2022
@Vexu Vexu added this to the 0.11.0 milestone Jul 24, 2022
@squeek502
Copy link
Collaborator Author

squeek502 commented Jul 24, 2022

After a bit of investigation, I think a reasonable change here might be to just add

.NOENT => return null,

to the .linux implementation of Iterator.next.

Some reasoning:

  • ENOENT being returned seems to be Linux-specific; it is not a possible (or at least documented) return on FreeBSD, Mac, etc
  • I ran the test code via wine and got FileBusy from the deleteTree call, so it might not affect Windows in the same way as it might be harder/impossible to delete the directory while iterating it (haven't tested on a real Windows machine yet though)
  • I ran the test code on a FreeBSD VM and it passed without an error--that is, getdents returned 0 here which Iterator.next treats as end-of-iteration and returns null (EDIT: for the sake of accuracy, it actually returns the entries for . and .. [which get skipped] before returning 0)
  • I added .NOENT => return error.DirNotFound, to the Linux switch and DirNotFound to the IteratorError error set just to see what it'd be like, and found that in all of the (few) cases in the standard library that used Iterator.next, it made sense to turn the next call into something like:
iter.next() catch |err| switch (err) {
    error.DirNotFound => null,
    else => |e| return e,
}

which is exactly what returning null from within Iterator.next would do.

@ghost
Copy link

ghost commented Jul 24, 2022

I think your reasoning for returning null makes sense and should go in a PR.

squeek502 added a commit to squeek502/zig that referenced this issue Jul 24, 2022
…ENT`

`getdents` on Linux can return `ENOENT` if the directory referred to by the fd is deleted during iteration. Returning null when this happens makes sense because:

- `ENOENT` is specific to the Linux implementation of `getdents`
- On other platforms like FreeBSD, `getdents` returns `0` in this scenario, which is functionally equivalent to the `.NOENT => return null` handling on Linux
- In all the usage sites of `Iterator.next` throughout the standard library, translating `ENOENT` returned from `next` as null was the best way to handle it, so the use-case for handling the exact `ENOENT` scenario specifically may not exist to a relevant extent

Previously, ENOENT being returned would trigger `os.unexpectedErrno`.

Closes ziglang#12211
Vexu pushed a commit that referenced this issue Jul 25, 2022
… `ENOENT`

`getdents` on Linux can return `ENOENT` if the directory referred to by the fd is deleted during iteration. Returning null when this happens makes sense because:

- `ENOENT` is specific to the Linux implementation of `getdents`
- On other platforms like FreeBSD, `getdents` returns `0` in this scenario, which is functionally equivalent to the `.NOENT => return null` handling on Linux
- In all the usage sites of `Iterator.next` throughout the standard library, translating `ENOENT` returned from `next` as null was the best way to handle it, so the use-case for handling the exact `ENOENT` scenario specifically may not exist to a relevant extent

Previously, ENOENT being returned would trigger `os.unexpectedErrno`.

Closes #12211
@marler8997
Copy link
Contributor

If we return a DirNotFound error, then the application can always turn that error into a null, but if we return null, any application that would care whether the directory was removed can't do the reverse.

I can imagine applications that might care about this distinction. For example, say an application is reading config files from a directory. If that directory gets removed while it's iterating, rather than continuing on like it's finished with all the files, it may want to stop and assert an error. (I would want this behavior in my zigwin32 project that reads JSON files to generate the zig bindings).

It would also be problematic for programs that do want to handle this case differently to detect it afterwards. This would mean you'd always have to check whether the directory still exists after you're done iterating over it (an extra check for the "happy path") and this check will always be a race condition since the directory could have been removed and recreated. The only correct solution they would have is to use the lower-level APIs to iterate over the directory.

Actually now that I've thought about it, I would think that in most cases you'd probably want to assert an error if a directory is removed while you're iterating over it. In zig for example, if you're iterating over the files in a cache directory to copy, then you wouldn't want to continue like nothing happened if the source directory was suddenly removed.

@squeek502
Copy link
Collaborator Author

squeek502 commented Jul 25, 2022

@marler8997 I would normally agree with you, and had similar thoughts initially, but AFAIK the APIs in std.fs.Dir are meant to behave similarly on different platforms and AFAIK, detecting this condition can only happen on Linux/WASI. On non-Linux UNIX platforms, ENOENT is not a possible error, and getdents just returns 0 if the directory being iterated is deleted (only tested on FreeBSD, other platforms might behave differently).

So, if you wrote some code that handles DirNotFound in some particular way, you'd only get that behavior on Linux, and on non-Linux platforms you'd just get a silent end-of-iteration.

I figured that making Linux behave like non-Linux was the better route than trying to get non-Linux to behave like Linux (which would involve something like an exists check after any 0 return of getdents).

@marler8997
Copy link
Contributor

Some systems will have extra error codes that others do not. When this is the case, I think it's better to return the error when possible rather than ignore it by default. I believe this is what the rest of the std library does and I think it's the right choice in general.

Maybe you missed it, but I addressed the solution you suggested where you check whether the directory still exists afterwards. This solution is a race condition since the directory can be deleted/re-created. After I had thought on it, it seemed like most programs (or at least many) would want to handle the case where the directory is deleted as a non-happy path. With this API, all cases that want the "correct behavior" are forced to duplicate their directory traversal code to use the lower-level API for any systems that can detect this error and fallback to the higher level API otherwise.

@ominitay
Copy link
Contributor

I agree with marler: returning null is a potentially misleading result, and an error is more appropriate here.

@squeek502
Copy link
Collaborator Author

squeek502 commented Jul 26, 2022

I can see pros/cons to both approaches, and this general topic seems to be something that remains unresolved in terms of how the std.fs API should be designed. Some more relevant issues/discussions on the same theme:

I personally still lean towards returning null in this instance, though:

  • When writing 'generic' cross-platform code, it would essentially always be a bug to use try with Iterator.next. To be consistent across platforms you'd basically always have to catch/handle DirNotFound
  • Any use-case for handling DirNotFound without converting it to null would only actually function on Linux, which (imo) reduces the usefulness of the error

One potential compromise might be to split next into next and nextLinux as is done for the .macos, .ios, .freebsd, ... switch case of Iterator, where nextLinux would return DirNotFound, and then next converts it to null. That way, if you're on Linux, you could call nextLinux directly if you want to handle the error, but otherwise you could just call next to get consistent cross-platform behavior.

squeek502 added a commit to squeek502/zig that referenced this issue Jul 31, 2022
…latform-specific errors

Follow up to ziglang#12226, implements the compromise detailed in ziglang#12211 (comment)
Vexu pushed a commit that referenced this issue Aug 1, 2022
…latform-specific errors

Follow up to #12226, implements the compromise detailed in #12211 (comment)
@Vexu Vexu modified the milestones: 0.11.0, 0.10.0 Aug 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Observed behavior contradicts documented or intended behavior standard library This issue involves writing Zig code for the standard library.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants