Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

std.posix: Added 'error.ProcessNotFound' where necessary #23268

Open
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

chrboesch
Copy link
Contributor

If you attempt to read (or other operations) from a process previously opened via openat, but it has since terminated, the Linux kernel throws the error ESRCH - No such process. This is now represented by error.ProcessNotFound.

fixes #19875

@alexrp
Copy link
Member

alexrp commented Mar 16, 2025

I'm not really clear on the conditions under which we get ENOENT vs ESRCH. Can you clarify?

@rootbeer
Copy link
Contributor

(Let me preface this comment by saying, do whatever Alex says, and feel free to ignore me.)

I think this PR correctly maps the errno code on read/write when operating on a /proc/<pid>/ path, where ESRCH means that the process that was previously backing a proc entry is no longer present. For example Linux's fs/proc/base.c:proc_pid_cmdline_read returns ESRCH in this case.

Also since POSIX defines ESRCH as "No such process" this mapping seems relatively stable (even if other kinds of handles like sockets or FUSE or whatnot besides /proc handles might return ESRCH.)

From what I can tell (in the proc/base.c code), ENOENT is used in the proc code when the race lookup is more filesystemy (i.e., during an open or readdir callback, not so much read/write). E.g., in fs/proc/base.c:proc_map_files_readdir. So Zig's conversion to ProcessNotFound may make sense when the open handle points into /proc, but ENOENT could happen in regular directories, too. It is not clear that ProcessNotFound would always be the right conversion.

Doing this mapping at read/write where you can't know what kind of handle is being used (e.g., if its a socket, a FUSE handle, or an open proc handle, etc), seems likely to cause confusion to me. (Also I don't see evidence of Linux using ENOENT in read/write paths in the main proc handlers, but I wouldn't be at all surprised if there are some out there somewhere that do ....)

I think Zig's posix layer can get away with specializing some errno from some syscalls. But I think it'll be hard to do that correctly from syscalls like read or write that can be used on so many different kinds of handles and where the semantics of a specific errno may be specific to the kind of handle.

My suggestion on this PR is that the ESRCH to ProcessNotFound works, but the exisiting mapping of ENOENT (which generally means "file or directory not found") to ProcessNotFound is not something Zig should be doing on syscalls like read and write.

@chrboesch
Copy link
Contributor Author

I generally agree with that. As suggested, "process not found" could of course return "file not found." I'm completely unenthusiastic about this and just want to avoid further confusion, because at that point, you're working with "processes" and not with files. Not from a technical point of view (everything is a file), but from a logical point of view. Personally, I'd stick with "process," because when you're working with it, you think of PID as a process and not a file.

@rootbeer
Copy link
Contributor

(Again, I'm not a Zig team member, so my opinions may be ignored and I may be off-base.) The problem I'm trying to convince you about is that the ENOENT errors are being mapped to error.ProcessNotFound in the std.posix.read (**) function and that function will be used for other things than /proc accesses. I agree "process not found" is nicer than "file not found" when manipulating proc entries, but I'm arguing that its a worse result for handles outside of proc, and thus not a good errno mapping for read to use.

Zig's std.posix.read function should map ENOENT to errors.FileNotFound because that is the direct mapping, and is used elsewhere in Zig's posix layer. Alternatively, in my understanding, the Zig standard practice would be that std.posix.read should not map ENOENT at all (and let it fall into the Unexpected case) because the read syscall should not return ENOENT. As ENOENT generally gets returned from open or getdirents, not read.

My comments here are all about the pre-existing mappings of ENOENT. This current PR just adds mappings from ESRCH to error.ProcessNotFound which seems fine. So technically I think this PR is fine. But I do think the existing errno mapping of ENOENT should get cleaned up. (I think they're relevant because they were added by a previous PR for the same issue.)

(**) The read here stands in for all of read/write/pread/pwrite/preadv/pwritev, they're all the same in terms of errno mapping and the not-found errnos.

@alexrp
Copy link
Member

alexrp commented Mar 24, 2025

I was under the impression that read & co can return ENOENT specifically for files under /proc (e.g. if the process dies between open and read), and that was the motivation for #21430?

@chrboesch
Copy link
Contributor Author

chrboesch commented Mar 24, 2025

Yes, and that hasn't changed. The error message returned by the Linux kernel varies depending on the combination of how a PID is accessed or can no longer be accessed because it no longer exists. We can now catch both possible error types and return them as 'process not found' in Zig, which logically is always the case. Or we can ignore the fact that it's accessing a process and simply write "file not found." I'm still in favor of option 1 and find option 2 clumsy because it's extremely confusing (as Linux often is, unfortunately). We have the option of outputting explanatory error messages, so why not use them?

Addendum: The Linux manual pages also say, "The proc filesystem is a pseudo-filesystem...", so I find the error message "procress not found" much more meaningful than "file not found".

@KilianHanich
Copy link

I actually have a question: What a specific situation/example where read returns ENOENT?

…tencies.

After extensive testing, the error messages for processes that are already open but no longer available for reading/writing have been adjusted to 'ProcessNotFound' if the error code ESRCH is thrown by the Linux kernel.
Incorrect ENOENT error codes that also returned 'ProcessNotFound' have been removed.
@chrboesch
Copy link
Contributor Author

To be honest, I don't remember. I had a test setup consisting of a C program and a Zig program, and I was getting corresponding error codes. Now I tried to recreate it, but to no avail. So I tested everything again from the beginning and only got ESRCH. I've now adjusted this PR accordingly.

@@ -1565,6 +1566,7 @@ fn openDirFlagsZ(self: Dir, sub_path_c: [*:0]const u8, flags: posix.O) OpenError
error.FileLocksNotSupported => unreachable, // locking folders is not supported
error.WouldBlock => unreachable, // can't happen for directories
error.FileBusy => unreachable, // can't happen for directories
error.ProcessNotFound => unreachable, // can't happen for directories
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if I try to open /proc/12345 (as opposed to a file under it)? Or a subdirectory somewhere under /proc/12345/...?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you try const proc = try std.posix.open("/proc/29103", .{}, 0); and the PID is unavailable, you get .NOENT => return error.FileNotFound,.

However, if the PID was open correctly and the PID disapears, and you try const fd = try std.posix.openat(proc, "cpuset", .{}, 0);, then you get .SRCH => return error.ProcessNotFound,

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But then it sounds like this shouldn't be unreachable unless I'm misunderstanding?

Copy link
Contributor Author

@chrboesch chrboesch Mar 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think unreachable is the right choice here, because if the error is caught there, something else fundamentally wrong. Omitting it won't work; the compiler will complain about the dependencies. And the same goes for the other errors caught there with unreachable. The idea wasn't mine; I just added it because it makes sense to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't follow; ProcessNotFound is in OpenError, which this function returns. Your example of const fd = try std.posix.openat(proc, "cpuset", .{}, 0); where the process behind proc disappeared in the meantime also seems like a completely reasonable scenario where the programmer couldn't have done anything to prevent it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that can be prevented. So the only option is to catch the error using catch. I just don't quite understand what you're getting at.

@rootbeer
Copy link
Contributor

LGTM!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ESRCH is not handled by file io operations
4 participants