
Replace mem.page_size with mem.min_page_size and mem.max_page_size #3815

Closed
wants to merge 4 commits

Conversation

@daurnimator (Contributor, author) commented Dec 1, 2019

This PR removes mem.page_size and replaces it with mem.min_page_size and mem.max_page_size. Originally I attempted to create a global variable page_size, but I realised that this would not work when Zig's own startup routines are not used (e.g. when Zig is used to create a static library). There is no Linux syscall to get the current page size if you don't have access to the auxiliary vector, so a hacky solution would be required (e.g. using the knowledge that madvise returns EINVAL on non-page-aligned arguments, and doing a search for the page size).
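
For illustration, a rough sketch of what that madvise-based search could look like. It is not code from this PR: the helper name, the use of page_allocator to obtain a page-aligned base address, and the candidate range are all assumptions, and it is written against a current Zig standard library.

```zig
const std = @import("std");
const linux = std.os.linux;

// Hypothetical probe, not part of this PR: madvise() fails with EINVAL when
// its address is not page-aligned, so the real page size can be found by
// searching for the smallest power-of-two offset that madvise accepts.
fn probePageSize() !usize {
    // std.heap.page_allocator hands memory out straight from mmap, so the
    // returned pointer starts on a page boundary (an assumption about that
    // allocator's implementation).
    const buf = try std.heap.page_allocator.alloc(u8, 2 * 1024 * 1024);
    defer std.heap.page_allocator.free(buf);
    const base = @intFromPtr(buf.ptr);

    var candidate: usize = 4 * 1024; // smallest page size we expect anywhere
    while (candidate <= 1024 * 1024) : (candidate *= 2) {
        // base + candidate is page-aligned exactly when candidate is a
        // multiple of the real page size; the raw syscall returns 0 on success.
        const rc = linux.madvise(@ptrFromInt(base + candidate), 1, 0); // 0 == MADV_NORMAL
        if (rc == 0) return candidate;
    }
    return error.PageSizeNotFound;
}
```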

This PR introduces mem.bufsiz as a reasonable "default size" for buffers. The name is taken from the POSIX constant BUFSIZ.

Additionally, on Linux we now assert that the page size in the auxiliary vector matches expectations.
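
As a hedged illustration of that assertion (the function name and the exact bounds checked are mine, not the PR's code), the check could read the page size from the auxiliary vector like this:

```zig
const std = @import("std");

// Illustrative only: on Linux the kernel reports the runtime page size
// through the auxiliary vector, so the comptime min/max bounds can be
// sanity-checked at startup.
fn checkPageSizeBounds(min_page_size: usize, max_page_size: usize) void {
    const runtime_page_size = std.os.linux.getauxval(std.elf.AT_PAGESZ);
    std.debug.assert(runtime_page_size >= min_page_size);
    std.debug.assert(runtime_page_size <= max_page_size);
}
```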

Closes #2564

@daurnimator added the "breaking" and "standard library" labels on Dec 1, 2019
.aarch64, .aarch64_be => 1 * 1024 * 1024 * 1024, // ARM64 supports 4K, 16K, 64K, 2M, 32M, 512M, 1G pages
.mips, .mipsel, .mips64, .mips64el => 64 * 1024, // Every MIPS III, MIPS IV, MIPS32 and MIPS64 processor supports 4K, 16K and 64K page sizes.
.powerpc64, .powerpc64le => 64 * 1024,
.s390x => 2 * 1024 * 1024 * 1024, // 4K, 1M, 2G pages
daurnimator (author) commented:

Fun fact: this doesn't fit in a u29!

.wasm32, .wasm64 => 64 * 1024,
else => 4 * 1024,
};

pub const max_page_size = switch (builtin.arch) {
andrewrk (Member) commented:

Can you justify the inclusion of this constant? I disagree with the one place you used it; that test should be using the actual page size.

daurnimator (author) replied:

What do you mean by "actual page size"?

@@ -139,7 +139,7 @@ pub fn updateFileMode(source_path: []const u8, dest_path: []const u8, mode: ?Fil

const in_stream = &src_file.inStream().stream;

var buf: [mem.page_size * 6]u8 = undefined;
daurnimator (author) commented:

I removed this * 6... it seemed oddly arbitrary.

@kyle-github

Hopefully this is the right place for this question. If I am missing the intent of the patch, please ignore this comment.

CPU hardware does not generally support arbitrary page sizes. Hardware supports a very limited set of sizes (as you note in one of the comments in the patch). Rather than min/max, would it not be closer to reality to have arrays containing the valid page sizes for a given CPU? E.g. 4k, 16k, 64k, 2M, 1G for some fictitious processor.

This intersects with the support in the OS. For instance, as far as I know most Linux systems on AMD64 support 4k and one or more of 2M and 1G (if the CPU supports it). So perhaps the array needs to be based on both the OS and CPU. IIRC iOS uses 16k but I do not remember if it supports any kind of huge pages.

Then if you want suggestions for buffer sizes, you can check that array and find the next page size larger than the initial ask, i.e. if the user code wants 12k, suggest 16k on many processors. Or look for multiples of a smaller page size and use 3x 4k pages.
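
For illustration only (not code from this PR), a minimal sketch of that suggestion, with a made-up page-size table for the fictitious processor:

```zig
const std = @import("std");

// Sketch of the suggestion above: supported page sizes for a fictitious
// processor, plus a helper that rounds a requested buffer size up to the
// next supported page size (or to a multiple of the smallest one).
const supported_page_sizes = [_]usize{
    4 * 1024, 16 * 1024, 64 * 1024, 2 * 1024 * 1024, 1024 * 1024 * 1024,
};

fn suggestBufferSize(wanted: usize) usize {
    for (supported_page_sizes) |page| {
        if (page >= wanted) return page;
    }
    // Bigger than the largest page: fall back to multiples of the smallest.
    const smallest = supported_page_sizes[0];
    return ((wanted + smallest - 1) / smallest) * smallest;
}

test "a 12k request is rounded up to the 16k page" {
    try std.testing.expectEqual(@as(usize, 16 * 1024), suggestBufferSize(12 * 1024));
}
```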

@daurnimator (author)

> Rather than min/max, would it not be closer to reality to have arrays containing the valid page sizes for a given CPU? E.g. 4k, 16k, 64k, 2M, 1G for some fictitious processor.

I don't think anyone cares about those details. Really the important one is the minimum page size, which is used when APIs (like mprotect) work on a page at a time. I've struggled to come up with a good use case for knowing the maximum page size.

> Then if you want suggestions for buffer sizes, you can check that array and find the next page size larger than the initial ask, i.e. if the user code wants 12k, suggest 16k on many processors. Or look for multiples of a smaller page size and use 3x 4k pages.

bufsiz is used when you don't really have a minimum size in mind; it's a very general approximation where you should weigh up page size, cache-line size, PIPE_BUF (the amount of data that fits atomically in a pipe), disk sector size, and network packet size. It's usually used when you're e.g. receiving data from the network and writing it out to a file.
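
As a rough sketch of that use case (the constant's value below is a placeholder, not the value proposed by the PR), a bufsiz-sized scratch buffer would typically be used like this:

```zig
const std = @import("std");

const bufsiz = 8 * 1024; // placeholder value for illustration

// Shuffle bytes from any reader (e.g. a socket) to any writer (e.g. a file)
// using a fixed scratch buffer when no particular size is required.
fn copyStream(reader: anytype, writer: anytype) !void {
    var buf: [bufsiz]u8 = undefined;
    while (true) {
        const n = try reader.read(&buf);
        if (n == 0) break; // end of stream
        try writer.writeAll(buf[0..n]);
    }
}
```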

@kyle-github

> I don't think anyone cares about those details. Really the important one is the minimum page size, which is used when APIs (like mprotect) work on a page at a time. I've struggled to come up with a good use case for knowing the maximum page size.

For the latter (max page size): Databases. DBs use huge pages all over the place and do their own internal fragmentation. Page size is very important for those applications. I have seen some large language VMs use big pages too, but I think the current trend there is in the opposite direction (smaller pages).

And now that I am thinking about this, perhaps this should be in the std.os library rather than here because it is the OS that ultimately determines what page sizes are supported.

> bufsiz is used when you don't really have a minimum size in mind; it's a very general approximation where you should weigh up page size, cache-line size, PIPE_BUF (the amount of data that fits atomically in a pipe), disk sector size, and network packet size. It's usually used when you're e.g. receiving data from the network and writing it out to a file.

OK. Fair enough.

@andrewrk (Member) commented Jan 5, 2020

I'm not ready to decide one way or another on this, but I have converted it into a proposal here: #4082

@andrewrk closed this on Jan 5, 2020
Development

Successfully merging this pull request may close these issues.

design flaw: operating system page size is not always comptime known, but std lib assumes it is
3 participants