Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Weird textInput buffer handling #2

Open
mifopen opened this issue Jul 22, 2023 · 13 comments
Open

Weird textInput buffer handling #2

mifopen opened this issue Jul 22, 2023 · 13 comments

Comments

@mifopen
Copy link

mifopen commented Jul 22, 2023

Hey!
I have noticed some weird behaviour of zgui.textInput that I do not understand.
Let's say we have the following code written in C/C++:

static char text[1024] = "";
static ImGuiInputTextFlags flags = ImGuiInputTextFlags_EnterReturnsTrue;
if(ImGui::InputText("##source", text, IM_ARRAYSIZE(text), flags)) {
  fprintf(stdout, "%s\n", text);
}

and now we want to convert it into Zig using zig-gamedev/zgui lib.

const static = struct {
    var buf: [128]u8 = undefined;
};
if (zgui.inputText(
    "##source",
    .{
        .buf = &static.buf,
        .flags = .{
            .enter_returns_true = true,
        },
    },
)) {
    std.debug.print("{s}\n", .{&static.buf});
}

Now if you do this:

  1. Enter "hello"
  2. Press Enter
  3. See "hello" in terminal
  4. Focus input again
  5. ctrl+a / cmd+a (select text)
  6. Type 'x'
  7. Press Enter

For the C variant, you will see x in the terminal as expected
For the Zig variant, you will see xllo in the terminal.

It could be definitely that I don't understand some very basic stuff about system programming/C/Zig as I'm just entering this realm. Could you please help me understand what is going on here?

@hazeycode
Copy link
Member

hazeycode commented Jul 23, 2023

Hey @mifopen, this is a good report!

It looks like Dear ImGui only writes a single terminal \0 after the string, as opposed to zeroing the entire buffer. In C/C++, because there is no "default" bounds checking and because strings are assumed to be null-terminated, this behavior makes sense.

Note that in your C++ code, the print statement null-terminates the string. But in your Zig code, you are printing the entire buffer, nulls included. Instead, you could try using std.mem.sliceTo or casting to a null-terminated pointer type.

You could also zero the entire buffer before calling inputText. But this would only disguise the real problem. Which is that you need to use a null-terminated pointer/slice type or specify a slice with a length equal to the offset of the first null character.

This is a general problem, where there is a trade-off between making C API bindings more Zig friendly and staying close to the original. I'll defer to @michal-z on what the preferred API design is here.

@mifopen
Copy link
Author

mifopen commented Jul 23, 2023

Thank you @hazeycode! This makes a lot of sense. I agree that it is not obvious how thin should Zig binding libs be in general. I will close the issue then.

@mifopen mifopen closed this as completed Jul 23, 2023
@Pyrolistical
Copy link

Pyrolistical commented Apr 4, 2024

There are 2 different bugs interacting here.

  1. var buf: [128]u8 = undefined; is being incorrectly zeroed. I think it is caused by Debug mode fails to set undefined memory to 0xaa for const/static var ziglang/zig#19538
  2. zgui.inputText incorrectly takes in a buf: []u8, when it should be a buf: [:0]u8.

imgui uses the total buf length to know what is the maximum text size, but uses a null sentinel to know what is the current text length. This is exactly what [:0]u8 means.

I'll get a PR up to fix the later

Pyrolistical referenced this issue in Pyrolistical/zig-gamedev Apr 4, 2024
The slice length is for maximum text length, whereas the sentinel
denotes the current text length.

Added output of current text length, which would have errored before
this change since the buffers didn't include a sentinel in their type.

See https://github.com/zig-gamedev/zig-gamedev/issues/352#issuecomment-2038175998
@michaelbartnett
Copy link

michaelbartnett commented May 15, 2024

@Pyrolistical I don't think changing the type of buf from []u8 to [:0]u8 accomplishes anything. The core problem is what @hazeycode pointed out: the buf being passed is storage, not the subspan of the storage that's considered to be the printable part of the string, so you need to sliceTo on the buffer afterward to get the printable slice.

You can reproduce the exact same behavior that @mifopen described with the same setup. All this does is ensure that at the end of the storage there's a null terminator. Maybe that's desirable, but it also means that code which passes the result of ArrayList(u8).allocatedSlice() as the buffer parameter needs to cast to a sentinel-terminated slice, which may trip safety checks unless you take care to write a 0 at the end of storage every time after you possibly grow the arraylist storage.

Run this test:

var input_buffer: [8:0]u8 = .{0} ** 8;

fn draw() void {
    const changed = zgui.inputText("Input text", .{
        .buf = &self.input_buffer,
        .flags = .{ .enter_returns_true = true },
    });
    if (changed) {
        std.log.info("Input raw array: {s}", .{self.input_buffer});
        std.log.info("Input sliced-to: {s}", .{std.mem.sliceTo(&self.input_buffer, 0)});
    }
}

Then use shift+right to highlight the "678" then type "0", and then hit enter. This will print:

info: Input raw array: 123450?8
info: Input sliced-to: 123450

Maybe it makes sense to add a more ergonomic API that takes an arraylist? This is what I had in my code:

fn inputTextArrayList(
    label: [:0]const u8,
    buffer: *std.ArrayList(u8),
    flags: zgui.InputTextFlags,
) !bool {
    const add_nullterm = buffer.items.len == 0 or buffer.items[buffer.items.len - 1] != 0;

    if (add_nullterm) {
        try buffer.append(0);
    }

    var use_flags = flags;
    use_flags.callback_resize = true;

    const InputResizeCallback = struct {
        fn callback(data: *zgui.InputTextCallbackData) i32 {
            std.debug.assert(data.event_flag.callback_resize);
            var b: *std.ArrayList(u8) = @alignCast(@ptrCast(data.user_data));

            b.ensureTotalCapacity(@intCast(data.buf_size)) catch @panic("OOM");
            b.items.len = @intCast(data.buf_text_len + 1);
            data.buf = b.items.ptr;
            data.buf_size = @intCast(b.capacity);

            return 0;
        }
    };

    const changed = zgui.inputText(label, .{
        .buf = buffer.allocatedSlice(), // doesn't compile now since this doesn't return a sentinel slice
        .flags = use_flags,
        .user_data = buffer,
        .callback = InputResizeCallback.callback,
    });

    if (add_nullterm) {
        std.debug.assert(buffer.items[buffer.items.len - 1] == 0);
        _ = buffer.pop();
    }
    return changed;
}

Or maybe a wrapper can return a struct with both a changed field and the new slice of strlen.

@Pyrolistical
Copy link

Pyrolistical commented May 15, 2024

[:0]u8 is absolutely needed. A buffer overflow will occur otherwise.

See for yourself. We can restore old []u8 code by adding this to gui.zig:

pub fn inputText2(label: [:0]const u8, args: struct {
    buf: []u8,
    flags: InputTextFlags = .{},
    callback: ?InputTextCallback = null,
    user_data: ?*anyopaque = null,
}) bool {
    return zguiInputText(
        label,
        args.buf.ptr,
        args.buf.len,
        args.flags,
        if (args.callback) |cb| cb else null,
        args.user_data,
    );
}

Then apply this patch:

iff --git a/samples/gui_test_wgpu/src/gui_test_wgpu.zig b/samples/gui_test_wgpu/src/gui_test_wgpu.zig
index b5161488..61f51d1f 100644
--- a/samples/gui_test_wgpu/src/gui_test_wgpu.zig
+++ b/samples/gui_test_wgpu/src/gui_test_wgpu.zig
@@ -483,7 +483,8 @@ fn update(demo: *DemoState) !void {

         if (zgui.collapsingHeader("Widgets: Input with Keyboard", .{})) {
             const static = struct {
-                var input_text_buf = [_:0]u8{0} ** 4;
+                var input_text_buf = [_:0]u8{ 'h', 'e', 'l', 'l', 'o' };
+                var input_text_buf2 = [_]u8{ 'h', 'e', 'l', 'l', 'o' };
                 var input_text_multiline_buf = [_:0]u8{0} ** 4;
                 var input_text_with_hint_buf = [_:0]u8{0} ** 4;
                 var v1: f32 = 0;
@@ -501,6 +502,7 @@ fn update(demo: *DemoState) !void {
             zgui.separatorText("static input text");
             _ = zgui.inputText("Input text", .{ .buf = static.input_text_buf[0..] });
             _ = zgui.text("length of Input text {}", .{std.mem.len(@as([*:0]u8, static.input_text_buf[0..]))});
+            _ = zgui.inputText2("Input text2", .{ .buf = static.input_text_buf2[0..] });

             _ = zgui.inputTextMultiline("Input text multiline", .{ .buf = static.input_text_multiline_buf[0..] });
             _ = zgui.text("length of Input text multiline {}", .{std.mem.len(@as([*:0]u8, static.input_text_multiline_buf[0..]))});

Run zig build gui_test_wgpu-run and see the buffer overflow.

Screenshot 2024-05-15 010520

@copygirl
Copy link

@Pyrolistical This only happens if you feed it an incorrect buffer that doesn't have the terminating NUL character. Any code that modifies the buffer should ensure that after the text typed, a NUL character is present. You're creating the buffer by hand, not ensuring it contains NUL, and then "complaining" that it reads past the buffer.

If you want your input to be able to contain 256 characters, you create a buffer with length of 257. Then you can modify that as needed, and a NUL should always be present after the valid text. You can't cram 257 valid characters into a buffer of length 257, which needs to contain the terminating NUL.

I haven't used zgui, but I assume the input text is supposed to allow you to modify the buffer by typing into it? Why would you especially craft a [_:0]u8{ 'h', 'e', 'l', 'l', 'o' } when you'd likely want to be able to type more than 5 characters?

@michaelbartnett
Copy link

michaelbartnett commented May 15, 2024

++ @copygirl

The example code was incorrect.

input_text_buf2 should be declared as a [_:0]u8 but inputText should still accept []u8, and it will work great because the [:0]u8 will coerce to []u8 with len + 1. That will solve the problem in that example.

And to cite @copygirl again, using a buffer that's exactly as long as your default input is abnormal usage of imgui. You normally over-provision if you're using fixed input storage.

If we wanna protect against overflow in cases where you forget to manually add a terminator or initially declare your storage as a sentinel array, the better move imo is to just write a 0 before calling the extern:

pub fn inputText2(label: [:0]const u8, args: struct {
    buf: []u8,
    flags: InputTextFlags = .{},
    callback: ?InputTextCallback = null,
    user_data: ?*anyopaque = null,
}) bool {
    buf[buf.len - 1] = 0; // <-------- prevent overflow
    return zguiInputText(
        label,
        args.buf.ptr,
        args.buf.len,
        args.flags,
        if (args.callback) |cb| cb else null,
        args.user_data,
    );
}

The end user will probably see their input is cutoff unexpectedly and this can be a clue to them that they need to use a sentinel array.

@hazeycode hazeycode reopened this May 15, 2024
@Pyrolistical
Copy link

Pyrolistical commented May 15, 2024

Any code that modifies the buffer should ensure that after the text typed, a NUL character is present.

If you want your input to be able to contain 256 characters, you create a buffer with length of 257. Then you can modify that as needed, and a NUL should always be present after the valid text.

Yes which is exactly what [:0]u8 means. It doesn't mean that null must be at len, a null could be present earlier for the true length of the string.

I still don't understand what problem is solved by allowing []u8.

The purpose of my example is to reveal of the bug when providing a filled buffer. Of course in real code the provided buffer would be much longer.

@michaelbartnett
Copy link

The problem is that it introduces footguns for people implementing the resize callback.

Previously you could pass your storage slice directly to the buf parameter (using an ArrayList here because it's common and useful for this):

if (arraylist.items.len == 0 or arraylist.items[arraylist.items.len - 1] != 0) {
    arraylist.append(0) catch @panic("OOM");
}
const slice = arraylist.allocatedSlice();
_ = zgui.inputText("InputLabel", .{
    .buf = @ptrCast(slice),
    .user_data = &arraylist,
    .flags = .{ .callback_resize = true },
    .callback = struct {
        fn resize(data: *zgui.InputTextCallbackData) i32 {
            var b: *std.ArrayList(u8) = @alignCast(@ptrCast(data.user_data));
            b.ensureTotalCapacity(@intCast(data.buf_size)) catch @panic("OOM");
            b.items.len = @intCast(data.buf_text_len);
            data.buf = b.items.ptr;
            data.buf_size = @intCast(b.capacity);
            return 0;
        }
    }.resize,
});
if (arraylist.items[arraylist.items.len - 1] == 0) {
    _ = arraylist.pop();
}

But now because of the sentinel terminated slice, you have to both ptrCast the slice, and also remember to take a subslice (len-1) so that the buf.len + 1 in zgui.inputText doesn't cause you to allocate on every keystroke and eventually OOM:

--- before.zig
+++ after.zig
if (arraylist.items.len == 0 or arraylist.items[arraylist.items.len - 1] != 0) {
    arraylist.append(0) catch @panic("OOM");
}
const slice = arraylist.allocatedSlice();
_ = zgui.inputText("InputLabel", .{
-     .buf = slice,
+     .buf = @ptrCast(slice[0 .. slice.len - 1]),
    .user_data = &arraylist,
    .flags = .{ .callback_resize = true },
    .callback = struct {
        fn resize(data: *zgui.InputTextCallbackData) i32 {
            var b: *std.ArrayList(u8) = @alignCast(@ptrCast(data.user_data));
            b.ensureTotalCapacity(@intCast(data.buf_size)) catch @panic("OOM");
            b.items.len = @intCast(data.buf_text_len);
            data.buf = b.items.ptr;
            data.buf_size = @intCast(b.capacity);
            return 0;
        }
    }.resize,
});
if (arraylist.items[arraylist.items.len - 1] == 0) {
    _ = arraylist.pop();
}

If you just ptrCast the slice and don't adjust the len, you'll OOM after 40 or 50 keystrokes because each time you're adding +1 to the buf len, which is normally a signal in this callback that a resize is required.

Also, the version with the slice length adjustment behaves correctly, but it still does not guarantee a null terminator at the end of the arraylist's allocated buffer. It's not a problem currently, but if runtime safety checks are ever introduced which check for that null terminator during ptrCasts (or someone tries to use a helper function for casting) then this will likely trip that check and panic.

So in addition to remembering to reduce the slice len before passing it in, you also ought to add a null terminator at buf.len - 1 so you're not lying to ptrCast.

Maybe all that would be fine if we were getting a big benefit from this, but I don't think we get much from it. It doesn't solve the original problem reported here, and the issue of the buffer overrun you showed never manfiests so long as your initial input has a null terminator, and this is easy to do:

var empty_storage = [1]u8{0} ** 32;
var storage_with_initial_text = ("initial input" ++ [1]u8{0} ** 32).*;

Or even if you wanna be lazy, your storage can still just be an undefined sentinel array and it will still work:

var input_storage: [32:0]u8 = undefined;

The reason it still works is because when you pass the slice of your sentinel array to zgui.inputText the coercion from [:0]u8 to []u8 increases len by 1, and now you're cruising.

The contract of the ImGui::InputText API is that you provide the current buffer with a null terminator somewhere but not necessarily at the end of the buffer. As long as you do that, imgui will not overrun the buffer. The [:0]u8 parameter type is expressing the wrong thing, and the right thing--having a null terminator somewhere in the input buffer but not necessarily at the end--is not expressable in Zig.

@Pyrolistical
Copy link

Pyrolistical commented May 16, 2024

From the zig docs

The syntax [:x]T is a slice which has a runtime-known length and also guarantees a sentinel value at the element indexed by the length. The type does not guarantee that there are no sentinel elements before that.

So it's not perfect but also not wrong

@michaelbartnett
Copy link

It's not broken or whatever, for sure, but the api contract of ImGui::InputText doesn't care about having a null terminator at the end of the input buffer. So the constraint on the parameter has limited utility.

And limited utility would be fine, except it also introduces this footgun with handling input buffer resizes. Feels like a net negative to me.

@Pyrolistical
Copy link

@michaelbartnett also you can just resize an array too:

diff --git a/samples/gui_test_wgpu/src/gui_test_wgpu.zig b/samples/gui_test_wgpu/src/gui_test_wgpu.zig
index b5161488..01c53315 100644
--- a/samples/gui_test_wgpu/src/gui_test_wgpu.zig
+++ b/samples/gui_test_wgpu/src/gui_test_wgpu.zig
@@ -21,6 +21,7 @@ const DemoState = struct {
     alloced_input_text_buf: [:0]u8,
     alloced_input_text_multiline_buf: [:0]u8,
     alloced_input_text_with_hint_buf: [:0]u8,
+    input_text_buf_resizing: [:0]u8,
 };
 var _te: *zgui.te.TestEngine = undefined;
 
@@ -140,10 +141,12 @@ fn create(allocator: std.mem.Allocator, window: *zglfw.Window) !*DemoState {
         .alloced_input_text_buf = try allocator.allocSentinel(u8, 4, 0),
         .alloced_input_text_multiline_buf = try allocator.allocSentinel(u8, 4, 0),
         .alloced_input_text_with_hint_buf = try allocator.allocSentinel(u8, 4, 0),
+        .input_text_buf_resizing = try allocator.allocSentinel(u8, 4, 0),
     };
     demo.alloced_input_text_buf[0] = 0;
     demo.alloced_input_text_multiline_buf[0] = 0;
     demo.alloced_input_text_with_hint_buf[0] = 0;
+    demo.input_text_buf_resizing[0] = 0;
 
     return demo;
 }
@@ -157,6 +160,7 @@ fn destroy(allocator: std.mem.Allocator, demo: *DemoState) void {
     allocator.free(demo.alloced_input_text_buf);
     allocator.free(demo.alloced_input_text_multiline_buf);
     allocator.free(demo.alloced_input_text_with_hint_buf);
+    allocator.free(demo.input_text_buf_resizing);
     allocator.destroy(demo);
 }
 
@@ -240,7 +244,7 @@ const NonExhaustiveEnum = enum(i32) {
     _,
 };
 
-fn update(demo: *DemoState) !void {
+fn update(demo: *DemoState, allocator: std.mem.Allocator) !void {
     zgui.backend.newFrame(
         demo.gctx.swapchain_descriptor.width,
         demo.gctx.swapchain_descriptor.height,
@@ -481,9 +485,29 @@ fn update(demo: *DemoState) !void {
             );
         }
 
+        const ResizingUserData = struct {
+            buffer: *[:0]u8,
+            allocator: std.mem.Allocator,
+        };
         if (zgui.collapsingHeader("Widgets: Input with Keyboard", .{})) {
             const static = struct {
                 var input_text_buf = [_:0]u8{0} ** 4;
+
+                fn resizeCallback(data: *zgui.InputTextCallbackData) i32 {
+                    var user_data: *ResizingUserData = @alignCast(@ptrCast(data.user_data.?));
+                    const input_text_buf_resizing = user_data.buffer.*;
+                    if (std.mem.len(@as([*:0]u8, @ptrCast(data.buf))) < input_text_buf_resizing.len) {
+                        return 0;
+                    }
+                    const larger = user_data.allocator.allocSentinel(u8, @intCast(data.buf_size * 2), 0) catch @panic("oom");
+                    @memcpy(larger[0..input_text_buf_resizing.len], input_text_buf_resizing);
+                    user_data.allocator.free(input_text_buf_resizing);
+
+                    user_data.buffer.* = larger;
+                    data.buf = larger.ptr;
+                    data.buf_size = @intCast(larger.len + 1);
+                    return 0;
+                }
                 var input_text_multiline_buf = [_:0]u8{0} ** 4;
                 var input_text_with_hint_buf = [_:0]u8{0} ** 4;
                 var v1: f32 = 0;
@@ -502,6 +526,9 @@ fn update(demo: *DemoState) !void {
             _ = zgui.inputText("Input text", .{ .buf = static.input_text_buf[0..] });
             _ = zgui.text("length of Input text {}", .{std.mem.len(@as([*:0]u8, static.input_text_buf[0..]))});
 
+            _ = zgui.inputText("Input text auto resize", .{ .buf = demo.input_text_buf_resizing, .flags = .{ .callback_resize = true }, .callback = static.resizeCallback, .user_data = @constCast(@ptrCast(&ResizingUserData{ .buffer = &demo.input_text_buf_resizing, .allocator = allocator })) });
+            _ = zgui.text("length of Input resizing text {}", .{std.mem.len(demo.input_text_buf_resizing.ptr)});
+
             _ = zgui.inputTextMultiline("Input text multiline", .{ .buf = static.input_text_multiline_buf[0..] });
             _ = zgui.text("length of Input text multiline {}", .{std.mem.len(@as([*:0]u8, static.input_text_multiline_buf[0..]))});
             _ = zgui.inputTextWithHint("Input text with hint", .{
@@ -743,7 +770,7 @@ pub fn main() !void {
 
     while (!window.shouldClose() and window.getKey(.escape) != .press) {
         zglfw.pollEvents();
-        try update(demo);
+        try update(demo, allocator);
         draw(demo);
     }
 }

@michaelbartnett
Copy link

michaelbartnett commented May 16, 2024

I mean, yeah, sure. but you can also do that independent of whether inputText's buf parameter has a sentinel. And you've just gone and implemented a subset of ArrayList. People will try what I wrote first, because it's simpler, and ArrayList is a pervasive vocabulary type in Zig.

I think this speaks to the need for an ArrayListSentinel in std, or a policy param on ArrayList enabling that 😛

I'd reach for managing buffer resizing directly if I cared about having a different growth policy then ArrayList for the input buffer. But that's rarely a thing I care about in imgui code.

To get at the heart of it, what the sentinel parameter really fixes is cases where people don't have any null terminators and imgui's internal strlen computes the wrong buffer size and reads past the end, right?

The simplest safeguard against this is to just write a null terminator before calling the extern. You don't get the overrun and you don't get the slice length footgun.

The original (mostly unrelated 😆) problem reported still happens, but that can be fixed with a different API for inputText, maybe something where it returns an optional slice of the actual input content.

pub fn inputTextFriendly(label: [:0]const u8, args: struct {
    buf: []u8,
    flags: InputTextFlags = .{},
    callback: ?InputTextCallback = null,
    user_data: ?*anyopaque = null,
}) ?[:0]u8 {
    if (buf.len > 0) {
        buf[buf.len - 1] = 0;
    }
    return if (zguiInputText(
        label,
        args.buf.ptr,
        args.buf.len,
        args.flags,
        if (args.callback) |cb| cb else null,
        args.user_data,
    ))
        std.mem.sliceTo(buf, 0)
    else
        null;
}

Then usage could be:

var storage: [32:0]u8; // sentinel slices are still okay, just overwrites the terminator that is already there
var input_content: []u8 = storage[0..0];

input_content = zgui.inputTextFriendly("Label", &storage, .{}) orelse input_content;

if (zgui.inputTextFriendly("Label", &storage, .{})) |input| {
    input_content = input;
    onInputChanged(input_content);
}

@hazeycode hazeycode transferred this issue from zig-gamedev/zig-gamedev Nov 5, 2024
@hazeycode hazeycode changed the title [zgui] Weird textInput buffer handling Weird textInput buffer handling Nov 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants