
function performance seems to be affected by a match arm which is not evaluated. #110764

Open
alexshagiev opened this issue Apr 24, 2023 · 3 comments
Labels
C-discussion Category: Discussion or questions that doesn't represent real issues. I-slow Issue: Problems and improvements with respect to performance of generated code.

Comments


alexshagiev commented Apr 24, 2023

I am very new to Rust and perhaps missing a trick here, but I am puzzled by this behavior, so I figured I would raise the question.

I have a function, let's call it serialize, which takes a slice of u8 bytes and writes it into a fixed-size u8 array allocated on the stack. Because the pre-allocated fixed-size array can overflow, the function has to check before writing that there are enough bytes available in that stack-allocated array. My benchmark then measures the performance of this serialize function, and I know that in the scenario I set up there is always sufficient room to write into. However, I still have to do a match to test for this, and it seems that some of the code in the match arm that is never executed still gets evaluated.

Here is the actual function. Essentially, the format! block of the first match arm affects the performance characteristics of the function. If I, for example, simply add panic! on top of the Result::Err, I get a 2x performance improvement and the bench completes without panicking. I get a similar effect if I simply put an empty string into the format! macro, like format!(""). What am I doing wrong?

Does the format! macro get expanded into something outside of the match arm, so that it always runs regardless of the condition?

    pub fn serialize_bytes(&mut self, bytes: &[u8]) -> Result<&mut Self> {
        let input_len = bytes.len();
        let avail = self.avail();
        match input_len > avail {
            true => {
                // panic!("blah"); // TODO: adding this also improves the benchmark, even though panic! is never reached
                Result::Err(SerDeError {
                    message: format!(
                        "adding {input_len} bytes, {avail} slots available. input: {bytes:?} buffer: {self:x}",
                    ),
                })
            }
            false => {
                // T.serialize(&serializer) - reuse serializer each iter
                //         time:   [62.164 ns 64.446 ns 67.565 ns]
                // self.bytes[self.len..self.len+input_len].copy_from_slice(&bytes);
                // 50% improvement using unsafe
                // T.serialize(&serializer) - reuse serializer each iter
                //         time:   [29.611 ns 30.166 ns 30.881 ns]
                unsafe {
                    std::ptr::copy_nonoverlapping(
                        bytes.as_ptr(),
                        self.bytes.as_mut_ptr().add(self.len),
                        bytes.len(),
                    );
                }

                self.len += bytes.len();
                Result::Ok(self)
            }
        }
    }
  • benchmark results with format! of a blank string, even though this match arm is never executed; performance is similar to simply adding panic! on top of the Result::Err(...
                Result::Err(SerDeError {
                    message: format!(
                        "",
                    ),
                })

to_serializer           time:   [14.396 ns 15.042 ns 15.750 ns]
                        change: [-65.368% -63.455% -61.474%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
  • benchmark results with a format! that has to expand a number of variables; this match arm is still not evaluated under the test scenario

                Result::Err(SerDeError {
                    message: format!(
                        "adding {input_len} bytes, {avail} slots available. input: {bytes:?} buffer: {self:x}",
                    ),
                })

to_serializer           time:   [34.094 ns 34.780 ns 35.582 ns]
                        change: [+137.94% +149.42% +161.38%] (p = 0.00 < 0.05)
                        Performance has regressed.
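One common mitigation for this kind of effect, offered here as a hedged sketch rather than anything suggested in the thread, is to outline the error construction into a `#[cold]`, `#[inline(never)]` helper so the `format!` machinery stays out of the hot path. The helper name and the `Vec`-backed buffer below are illustrative, not the issue's actual types:

```rust
// Hedged sketch: outlining the error path into a cold, never-inlined
// helper keeps format!'s argument captures out of the hot function body.
// `overflow_error` and the Vec-based buffer are illustrative only.

#[cold]
#[inline(never)]
fn overflow_error(input_len: usize, avail: usize) -> String {
    format!("adding {input_len} bytes, {avail} slots available")
}

fn serialize_bytes(buf: &mut Vec<u8>, cap: usize, bytes: &[u8]) -> Result<(), String> {
    let input_len = bytes.len();
    let avail = cap - buf.len();
    if input_len > avail {
        // Cold path: all formatting work happens inside the outlined helper.
        return Err(overflow_error(input_len, avail));
    }
    buf.extend_from_slice(bytes); // hot path: a plain append
    Ok(())
}

fn main() {
    let mut buf = Vec::new();
    assert!(serialize_bytes(&mut buf, 4, b"abc").is_ok());
    assert!(serialize_bytes(&mut buf, 4, b"xy").is_err()); // only 1 slot left
    assert_eq!(buf, b"abc");
}
```

The idea is that the optimizer then sees a hot function containing only the bounds check and the copy, with a single call into the cold helper on the failure branch.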
@alexshagiev alexshagiev changed the title function performance seems to be affected by a by a match arm which is not evaluated. function performance seems to be affected by a match arm which is not evaluated. Apr 24, 2023
alexshagiev (Author) commented:

Is this caused by #110737?


nikic commented Apr 24, 2023

This is probably because format! captures pointers to its arguments. You can confirm by passing copies of the variables to format!.
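The suggestion can be read as follows: `format!("{x}")` captures `x` by reference, so even an unexecuted arm may keep the original variables addressable, while binding explicit copies first formats owned values instead. A minimal illustration of the two spellings (purely illustrative, not code from the issue):

```rust
// format! with inline captures borrows the named local variables...
fn by_reference(input_len: usize, avail: usize) -> String {
    format!("adding {input_len} bytes, {avail} slots available")
}

// ...while binding explicit copies first formats owned values instead,
// which is the experiment proposed above to confirm the hypothesis.
fn by_copy(input_len: usize, avail: usize) -> String {
    let (in_len, in_avail) = (input_len, avail); // Copy types: plain copies
    format!("adding {in_len} bytes, {in_avail} slots available")
}

fn main() {
    // Both spellings produce the same string; only what the
    // formatting machinery captures differs.
    assert_eq!(by_reference(3, 8), by_copy(3, 8));
}
```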

@jyn514 jyn514 added I-slow Issue: Problems and improvements with respect to performance of generated code. C-discussion Category: Discussion or questions that doesn't represent real issues. labels Apr 25, 2023

alexshagiev commented Apr 25, 2023

I am not sure how cloning data into the error message, versus letting format! capture the pointers, should improve the performance of a match arm that is never executed, but here are the results, which I still find puzzling.

Essentially, I get three different and progressively worse performance results just by changing a match arm of code that is never executed. How does this make sense? What am I missing about format! and pointers?

I ran the same benchmark in each case and only changed the format! of the match arm that never executes.
Q: Why am I certain that this match arm never executes?
A: Because I can put panic! on top of this match arm and the bench still runs successfully.

Q: What is the performance with the panic! macro in place?
A: Same as with format! and a blank string.

  • format!("") - empty string
                      format!(
                      ""
                      )

Result

to_serializer           time:   [12.861 ns 13.779 ns 15.272 ns]
                        change: [-43.005% -40.787% -37.849%] (p = 0.00 < 0.05)
                        Performance has improved.
  • format!("...." , ....) - with cloned values
                        format!(
                        "adding {in_len} bytes, {in_avail} slots available. input: {in_bytes:?} buffer: {in_self:x}",
                        in_len = input_len.clone(),
                        in_avail = avail.clone(),
                        in_bytes = bytes.clone(),
                        in_self = self.clone(),
                        )

Result

to_serializer           time:   [21.707 ns 21.958 ns 22.268 ns]
                        change: [+63.267% +72.362% +79.658%] (p = 0.00 < 0.05)
                        Performance has regressed.
  • format!("...", original) - with pointers to original variables
                      format!(
                      "adding {input_len} bytes, {avail} slots available. input: {bytes:?} buffer: {self:x}",
                      )

Result

to_serializer           time:   [35.993 ns 38.491 ns 41.498 ns]
                        change: [+64.439% +74.363% +85.315%] (p = 0.00 < 0.05)
                        Performance has regressed.
