
function performance seems to be affected by a match arm which is not evaluated. #110764

Open
alexshagiev opened this issue Apr 24, 2023 · 3 comments
Labels
C-discussion Category: Discussion or questions that doesn't represent real issues. I-slow Issue: Problems and improvements with respect to performance of generated code.

Comments


alexshagiev commented Apr 24, 2023

I am very new to Rust and perhaps missing a trick here, but I am puzzled by this behavior, so I figured I would raise the question.

I have a function, let's call it serialize, which takes a slice of u8 bytes and writes it into a fixed-size u8 array allocated on the stack. Because the pre-allocated fixed-size array can overflow, the function has to check before writing that there are enough bytes available in that stack-allocated array. My benchmark then measures the performance of this serialize function, and I know that in the scenario I set up there is always sufficient room to write into. However, I still have to do a match to test for this, and it seems that some of the code in the match arm that is never executed still gets evaluated.

Here is the actual function. Essentially, the format! block of the first match arm affects the performance characteristics of the function. If I, for example, simply add panic! on top of the Result::Err, I get a 2x performance improvement and the bench completes without panicking. I get a similar effect if I simply put an empty string into the format! macro, like format!(""). What am I doing wrong?

Does the format! macro get expanded into something outside of the match arm, so that it always runs regardless of the condition?

    pub fn serialize_bytes(&mut self, bytes: &[u8]) -> Result<&mut Self> {
        let input_len = bytes.len();
        let avail = self.avail();
        match input_len > avail {
            true => {
                // panic!("blah"); // TODO: adding this also improves the benchmark, even though panic! is never reached
                Result::Err(SerDeError {
                    message: format!(
                        "adding {input_len} bytes, {avail} slots available. input: {bytes:?} buffer: {self:x}",
                    ),
                })
            }
            false => {
                // T.serialize(&serializer) - reuse serializer each iter
                //         time:   [62.164 ns 64.446 ns 67.565 ns]
                // self.bytes[self.len..self.len+input_len].copy_from_slice(&bytes);
                // 50% improvement using unsafe
                // T.serialize(&serializer) - reuse serializer each iter
                //         time:   [29.611 ns 30.166 ns 30.881 ns]
                unsafe {
                    std::ptr::copy_nonoverlapping(
                        bytes.as_ptr(),
                        self.bytes.as_mut_ptr().add(self.len),
                        bytes.len(),
                    );
                }

                self.len += bytes.len();
                Result::Ok(self)
            }
        }
    }
  • benchmark results with format! of a blank string, even though this match arm is never executed; performance is similar to simply adding panic! on top of the Result::Err(...
                Result::Err(SerDeError {
                    message: format!(
                        "",
                    ),
                })

to_serializer           time:   [14.396 ns 15.042 ns 15.750 ns]
                        change: [-65.368% -63.455% -61.474%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
  • benchmark results with a format! that has to expand a number of variables; this match arm is still not evaluated under the test scenario

                Result::Err(SerDeError {
                    message: format!(
                        "adding {input_len} bytes, {avail} slots available. input: {bytes:?} buffer: {self:x}",
                    ),
                })

to_serializer           time:   [34.094 ns 34.780 ns 35.582 ns]
                        change: [+137.94% +149.42% +161.38%] (p = 0.00 < 0.05)
                        Performance has regressed.
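One common mitigation for this kind of effect, offered here as a hedged sketch rather than anything suggested in the thread, is to outline the error construction into a `#[cold]`, `#[inline(never)]` helper so the `format!` machinery stays out of the hot path. The helper name and the `Vec`-backed buffer below are illustrative, not the issue's actual types:

```rust
// Hedged sketch: outlining the error path into a cold, never-inlined
// helper keeps format!'s argument captures out of the hot function body.
// `overflow_error` and the Vec-based buffer are illustrative only.

#[cold]
#[inline(never)]
fn overflow_error(input_len: usize, avail: usize) -> String {
    format!("adding {input_len} bytes, {avail} slots available")
}

fn serialize_bytes(buf: &mut Vec<u8>, cap: usize, bytes: &[u8]) -> Result<(), String> {
    let input_len = bytes.len();
    let avail = cap - buf.len();
    if input_len > avail {
        // Cold path: all formatting work happens inside the outlined helper.
        return Err(overflow_error(input_len, avail));
    }
    buf.extend_from_slice(bytes); // hot path: a plain append
    Ok(())
}

fn main() {
    let mut buf = Vec::new();
    assert!(serialize_bytes(&mut buf, 4, b"abc").is_ok());
    assert!(serialize_bytes(&mut buf, 4, b"xy").is_err()); // only 1 slot left
    assert_eq!(buf, b"abc");
}
```

The idea is that the optimizer then sees a hot function containing only the bounds check and the copy, with a single call into the cold helper on the failure branch.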
@alexshagiev alexshagiev changed the title function performance seems to be affected by a by a match arm which is not evaluated. function performance seems to be affected by a match arm which is not evaluated. Apr 24, 2023
alexshagiev (Author) commented:

Is this caused by #110737?


nikic commented Apr 24, 2023

This is probably because format! captures pointers to its arguments. You can confirm by passing copies of the variables to format!.
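The suggestion can be read as follows: `format!("{x}")` captures `x` by reference, so even an unexecuted arm may keep the original variables addressable, while binding explicit copies first formats owned values instead. A minimal illustration of the two spellings (purely illustrative, not code from the issue):

```rust
// format! with inline captures borrows the named local variables...
fn by_reference(input_len: usize, avail: usize) -> String {
    format!("adding {input_len} bytes, {avail} slots available")
}

// ...while binding explicit copies first formats owned values instead,
// which is the experiment proposed above to confirm the hypothesis.
fn by_copy(input_len: usize, avail: usize) -> String {
    let (in_len, in_avail) = (input_len, avail); // Copy types: plain copies
    format!("adding {in_len} bytes, {in_avail} slots available")
}

fn main() {
    // Both spellings produce the same string; only what the
    // formatting machinery captures differs.
    assert_eq!(by_reference(3, 8), by_copy(3, 8));
}
```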

@jyn514 jyn514 added I-slow Issue: Problems and improvements with respect to performance of generated code. C-discussion Category: Discussion or questions that doesn't represent real issues. labels Apr 25, 2023

alexshagiev commented Apr 25, 2023

I am not sure how cloning data into the error message, versus letting format! capture the pointers, should improve the performance of a match arm that is never executed, but here are the results, which I still find puzzling.

Essentially, I get three different and progressively worse performance results just by changing a match arm of code that is never executed. How does this make sense? What am I missing about format! and pointers?

I ran the same benchmark in each case and only changed the format! of the match arm that never executes.
Q: Why am I certain that this match arm never executes?
A: Because I can put panic! on top of this match arm and the bench still runs successfully.

Q: What is the performance with the panic! macro in place?
A: Same as with format! and a blank string.

  • format!("") - empty string
                      format!(
                      ""
                      )

Result

to_serializer           time:   [12.861 ns 13.779 ns 15.272 ns]
                        change: [-43.005% -40.787% -37.849%] (p = 0.00 < 0.05)
                        Performance has improved.
  • format!("...." , ....) - with cloned values
                        format!(
                        "adding {in_len} bytes, {in_avail} slots available. input: {in_bytes:?} buffer: {in_self:x}",
                        in_len = input_len.clone(),
                        in_avail = avail.clone(),
                        in_bytes = bytes.clone(),
                        in_self = self.clone(),
                        )

Result

to_serializer           time:   [21.707 ns 21.958 ns 22.268 ns]
                        change: [+63.267% +72.362% +79.658%] (p = 0.00 < 0.05)
                        Performance has regressed.
  • format!("...", original) - with pointers to original variables
                      format!(
                      "adding {input_len} bytes, {avail} slots available. input: {bytes:?} buffer: {self:x}",
                      )

Result

to_serializer           time:   [35.993 ns 38.491 ns 41.498 ns]
                        change: [+64.439% +74.363% +85.315%] (p = 0.00 < 0.05)
                        Performance has regressed.
