Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove unnecessary heap allocated copies in Transformation actions #3231

Merged

Conversation

eduar-hte
Copy link
Contributor

@eduar-hte eduar-hte commented Aug 16, 2024

what

Remove unnecessary heap allocated copies in most Transformation classes.

why

The current implementation of most Transformation classes are copying the std::string values in order to modify them and then return them. This can be optimized by updating the signature of Transformation::evaluate to receive a reference of the value to be modified, and if the transformation allows it, modify it in-place.

This change increases performance of the benchmark test by about 11%-12,5%.

changes

  • Updated transformations to work inplace on the std::string buffer of the value where the transformation is being applied, instead of copying to a temporary (heap-allocated) string.
  • Updated Transformation::evaluate signature to receive the value by reference and perform the transformation inplace, if possible (some transformations cannot be performed in place because they need a bigger buffer to produce the new value, such as base64 encoding).
    • Additionally, the signature of the method has been updated to return a bool value to indicate if the value has been changed. This removes an additional string comparison in RuleWithActions::executeTransformation to compare the original and transformed value further improving performance.
      • This required many of the transformations to detect when a change is produced in the input value to produce this in an efficient way (look for the changed bool flag in their implementation). Some transformations always transform their input value, so they just return true. Notably, a few of the transformations had implementations that would compute a changed flag, but that was currently unused.
  • Made inplace transformations methods inline and moved them out of the class interface as almost all of them are not used outside of them.
  • Simplified & shared implementation of several transformations, removing duplicate code.
  • Removed unnecessarily included header files.

references

While working on updating the configuration to build on C++17 (and C++20), I found out through this comment that a PR to make the transformations in-place was submitted before in PR #2368 and was integrated into the v3.1 branch (though curiously the changes are not currently included in that branch).

This PR goes further in the review of the different in-place transformations to simplify and make their implementations more consistent and efficient. Additionally, because of the signature change of Transformation::evaluate included in this PR, an allocation and copy is avoided.

misc

This is part of a series of PRs to improve performance of the library (4/n). Previous: #3222

@eduar-hte
Copy link
Contributor Author

This change increases performance of the benchmark test by about 11%-12,5%.

These results were obtained running the benchmark test with 100'000 iterations with the OWASP CRS v4 and are presented as a reference:

  • Mac mini (1st generation)
    • v3/master: 96.15 secs
    • This PR: 84.11 secs (12.52% improvement)
  • Debian (bullseye) container running on WSL on Windows Dev Kit 2023
    • v3/master: 143,34 secs
    • This PR: 127,37 secs (11,14% improvement)

@eduar-hte
Copy link
Contributor Author

eduar-hte commented Aug 16, 2024

The unit tests for each transformation were executed (using a different number of iterations, as some take more time than others).

These results were obtained in a Debian (bullseye) container running on WSL on Windows Dev Kit 2023, and are provided as a reference.

(Times in milliseconds. Averages of five runs)

Transformation Iterations v3/master This PR Difference
base64decode 1500000 2,404.83 2,111.14 -12.21%
base64decodeExt 1250000 2,129.00 1,886.29 -11.40%
base64encode 2000000 2,205.14 1,890.67 -14.26%
cmdLine 2000000 2,586.29 2,290.14 -11.45%
compressWhitespace 800000 2,291.43 1,875.29 -18.16%
cssDecode 1600000 2,155.43 1,931.00 -10.41%
escapeSeqDecode 500000 1,899.67 1,704.14 -10.29%
hexDecode 1750000 2,505.14 2,213.50 -11.64%
hexEncode 600000 933.00 924.29 -0.93%
htmlEntityDecode 1000000 3066,41 2400,63 -21,71%
jsDecode 600000 2,249.14 2,009.57 -10.65%
lowercase 1750000 2,295.33 2,040.57 -11.10%
md5 1500000 3,321.80 3,270.86 -1.53%
normalisePath 250000 2,581.00 2,375.17 -7.97%
normalisePathWin 200000 2,217.86 2,004.67 -9.61%
parityEven7bit 1500000 2,259.00 2,003.00 -11.33%
parityOdd7bit 2000000 2,829.00 2,475.00 -12.51%
parityZero7bit 1750000 2,488.83 2,222.33 -10.71%
removeComments 300000 2,300.67 2,156.86 -6.25%
removeCommentsChar 300000 2,280.67 2,201.86 -3.46%
removeNulls 1000000 2,444.17 2,361.43 -3.39%
removeWhitespace 1250000 2,448.00 2,335.86 -4.58%
replaceComments 250000 2,007.57 1,913.57 -4.68%
replaceNulls 1250000 2,963.17 2,794.43 -5.69%
sha1 2000000 3,141.14 3,067.50 -2.34%
sqlHexDecode 8000000 3,004.86 2,777.60 -7.56%
trim 750000 2,637.43 2,498.00 -5.29%
trimLeft 750000 2,528.14 2,387.43 -5.57%
trimRight 750000 2,598.33 2,454.29 -5.54%
urlDecode 300000 2,657.33 2,448.00 -7.88%
urlDecodeUni 150000 2,850.83 2,535.80 -11.05%
urlEncode 250000 2,469.86 2,378.00 -3.72%
utf8toUnicode 30000 4,194.29 4,061.29 -3.17%

@eduar-hte eduar-hte force-pushed the remove-copies-transformations branch 15 times, most recently from 1ae67da to 6efb1cb Compare August 18, 2024 19:20
@eduar-hte eduar-hte force-pushed the remove-copies-transformations branch 4 times, most recently from 5f3aa66 to a801f41 Compare August 19, 2024 18:37
@eduar-hte
Copy link
Contributor Author

Regenerated PR to divide changes to make transformations in-place into multiple commits to simplify review. The original commit can be found in eduar-hte/ModSecurity@0a2dd1a.

NOTE: This required making the first commit in the PR include the changes to the signature of Transformation::evaluate and temporarily adjust each derived class for the code to still compile and work.

@eduar-hte eduar-hte force-pushed the remove-copies-transformations branch 4 times, most recently from de7a953 to ffcbf27 Compare August 20, 2024 01:41
@marcstern marcstern added the 3.x Related to ModSecurity version 3.x label Aug 20, 2024
@eduar-hte eduar-hte force-pushed the remove-copies-transformations branch from ffcbf27 to 7c30fcb Compare August 21, 2024 16:07
- Removed inplace helper function from the class, as it's only
  referenced by the implementation.
…ments

- utils::urldecode_nonstrict_inplace decodes inplace so key & value,
  which are values returned by utils::string::ssplit_pair can be
  just be modified and do not need to be copied.
- Updated signature of utils::urldecode_nonstrict_inplace, as its
  two callers already have std::string values.
- This function already expects these arguments not to be null pointers,
  doesn't validate them and just dereference them.
- In order to make this explicit and enforced by the compiler, they're
  now passed as references.
- Some of the Transformation classes would initialize their Action's
  action_kind using the default (using Transformation constructor
  without an action_kind parameter).
- Others, however, would use that constructor and initialize action_kind
  manually in their constructor, but setting the default value
  (RunTimeBeforeMatchAttemptKind = 1), which was redundant.
- Removed unused Transformation constructor to specify action_kind.
- Converted Action::Kind into an 'enum class' to require using the enum
  constants (instead of integer values, which are difficult to track in
  the codebase and change)
- Moved them into modsecurity::utils::string to avoid polluting the
  global namespace.
@eduar-hte eduar-hte force-pushed the remove-copies-transformations branch from b6eb3cb to a6d64bf Compare August 27, 2024 13:03
@eduar-hte
Copy link
Contributor Author

eduar-hte commented Aug 27, 2024

  • Replace this use of "push_back" with "emplace_back". (3)
  • Either add a parameter list or the "&" operator to this use of "mbedtls_md5". (2)

I've just pushed an update (integrating these changes into the corresponding existing commits) to address these.

NOTE: I'm updating the comment to reflect that Sonarcloud quickly updated its findings, but still reports issues in code that is no longer there (!). This happened to me a couple of times last week while I was in the process of addressing other issues. It seems that the scan is sometimes done in stale code, so it's not accurate. I had to wait for a while and then generate a new push for the scan to be accurate. If you could re-run the scan in an hour (or two), I think we should see the 5 issues above go away.

Copy link

@airween
Copy link
Member

airween commented Aug 28, 2024

After addressing many of the things reported by Sonarcloud, I think (almost) all the current ones exist in the current codebase, and are only flagged now because the PR updates some of the code.

Yes, I knew that - I just asked you about this to know what do you think in long term.

Thank you for all explanations - we discussed the Sonarcloud issues with @gberkes and we agreed that we have to review all issues (not just these ones) later, and fix them by category.

Anyway, these summaries are awesome here and everyone can see that these are not new issues. Thanks again, and thanks for the great PR.

I'm going to merge this now.

@airween airween merged commit 9148668 into owasp-modsecurity:v3/master Aug 28, 2024
49 checks passed
airween added a commit to airween/ModSecurity that referenced this pull request Aug 28, 2024
@eduar-hte eduar-hte deleted the remove-copies-transformations branch August 28, 2024 12:48
eduar-hte added a commit to eduar-hte/ModSecurity that referenced this pull request Aug 28, 2024
…stions

- The following two warnings were generated after introducing the change
  to instantiate the DigestImpl template with the address of mbedtls_md5
  or mbedtls_sha1:
  - warning: src/utils/sha1.h,62,error,danglingTemporaryLifetime,Using
    pointer that is a temporary.
  - warning: src/utils/sha1.h,60,style,constVariablePointer,Variable
    'ret' can be declared as pointer to const
- See owasp-modsecurity#3231 (comment)
eduar-hte added a commit to eduar-hte/secrules-language-tests that referenced this pull request Sep 4, 2024
@eduar-hte
Copy link
Contributor Author

Submitted PR owasp-modsecurity/secrules-language-tests#10 to add more tests for the sqlHexDecode transformation updated in this PR.

eduar-hte added a commit to eduar-hte/ModSecurity that referenced this pull request Sep 9, 2024
…stions

- The following two warnings were generated after introducing the change
  to instantiate the DigestImpl template with the address of mbedtls_md5
  or mbedtls_sha1:
  - warning: src/utils/sha1.h,62,error,danglingTemporaryLifetime,Using
    pointer that is a temporary.
  - warning: src/utils/sha1.h,60,style,constVariablePointer,Variable
    'ret' can be declared as pointer to const
- See owasp-modsecurity#3231 (comment)
eduar-hte added a commit to eduar-hte/ModSecurity that referenced this pull request Sep 9, 2024
…stions

- The following two warnings were generated after introducing the change
  to instantiate the DigestImpl template with the address of mbedtls_md5
  or mbedtls_sha1:
  - warning: src/utils/sha1.h,62,error,danglingTemporaryLifetime,Using
    pointer that is a temporary.
  - warning: src/utils/sha1.h,60,style,constVariablePointer,Variable
    'ret' can be declared as pointer to const
- See owasp-modsecurity#3231 (comment)
eduar-hte added a commit to eduar-hte/ModSecurity that referenced this pull request Sep 10, 2024
…stions

- The following two warnings were generated after introducing the change
  to instantiate the DigestImpl template with the address of mbedtls_md5
  or mbedtls_sha1:
  - warning: src/utils/sha1.h,62,error,danglingTemporaryLifetime,Using
    pointer that is a temporary.
  - warning: src/utils/sha1.h,60,style,constVariablePointer,Variable
    'ret' can be declared as pointer to const
- See owasp-modsecurity#3231 (comment)
eduar-hte added a commit to eduar-hte/ModSecurity that referenced this pull request Sep 10, 2024
…stions

- The following two warnings were generated after introducing the change
  to instantiate the DigestImpl template with the address of mbedtls_md5
  or mbedtls_sha1:
  - warning: src/utils/sha1.h,62,error,danglingTemporaryLifetime,Using
    pointer that is a temporary.
  - warning: src/utils/sha1.h,60,style,constVariablePointer,Variable
    'ret' can be declared as pointer to const
- See owasp-modsecurity#3231 (comment)
eduar-hte added a commit to eduar-hte/ModSecurity that referenced this pull request Oct 15, 2024
…stions

- The following two warnings were generated after introducing the change
  to instantiate the DigestImpl template with the address of mbedtls_md5
  or mbedtls_sha1:
  - warning: src/utils/sha1.h,62,error,danglingTemporaryLifetime,Using
    pointer that is a temporary.
  - warning: src/utils/sha1.h,60,style,constVariablePointer,Variable
    'ret' can be declared as pointer to const
- See owasp-modsecurity#3231 (comment)
eduar-hte added a commit to eduar-hte/ModSecurity that referenced this pull request Oct 19, 2024
…stions

- The following two warnings were generated after introducing the change
  to instantiate the DigestImpl template with the address of mbedtls_md5
  or mbedtls_sha1:
  - warning: src/utils/sha1.h,62,error,danglingTemporaryLifetime,Using
    pointer that is a temporary.
  - warning: src/utils/sha1.h,60,style,constVariablePointer,Variable
    'ret' can be declared as pointer to const
- See owasp-modsecurity#3231 (comment)
eduar-hte added a commit to eduar-hte/ModSecurity that referenced this pull request Oct 19, 2024
…stions

- The following two warnings were generated after introducing the change
  to instantiate the DigestImpl template with the address of mbedtls_md5
  or mbedtls_sha1:
  - warning: src/utils/sha1.h,62,error,danglingTemporaryLifetime,Using
    pointer that is a temporary.
  - warning: src/utils/sha1.h,60,style,constVariablePointer,Variable
    'ret' can be declared as pointer to const
- See owasp-modsecurity#3231 (comment)
eduar-hte added a commit to eduar-hte/ModSecurity that referenced this pull request Oct 21, 2024
…stions

- The following two warnings were generated after introducing the change
  to instantiate the DigestImpl template with the address of mbedtls_md5
  or mbedtls_sha1:
  - warning: src/utils/sha1.h,62,error,danglingTemporaryLifetime,Using
    pointer that is a temporary.
  - warning: src/utils/sha1.h,60,style,constVariablePointer,Variable
    'ret' can be declared as pointer to const
- See owasp-modsecurity#3231 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.x Related to ModSecurity version 3.x
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants