Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate unnecessary escape characters #4176

Merged
merged 1 commit into from
Aug 11, 2024

Conversation

ruudk
Copy link
Contributor

@ruudk ruudk commented Aug 7, 2024

This is a first attempt at solving #4123 and #2712.

Currently it writes the deprecations to an array in the Lexer. This is probably not the way to do it. Should we directly trigger E_USER_DEPRECATED errors?

/cc @stof @fabpot Let me know what you think 😊

@stof
Copy link
Member

stof commented Aug 7, 2024

To me, this should directly trigger E_USER_DEPRECATED (or use the trigger_deprecation of the symfony/deprecation-contracts if we already use it as the way to trigger deprecations).

And this should go in the 3.x branch (so that 4.0 can ship with the new behavior instead of having to wait until 5.0)

tests/LexerTest.php Outdated Show resolved Hide resolved
tests/LexerTest.php Outdated Show resolved Hide resolved
@ruudk
Copy link
Contributor Author

ruudk commented Aug 7, 2024

Thanks for the review. I think it makes sense. I will change this PR to target v3 and make sure we only deprecate and not introduce new functionality.

Then for v4 we can do the other change.

@ruudk ruudk changed the base branch from 4.x to 3.x August 7, 2024 17:37
@ruudk ruudk force-pushed the deprecate-unnecessary-escaping branch 3 times, most recently from 707e826 to 7bbbe6f Compare August 7, 2024 17:43
@ruudk
Copy link
Contributor Author

ruudk commented Aug 7, 2024

Oke, I updated the PR to target V3. We don't change the behavior. We only deprecate, using trigger_deprecation, when we detect an unnecessary escape character.

When this is merged, v4 is rebased with v3, I can provide the PR that changes the behavior there.

Please have another look when you have time 😊

@ruudk ruudk requested a review from stof August 7, 2024 17:47
Copy link
Contributor

@fabpot fabpot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably add a note in the docs explaining when one needs to escape.

src/Lexer.php Outdated Show resolved Hide resolved
src/Lexer.php Outdated Show resolved Hide resolved
@ruudk ruudk force-pushed the deprecate-unnecessary-escaping branch 4 times, most recently from f385f58 to 5b1c8b1 Compare August 7, 2024 19:18
@ruudk
Copy link
Contributor Author

ruudk commented Aug 7, 2024

I refactored the code a bit and added more test cases. I think good to carefully review these test cases. Let me know if you think I can add more cases.

Also updated the docs.

@ruudk ruudk force-pushed the deprecate-unnecessary-escaping branch from 5b1c8b1 to c062f96 Compare August 7, 2024 19:23
src/Lexer.php Outdated Show resolved Hide resolved
tests/LexerTest.php Outdated Show resolved Hide resolved
tests/LexerTest.php Outdated Show resolved Hide resolved
@ruudk ruudk force-pushed the deprecate-unnecessary-escaping branch from c062f96 to 18bc4f2 Compare August 8, 2024 07:32
src/Lexer.php Outdated Show resolved Hide resolved
@ruudk ruudk requested a review from fabpot August 8, 2024 07:33
@ruudk ruudk force-pushed the deprecate-unnecessary-escaping branch 2 times, most recently from 0954f23 to 91f0010 Compare August 8, 2024 12:00
@ruudk
Copy link
Contributor Author

ruudk commented Aug 8, 2024

Alright, I think I'm done. I added more documentation related to Octal and Hexadecimal escape sequences.

@ruudk ruudk force-pushed the deprecate-unnecessary-escaping branch from 91f0010 to 63b3230 Compare August 8, 2024 12:09
@ruudk
Copy link
Contributor Author

ruudk commented Aug 10, 2024

@fabpot what do you think of this? Anything we should change or add more tests?

doc/templates.rst Show resolved Hide resolved
src/Lexer.php Outdated Show resolved Hide resolved
src/Lexer.php Outdated Show resolved Hide resolved
@fabpot
Copy link
Contributor

fabpot commented Aug 10, 2024

I've just run a small performance test via the script below and (maybe as expected) the hit can be significant:

<?php

require_once __DIR__.'/vendor/autoload.php';

$string = str_repeat("日本では、春になると桜の花が咲きます。多くの人々は、公園や川の近くに集まり、お花見を楽しみます。桜の花びらが風に舞い、まるで雪のように見える瞬間は、とても美しいです。", 10);
$loader = new Twig\Loader\ArrayLoader([
    'index.twig' => <<<EOF
{{ "$string" }}
EOF
    ,
]);

$twig = new Twig\Environment($loader);

for ($i = 0; $i < 10000; $i++) {
    $twig->tokenize(new Twig\Source($twig->getLoader()->getSourceContext('index.twig')->getCode(), 'index.twig'));
}

On my machine, ~100ms for the 3.x branch vs ~900ms for this branch.

But I suppose that this kind of long string is not very common. I would expect "hardcoded" strings to be quite short in templates.

@ruudk
Copy link
Contributor Author

ruudk commented Aug 10, 2024

I ran the bench on my machine:

// v3 branch
Benchmark 1: php bench.php
  Time (mean ± σ):     110.2 ms ±   3.7 ms    [User: 98.5 ms, System: 7.4 ms]
  Range (min … max):   108.6 ms … 126.9 ms    23 runs

// this branch
Benchmark 1: php bench.php
  Time (mean ± σ):      1.018 s ±  0.010 s    [User: 0.991 s, System: 0.009 s]
  Range (min … max):    1.007 s …  1.032 s    10 runs

When I change the stripcslashes method:

 private function stripcslashes(string $str, string $quoteType): string
 {
+    if (!str_contains($str, '\\')) {
+        return $str;
+    }
+    // ...

it becomes this:

// this branch
Benchmark 1: php bench.php
  Time (mean ± σ):      95.7 ms ±   1.1 ms    [User: 84.9 ms, System: 7.5 ms]
  Range (min … max):    94.5 ms …  99.0 ms    29 runs

But obviously, that's because the bench does not contain any escape characters.

So let's change the bench to:

<?php

require_once __DIR__.'/vendor/autoload.php';

$string = str_repeat("日本では、春になると桜の花が咲きます。\n多くの人々は、公園や川の近くに集まり、お花見を楽しみます。\\n桜の花びらが風に舞い、まるで雪のように見える瞬間は、とても美しいです。", 10);
$loader = new Twig\Loader\ArrayLoader([
    'index.twig' => <<<EOF
{{ "$string" }}
EOF
    ,
]);

$twig = new Twig\Environment($loader);

for ($i = 0; $i < 10000; $i++) {
    $twig->tokenize(new Twig\Source($twig->getLoader()->getSourceContext('index.twig')->getCode(), 'index.twig'));
}

Now we get

// this branch
Benchmark 1: php bench.php
  Time (mean ± σ):      1.026 s ±  0.002 s    [User: 0.999 s, System: 0.008 s]
  Range (min … max):    1.024 s …  1.029 s    10 runs

So the str_contains check is making it a bit slower when there is a backslash found. But when there is not, it can return the raw string directly, making it even faster than v3.

@ruudk ruudk requested a review from fabpot August 10, 2024 18:19
Copy link
Contributor

@fabpot fabpot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You haven't added the str_contains() call yet, right?

src/Lexer.php Outdated Show resolved Hide resolved
src/Lexer.php Show resolved Hide resolved
src/Lexer.php Show resolved Hide resolved
src/Lexer.php Outdated Show resolved Hide resolved
@ruudk ruudk force-pushed the deprecate-unnecessary-escaping branch 2 times, most recently from fc6704a to a4b9cae Compare August 11, 2024 14:20
See twigphp#4123 twigphp#2712

This allows to change the implementation in v4 so that we no longer have to escape backslashes.
@ruudk ruudk force-pushed the deprecate-unnecessary-escaping branch from a4b9cae to c6656cf Compare August 11, 2024 14:21
@ruudk
Copy link
Contributor Author

ruudk commented Aug 11, 2024

@fabpot Applied the feedback and also added the str_contains optimization.

@ruudk ruudk requested a review from fabpot August 11, 2024 14:23
@fabpot
Copy link
Contributor

fabpot commented Aug 11, 2024

Thank you @ruudk.

@fabpot fabpot merged commit 110c1ab into twigphp:3.x Aug 11, 2024
49 checks passed
@ruudk ruudk deleted the deprecate-unnecessary-escaping branch August 11, 2024 16:17
@ruudk
Copy link
Contributor Author

ruudk commented Aug 11, 2024

Thanks! I will prep the next PR when v4 is synced.

@smnandre
Copy link
Contributor

Hi @ruudk ... i may have a version that reduces the runtime difference, using chunks to avoid looping on every char.

It passes the tests, but i'm not sure if this goes in the direction you wanted for it.

So if you want to check / benchmark it:

private function stripcslashes(string $str, string $quoteType): string
{
    if (!str_contains($str, '\\')) {
        return $str;
    }

    $result = '';
    $length = strlen($str);
    $specialChars = [
        'f' => "\f",
        'n' => "\n",
        'r' => "\r",
        't' => "\t",
        'v' => "\v",
    ];

    $i = 0;
    while ($i < $length) {
        if (false === $pos = strpos($str, '\\', $i)) {
            $result .= substr($str, $i);
            break;
        }

        $result .= substr($str, $i, $pos - $i);
        $i = $pos + 1;

        if ($i >= $length) {
            $result .= '\\';
            break;
        }

        $nextChar = $str[$i];

        if (isset($specialChars[$nextChar])) {
            $result .= $specialChars[$nextChar];
        } elseif ($nextChar === '\\' || $nextChar === $quoteType) {
            $result .= $nextChar;
        } elseif ($nextChar === '#' && $i + 1 < $length && $str[$i + 1] === '{') {
            $result .= '#{';
            $i++;
        } elseif ($nextChar === 'x' && $i + 1 < $length && ctype_xdigit($str[$i + 1])) {
            $hex = $str[++$i];
            if ($i + 1 < $length && ctype_xdigit($str[$i + 1])) {
                $hex .= $str[++$i];
            }
            $result .= chr(hexdec($hex));
        } elseif (ctype_digit($nextChar) && $nextChar < '8') {
            $octal = $nextChar;
            while ($i + 1 < $length && ctype_digit($str[$i + 1]) && $str[$i + 1] < '8' && strlen($octal) < 3) {
                $octal .= $str[++$i];
            }
            $result .= chr(octdec($octal));
        } else {
            trigger_deprecation('twig/twig', '3.12', sprintf('Character "%s" at position %d does not need to be escaped anymore.', $nextChar, $i + 1));
            $result .= $nextChar;
        }

        $i++;
    }

    return $result;
}

@ruudk
Copy link
Contributor Author

ruudk commented Aug 12, 2024

@smnandre smart! Will check it out today

ruudk added a commit to ruudk/Twig that referenced this pull request Aug 12, 2024
See twigphp#4176 (comment)

Co-Authored-By: Simon André <[email protected]>

```
# Old
Benchmark 1: php bench.php
  Time (mean ± σ):      1.192 s ±  0.004 s    [User: 1.163 s, System: 0.009 s]
  Range (min … max):    1.186 s …  1.198 s    10 runs

# New
Benchmark 1: php bench.php
  Time (mean ± σ):     356.5 ms ±   1.1 ms    [User: 342.0 ms, System: 7.6 ms]
  Range (min … max):   354.6 ms … 358.2 ms    10 runs
```

Using bench.php:
```
<?php

require_once __DIR__.'/vendor/autoload.php';

$string = str_repeat("Lorem ipsum dolor sit amet, consectetur adipiscing elit. Ut a tincidunt turpis. \\App\\Entity\\Product \World \065 App\#{var} \'quoted\' Donec pharetra enim quis erat pharetra, dignissim molestie erat laoreet. Vivamus auctor purus sed lorem vestibulum rhoncus.", 10);
$loader = new Twig\Loader\ArrayLoader([
    'index.twig' => <<<EOF
{{ "$string" }}
EOF
    ,
]);

$twig = new Twig\Environment($loader);

for ($i = 0; $i < 10000; $i++) {
    $twig->tokenize(new Twig\Source($twig->getLoader()->getSourceContext('index.twig')->getCode(), 'index.twig'));
}

```
@ruudk ruudk mentioned this pull request Aug 12, 2024
ruudk added a commit to ruudk/Twig that referenced this pull request Aug 12, 2024
See twigphp#4176 (comment)

Co-Authored-By: Simon André <[email protected]>

```
# Old
Benchmark 1: php bench.php
  Time (mean ± σ):      1.192 s ±  0.004 s    [User: 1.163 s, System: 0.009 s]
  Range (min … max):    1.186 s …  1.198 s    10 runs

# New
Benchmark 1: php bench.php
  Time (mean ± σ):     356.5 ms ±   1.1 ms    [User: 342.0 ms, System: 7.6 ms]
  Range (min … max):   354.6 ms … 358.2 ms    10 runs
```

Using bench.php:
```
<?php

require_once __DIR__.'/vendor/autoload.php';

$string = str_repeat("Lorem ipsum dolor sit amet, consectetur adipiscing elit. Ut a tincidunt turpis. \\App\\Entity\\Product \World \065 App\#{var} \'quoted\' Donec pharetra enim quis erat pharetra, dignissim molestie erat laoreet. Vivamus auctor purus sed lorem vestibulum rhoncus.", 10);
$loader = new Twig\Loader\ArrayLoader([
    'index.twig' => <<<EOF
{{ "$string" }}
EOF
    ,
]);

$twig = new Twig\Environment($loader);

for ($i = 0; $i < 10000; $i++) {
    $twig->tokenize(new Twig\Source($twig->getLoader()->getSourceContext('index.twig')->getCode(), 'index.twig'));
}

```
@ruudk
Copy link
Contributor Author

ruudk commented Aug 12, 2024

Created the PR to optimize this further: #4196

ruudk added a commit to ruudk/Twig that referenced this pull request Aug 12, 2024
See twigphp#4176 (comment)

Co-Authored-By: Simon André <[email protected]>

```
# Old
Benchmark 1: php bench.php
  Time (mean ± σ):      1.204 s ±  0.012 s    [User: 1.185 s, System: 0.011 s]
  Range (min … max):    1.191 s …  1.225 s    10 runs

# New
Benchmark 1: php bench.php
  Time (mean ± σ):     368.3 ms ±   4.8 ms    [User: 354.2 ms, System: 10.1 ms]
  Range (min … max):   362.9 ms … 375.6 ms    10 runs
```

Using bench.php:
```
<?php

require_once __DIR__.'/vendor/autoload.php';

$string = str_repeat("Lorem ipsum dolor sit amet, consectetur adipiscing elit. Ut a tincidunt turpis. \\App\\Entity\\Product \World \065 App\#{var} \'quoted\' Donec pharetra enim quis erat pharetra, dignissim molestie erat laoreet. Vivamus auctor purus sed lorem vestibulum rhoncus.", 10);
$loader = new Twig\Loader\ArrayLoader([
    'index.twig' => <<<EOF
{{ "$string" }}
EOF
    ,
]);

$twig = new Twig\Environment($loader);

for ($i = 0; $i < 10000; $i++) {
    $twig->tokenize(new Twig\Source($twig->getLoader()->getSourceContext('index.twig')->getCode(), 'index.twig'));
}

```
fabpot added a commit that referenced this pull request Aug 12, 2024
This PR was squashed before being merged into the 3.x branch.

Discussion
----------

Optimize stripcslashes

See #4176 (comment) by `@smnandre`

```
# Old
Benchmark 1: php bench.php
  Time (mean ± σ):      1.192 s ±  0.004 s    [User: 1.163 s, System: 0.009 s]
  Range (min … max):    1.186 s …  1.198 s    10 runs

# New
Benchmark 1: php bench.php
  Time (mean ± σ):     356.5 ms ±   1.1 ms    [User: 342.0 ms, System: 7.6 ms]
  Range (min … max):   354.6 ms … 358.2 ms    10 runs
```

Using bench.php:
```php
<?php

require_once __DIR__.'/vendor/autoload.php';

$string = str_repeat("Lorem ipsum dolor sit amet, consectetur adipiscing elit. Ut a tincidunt turpis. \\App\\Entity\\Product \World \065 App\#{var} \'quoted\' Donec pharetra enim quis erat pharetra, dignissim molestie erat laoreet. Vivamus auctor purus sed lorem vestibulum rhoncus.", 10);
$loader = new Twig\Loader\ArrayLoader([
    'index.twig' => <<<EOF
{{ "$string" }}
EOF
    ,
]);

$twig = new Twig\Environment($loader);

for ($i = 0; $i < 10000; $i++) {
    $twig->tokenize(new Twig\Source($twig->getLoader()->getSourceContext('index.twig')->getCode(), 'index.twig'));
}

```

Commits
-------

bc6e36d Optimize stripcslashes
nicolas-grekas added a commit to symfony/symfony that referenced this pull request Aug 12, 2024
This PR was merged into the 5.4 branch.

Discussion
----------

[TwigBridge] fix tests using Twig 3.12

| Q             | A
| ------------- | ---
| Branch?       | 5.4
| Bug fix?      | no
| New feature?  | no
| Deprecations? | no
| Issues        |
| License       | MIT

related to twigphp/Twig#4176

Commits
-------

a9858c6 fix tests using Twig 3.12
ruudk added a commit to ruudk/Twig that referenced this pull request Aug 12, 2024
Follow up of twigphp#4176, twigphp#4123 and twigphp#2712.

Now that we have deprecated unnecessary escape in v3, we can change the behavior in v4.

With this change, unnecessary will no longer be ignored, but handled like any other character.

This allows for writing fully qualified class names like this:
```twig
{{ constant('App\Entity\User::SOME_CONSTANT') }}
```

Instead of having to escape the `\` in v3:
```twig
{{ constant('App\\Entity\\User::SOME_CONSTANT') }}
```
ruudk added a commit to ruudk/Twig that referenced this pull request Aug 15, 2024
Follow up of twigphp#4176, twigphp#4123 and twigphp#2712.

Now that we have deprecated unnecessary escape in v3, we can change the behavior in v4.

With this change, unnecessary will no longer be ignored, but handled like any other character.

This allows for writing fully qualified class names like this:
```twig
{{ constant('App\Entity\User::SOME_CONSTANT') }}
```

Instead of having to escape the `\` in v3:
```twig
{{ constant('App\\Entity\\User::SOME_CONSTANT') }}
```
ruudk added a commit to ruudk/Twig that referenced this pull request Aug 21, 2024
Follow up of twigphp#4176, twigphp#4123 and twigphp#2712.

Now that we have deprecated unnecessary escape in v3, we can change the behavior in v4.

With this change, unnecessary will no longer be ignored, but handled like any other character.

This allows for writing fully qualified class names like this:
```twig
{{ constant('App\Entity\User::SOME_CONSTANT') }}
```

Instead of having to escape the `\` in v3:
```twig
{{ constant('App\\Entity\\User::SOME_CONSTANT') }}
```
fabpot added a commit that referenced this pull request Aug 21, 2024
This PR was squashed before being merged into the 4.x branch.

Discussion
----------

Do not hide unnecessary escape characters

Follow up of #4176, #4123 and #2712.

Now that we have deprecated unnecessary escape in v3, we can change the behavior in v4.

With this change, unnecessary will no longer be ignored, but handled like any other character.

This allows for writing fully qualified class names like this:
```twig
{{ constant('App\Entity\User::SOME_CONSTANT') }}
```

Instead of having to escape the `\` in v3:
```twig
{{ constant('App\\Entity\\User::SOME_CONSTANT') }}
```

/cc `@fabpot` `@stof`

Commits
-------

53c24bf Do not hide unnecessary escape characters
fabpot added a commit that referenced this pull request Sep 26, 2024
This PR was squashed before being merged into the 3.x branch.

Discussion
----------

Improve escape deprecation message

This was introduced in #4176

According to #4343 the message can be improved.

I made a few changes to the test to make it easier to work with. Best to review this PR **commit by commit**.

/cc `@stof`

Commits
-------

7256713 Improve escape deprecation message
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants