Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invalid bulk string terminator #4424

Closed
rauny-henrique opened this issue Jan 7, 2025 · 13 comments · Fixed by #4441
Closed

Invalid bulk string terminator #4424

rauny-henrique opened this issue Jan 7, 2025 · 13 comments · Fixed by #4441
Assignees
Labels
bug Something isn't working

Comments

@rauny-henrique
Copy link

Describe the bug

After upgrading our dragonfly instances from version v1.20.1 to version v1.25.5 we noticed the appearance of this exception (using the .net driver):
"StackExchange.Redis.RedisConnectionException: Invalid bulk string terminator"

We performed tests with versions v1.24.0 and v1.26.0 which also presented the same problem.

To Reproduce
Steps to reproduce the behavior:

  1. Just run the application. We use version 2.7.33 of the .net driver (https://github.com/StackExchange/StackExchange.Redis)

Expected behavior
We expect the application to run normally, as it did in version v1.20.1.

Screenshots
If applicable, add screenshots to help explain your problem.

Environment (please complete the following information):

  • OS: Oracle Linux Server 8.8
  • Kernel: 5.15.0-200.131.27.1.el8uek.x86_64
  • Containerized?: Kubernetes
  • Dragonfly Version: v1.25.5

Additional context

StackExchange.Redis.RedisConnectionException: Invalid bulk string terminator
at StackExchange.Redis.PhysicalConnection.ReadBulkString(ResultType type, ResultFlags flags, BufferReader& reader, Boolean includeDetailInExceptions, ServerEndPoint server) in /_/src/StackExchange.Redis/PhysicalConnection.cs:line 2000
at StackExchange.Redis.PhysicalConnection.TryParseResult(ResultFlags flags, Arena`1 arena, ReadOnlySequence`1& buffer, BufferReader& reader, Boolean includeDetilInExceptions, ServerEndPoint server, Boolean allowInlineProtocol) in /_/src/StackExchange.Redis/PhysicalConnection.cs:line 2078
at StackExchange.Redis.PhysicalConnection.ReadArray(ResultType resultType, ResultFlags flags, Arena`1 arena, ReadOnlySequence`1& buffer, BufferReader& reader, Boolean includeDetailInExceptions, ServerEndPoint server) in /_/src/StackExchange.Redis/PhysicalConnection.cs:line 1951
at StackExchange.Redis.PhysicalConnection.TryParseResult(ResultFlags flags, Arena`1 arena, ReadOnlySequence`1& buffer, BufferReader& reader, Boolean includeDetilInExceptions, ServerEndPoint server, Boolean allowInlineProtocol) in /_/src/StackExchange.Redis/PhysicalConnection.cs:line 2081
at StackExchange.Redis.PhysicalConnection.ProcessBuffer(ReadOnlySequence`1& buffer) in /_/src/StackExchange.Redis/PhysicalConnection.cs:line 1857
@rauny-henrique rauny-henrique added the bug Something isn't working label Jan 7, 2025
@romange
Copy link
Collaborator

romange commented Jan 7, 2025

thank you for reporting the problem.

To fix it we need to know exactly the response that causes the problem.

luckily I found this helper tool in .Net client library
https://github.com/StackExchange/StackExchange.Redis/blob/main/docs/RespLogging.md#logging-resp-streams

will it be possible to record the response stream so we could reproduce the problem locally?

@MaksTenne
Copy link

MaksTenne commented Jan 8, 2025

Basic Information:

  • Dragonfly v1.26.0 Docker on AWS.
  • .NET StackExchange.Redis, version 2.8.24

We've encountered an consistent error with LRANGE using ListRangeAsync in StackExchange.Redis during the initial implementation phase. List retrieval alternates between success and failure (e.g., first attempt succeeds, second fails) for a list of 1075 objects. We're investigating further and will provide more details as they become available.

@1ach
Copy link

1ach commented Jan 9, 2025

We also experienced this issue last year when we upgraded to Dragonfly v1.22.2.

Upgrading StackExchange.Redis version had no impact.

Only solution we found at the time was to rollback to v1.21.4 which seemed to be the last version to not experience the issue.

For us the error was inconsistent - we didn't see it in testing but appeared sporadically during a production rollout. Even then the error did not appear always even repeating the same application steps with the same dataset, so it has been difficult to capture.

The error was most likely to appear when we used ZRANGEBYSCORE (via StackExchange.Redis SortedSetRangeByScoreAsync method) on a Sorted Set with a larger limit (200+) of JSON encoded value objects.

If we are able to capture more details, I'll add them here.

@romange
Copy link
Collaborator

romange commented Jan 9, 2025

We switched to a new code that renders RESP replies in more efficient way. I am confident the bug is there. With upto 1.25.6 it is possible to workaround the issue using --experimental_new_io=false but starting from 1.26.0 we removed the old code, because we did not know of any issues related to it :(
I wish I knew how we can reproduce it. @MaksTenne if you can provide us with a reproducible example, this would really help!

@MaksTenne
Copy link

Hey @romange,

I've created a reproducible public repository to address the "Invalid bulk string terminator" issue encountered in the "LRANGE" ListRangeAsync implementation of StackExchange.Redis. You can find the repo here: Invalid-bulk-string-terminator.

If you need any additional information or assistance to help fix the problem, please feel free to reach out!

@romange
Copy link
Collaborator

romange commented Jan 10, 2025

@MaksTenne thanks! Currently I am stuck at causing the webapp to run correctly.
This is what I do:

  1. docker build . -t app.net - created the container
  2. docker run --network=host -e ASPNETCORE_ENVIRONMENT=Development app.net, it prints:
info: Microsoft.Hosting.Lifetime[14]
    Now listening on: http://[::]:8080
info: Microsoft.Hosting.Lifetime[0]
    Application started. Press Ctrl+C to shut down.
info: Microsoft.Hosting.Lifetime[0]
    Hosting environment: Development
info: Microsoft.Hosting.Lifetime[0]
    Content root path: /app

I at this point I know that the app is running because if I do not run dragonfly on 6379, it exits with an error. So far so good.
but then whenever I run any curl command, it results in 400.

curl -v      http://[::]:8080/app/Redis/data
curl -v     -H "Content-Type: application/json" -d '{"someData": "value"}'  http://[::]:8080/app/Redis/generate

or any other url - results in error 400:

*   Trying :::8080...
* Connected to :: (::1) port 8080 (#0)
> POST /app/Redis/generate HTTP/1.1
> Host: [::]:8080
> User-Agent: curl/7.81.0
> Accept: */*
> Content-Type: application/json
> Content-Length: 21
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 400 Bad Request
< Content-Length: 0
< Connection: close
< Date: Fri, 10 Jan 2025 06:53:36 GMT
< Server: Kestrel
< 
* Closing connection 0

@MaksTenne
Copy link

@romange

Instead of using --network=host, I use -p 8080:8080 because --network doesn't work for me.

URLs:

  • curl -v http://localhost:8080/api/Redis/generate
  • curl -v http://localhost:8080/api/Redis/data

I accidentally configured the Generate method as HttpPost instead of HttpGet.

Pull the repo again and it should be changed.

@romange
Copy link
Collaborator

romange commented Jan 10, 2025

thanks, I succeeded to run both handlers and both succeed for me. http://localhost:8080/api/Redis/data returns a big json object. do you see something else @MaksTenne ? how the failure looks like?

@romange
Copy link
Collaborator

romange commented Jan 10, 2025

Ah, I see - the second request fails.

romange added a commit that referenced this issue Jan 11, 2025
…strings

Due to a corner-case bug, reply builder could add \0\0 to the end of bulk strings, instead of
\r\n. The bug slipped our tests because redis-py parser most probably does not validate the ending
as long as everything else is consistent.

This PR:
1. Adds a test that catches the bug
2. Adds a debug check that verifies the destination pointer is consistent with the iovec being used.
3. Fixes the bug.

Fixes #4424
romange added a commit that referenced this issue Jan 11, 2025
…dings.

Due to a corner-case bug, reply builder could add \0\0 to the end of bulk strings, instead of
\r\n. The bug slipped our tests because redis-py parser most probably does not validate the ending
as long as everything else is consistent.

This PR:
1. Adds a test that catches the bug
2. Adds a debug check that verifies the destination pointer is consistent with the iovec being used.
3. Fixes the bug.

Fixes #4424
romange added a commit that referenced this issue Jan 11, 2025
Due to a corner-case bug, reply builder could add \0\0 to the end of bulk strings, instead of
\r\n. The bug slipped our tests because redis-py parser most probably does not validate the ending
as long as everything else is consistent.

This PR:
1. Adds a test that catches the bug
2. Adds a debug check that verifies the destination pointer is consistent with the iovec being used.
3. Fixes the bug.

Fixes #4424
@romange
Copy link
Collaborator

romange commented Jan 11, 2025

It was a bit tricky to catch by using our python tests as as the redis-py client does not validate the endings of bulk strings.
At the end I succeeded in creating a c++ unit test that covers the bug. We will release a patch release next week.

Thank you to everyone and especially @MaksTenne for helping to identify the issue 🙏🏼

@romange romange self-assigned this Jan 11, 2025
romange added a commit that referenced this issue Jan 12, 2025
…4441)

Due to a corner-case bug, reply builder could add \0\0 to the end of bulk strings, instead of
\r\n. The bug slipped our tests because redis-py parser most probably does not validate the ending
as long as everything else is consistent.

This PR:
1. Adds a test that catches the bug
2. Adds a debug check that verifies the destination pointer is consistent with the iovec being used.
3. Fixes the bug.

Fixes #4424
@MaksTenne
Copy link

Thank you for your quick response and solution! It was fun to assist you :).

@romange
Copy link
Collaborator

romange commented Jan 13, 2025

The bug is fixed in v1.26.1

@rauny-henrique
Copy link
Author

Thanks for the support and help @romange and @MaksTenne. I was unable to provide feedback on the response stream as I was waiting for my company's development team to become available.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants