Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python: Changed handling of large requests to transfer them as leaked pointers #1655

Merged
merged 1 commit into from
Jun 26, 2024

Conversation

barshaul
Copy link
Collaborator

@barshaul barshaul commented Jun 25, 2024

Large protobuf messages are extremely slow to decode/encode, and with large messages (e.g., 512MB), the client becomes unresponsive. We resolve this issue by passing the arguments as a leaked pointer instead of a bytes vector when the message size exceeds the limit.

When we merge in the bytes/string support, we should change these functions to the relevant types. Added TODOs in the code for that matter.

@barshaul barshaul force-pushed the use_args_pointer_py branch from 9e9eb9f to 0f8224b Compare June 26, 2024 09:12
@barshaul barshaul marked this pull request as ready for review June 26, 2024 09:17
@barshaul barshaul requested a review from a team as a code owner June 26, 2024 09:17
@barshaul barshaul requested a review from avifenesh June 26, 2024 09:18
"""

# TODO: Allow passing different encoding options
return bytes(arg, encoding="utf8")
Copy link
Collaborator

@avifenesh avifenesh Jun 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not completely platform agnostic - we need to add handling in utf8 usage.
For example AIX and Solaris are unix based OS that don't use utf8 in their default. It is possible to configure new versions to use utf8, but the old versions don't even have the option.
If at some point we'll want to support windows, it can be even more problematic.
It might not be relevant enough at that point, but we need more than allowing the passing of different encoding. To avoid crashing or producing incorrect results, we need to check the OS we are running on when no specific encoding is passed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ATM we will support only utf8 strings and bytes as passed arguments and return values. Later on we will support encoder as a client configuration which will allow more encoding options. If the user doesn't use utf8 strings, he can use bytes instead


from glide.constants import TResult

DEFAULT_TIMEOUT_IN_MILLISECONDS: int = ...
MAX_REQUEST_ARGS_LEN: int = ...
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please explain somewhere what is that size, and how is it possible that this size come from rust but is relevant for all the languages. Something here is a bit confusing.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable is defined and documented in socket_listener.rs, i'm only importing it here and exporting it to the python wrapper

@avifenesh
Copy link
Collaborator

It is not clear enough what are you doing here.
When you say leaked pointers i think about allocated memory that we lost the pointer to, and it is not clear to me how using it solve the issue.
It might be something simple that other will understand more easily but at least for me it wasn't, so i would like to see some doc with deeper explanation of what exactly is happening here.

@barshaul
Copy link
Collaborator Author

barshaul commented Jun 26, 2024

It is not clear enough what are you doing here. When you say leaked pointers i think about allocated memory that we lost the pointer to, and it is not clear to me how using it solve the issue. It might be something simple that other will understand more easily but at least for me it wasn't, so i would like to see some doc with deeper explanation of what exactly is happening here.

We discussed in the office - this is a part of the client's design. Since large messages aren't being properly handled with protobuf, we are using an FFI call with the args vec, allocating it on the heap on the rust side and leaking it so it won't be deleted by the GC. Then we pass only the pointer on the protobuf message, sending the protobuf message through the socket, and getting back the leaked memory from the received pointer in the core side.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

2 participants