Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Encoding the byte array in JSON with base64 #2542

Closed
WenyXu opened this issue Oct 8, 2023 · 4 comments · Fixed by #2545
Closed

Encoding the byte array in JSON with base64 #2542

WenyXu opened this issue Oct 8, 2023 · 4 comments · Fixed by #2545
Labels
C-enhancement Category Enhancements
Milestone

Comments

@WenyXu
Copy link
Member

WenyXu commented Oct 8, 2023

What type of enhancement is this?

Performance

What does the enhancement do?

Encoding byte arrays in JSON is inefficient. One acceptable solution is encoding the byte array with base64.

impl Serialize for CreateTableTask {
fn serialize<S>(&self, serializer: S) -> result::Result<S::Ok, S::Error>
where
S: serde::Serializer,
{
let table_info = serde_json::to_vec(&self.table_info)
.map_err(|err| serde::ser::Error::custom(err.to_string()))?;
let pb = PbCreateTableTask {
create_table: Some(self.create_table.clone()),
partitions: self.partitions.clone(),
table_info,
};
let buf = pb.encode_to_vec();
serializer.serialize_bytes(&buf)
}
}

impl Serialize for AlterTableTask {
fn serialize<S>(&self, serializer: S) -> result::Result<S::Ok, S::Error>
where
S: serde::Serializer,
{
let pb = PbAlterTableTask {
alter_table: Some(self.alter_table.clone()),
};
let buf = pb.encode_to_vec();
serializer.serialize_bytes(&buf)
}
}

Implementation challenges

No response

@WenyXu WenyXu added the C-enhancement Category Enhancements label Oct 8, 2023
@killme2008
Copy link
Contributor

killme2008 commented Oct 8, 2023

I think the performance of ddl procedure is not a major concern right now, and this serialization can't be the major bottleneck of it. I prefer the JSON format personally, it is more readable and friendly for debugging and management.

@WenyXu
Copy link
Member Author

WenyXu commented Oct 8, 2023

I think the performance of ddl procedure is not a major concern right now, and this serialization can't be the major bottleneck of it. I prefer the JSON format personally, it is more readable and friendly for debugging and management.

In this case, we only encode protobuf-encoded byte array with base64. By default, JSON format will treat the u8 array as the u32 array, which results in an expansion ratio of up to 4:1.

let pb = PbAlterTableTask { 
             alter_table: Some(self.alter_table.clone()), 
         }; 
         let buf = pb.encode_to_vec(); 
         serializer.serialize_bytes(&buf) 

@fengjiachun
Copy link
Collaborator

By the way, it seems that we haven't gained any benefits from using the JSON format in our current code processing. Formatting JSON from protobuf bytes doesn't enhance readability or user-friendliness.

@killme2008
Copy link
Contributor

Got the point. If you want to make this change, please do it ASAP before v0.4.0 is released.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Category Enhancements
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants