proposal: refactor and enhance the write batch extractor #2594

Open
2 tasks done
git-hulk opened this issue Oct 13, 2024 · 11 comments
Labels
enhancement

Comments

@git-hulk
Member

git-hulk commented Oct 13, 2024

Search before asking

  • I had searched in the issues and found no similar issues.

Motivation

Currently, the write batch extractor plays a key role in cluster migration and the sync tool, and we would also like to introduce it into our change stream mechanism.

But it currently has these potential issues:

  • insufficient test coverage
  • hard to extend, since it is bound to the RESP format
  • hard to maintain, since all data types are handled inside one function

This was also mentioned in issue #2583. To improve this situation, I would like to propose a new design for the write batch extractor.

Solution

Introduce an intermediate representation for the parsed write batch, e.g. ChangeStreamEvent, which will contain the following fields:

#include <cstdint>
#include <string>
#include <variant>
#include <vector>

struct ChangeStreamEvent {
  int16_t event_type;
  int16_t event;
  int16_t data_type;
  std::string key;
  std::variant<std::string, double, std::vector<std::string>> payload;
};
  • event_type: required, the type of this event; it determines how the payload is interpreted. Can be one of VALUE|COMMANDS
  • event: required, should be one of ADD|DELETE|SET if the event type is VALUE
  • data_type: required, should be one of [STRING|HASH|SET|ZSET|LIST|STREAM|BITMAP|JSON]
  • key: required, the key that was affected
  • payload: the arguments for the event

For example, it will emit the following event after receiving the command HSET my_hash f0 v0:

{
        .event_type = "VALUE",
        .event = "SET",
        .data_type = "HASH",
        .key = "my_hash",
        .payload = ["f0", "v0"]
}

Then the cluster migration, the sync tool, and update polling can each generate their own output format from the change stream event.
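
As a concrete illustration of the "RESP as the default output format" item in the TODOs below, here is a minimal sketch (the helper name ToRESP is hypothetical, not part of the proposal) of rendering a VALUE event back into a RESP command:

#include <string>
#include <vector>

// Hypothetical helper: serialize an event's command name, key, and payload
// arguments into a RESP array (sketch only, not the actual Kvrocks API).
std::string ToRESP(const std::string &cmd, const std::string &key,
                   const std::vector<std::string> &args) {
  std::vector<std::string> parts = {cmd, key};
  parts.insert(parts.end(), args.begin(), args.end());
  std::string out = "*" + std::to_string(parts.size()) + "\r\n";
  for (const auto &p : parts) {
    out += "$" + std::to_string(p.size()) + "\r\n" + p + "\r\n";
  }
  return out;
}

// ToRESP("HSET", "my_hash", {"f0", "v0"}) yields:
// *4\r\n$4\r\nHSET\r\n$7\r\nmy_hash\r\n$2\r\nf0\r\n$2\r\nv0\r\n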

TODOs

  • Add support in WriteBatchExtractor for transforming a write batch into ChangeStreamEvent
  • Add RESP as the default output format for ChangeStreamEvent
  • Replace the WriteBatchExtractor output with ChangeStreamEvent
  • Add test cases for all data types.

What do you think? Please don't hesitate to leave comments if you have any thoughts or suggestions.

Are you willing to submit a PR?

  • I'm willing to submit a PR!
git-hulk added the enhancement label Oct 13, 2024
@PragmaTwice
Member

PragmaTwice commented Oct 13, 2024

It seems related to logical logging, cc @mapleFU.

Could you elaborate on the detailed design of the "event" field?

How do we define a new event? E.g. for the ZSET data type, could you list all events for every write command here?

Also, the "subkey" is tied to the different encodings of each data type. If we store the "subkey" here, it is too "low level" to be a record of structured semantics. And could you list all "subkey"s for all ZSET commands?

@PragmaTwice
Member

PragmaTwice commented Oct 13, 2024

IMHO we should have a quite clear idea of the layered (or leveled) semantics of execution in Kvrocks:

  • High level: Redis commands
  • Middle level: orthogonal but structured operations
  • Low level: RocksDB key-value reads/writes

Today we are talking about the middle-level operations.

This middle layer needs a clear design to meet three goals:

  • Orthogonal: Redis commands are obviously not orthogonal, but the middle layer must be, and each Redis command must be "lowerable" to the middle layer.
  • Structured: The middle-layer operators must have clear semantics, retaining high-level data-structure information rather than fragmenting into RocksDB reads and writes.
  • Integrated: The middle layer must carry complete context information, to avoid irreproducibility caused by changes in execution order.

As you can see, this is a complex task. I hope we can have a clearer design: for example, for each Redis data type, what is the detailed design of its middle-layer operators?

Then we can talk about implementations.
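
To make the "structured" and "integrated" goals concrete, here is a purely hypothetical sketch (all names invented for illustration, not a proposed design) of what one middle-layer operator for the ZSET type might look like:

// Hypothetical middle-layer operator for ZSET writes (illustration only).
// It keeps data-structure semantics (member + score) instead of raw
// RocksDB key-values, and is independent of any particular Redis command.
struct ZSetMemberWrite {
  std::string key;     // the user key
  std::string member;  // the subkey within the sorted set
  double score;        // structured, not an opaque encoded string
};

// ZADD key 1.5 m    ->  ZSetMemberWrite{"key", "m", 1.5}
// ZINCRBY key 2 m   ->  also a ZSetMemberWrite, carrying the post-increment
//                       score, so replay does not depend on execution order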

@mapleFU
Member

mapleFU commented Oct 13, 2024

By the way, would it be better to make RDB have the same structure? Or are they totally different?

@PragmaTwice
Member

PragmaTwice commented Oct 13, 2024

By the way, would it be better to make RDB have the same structure? Or are they totally different?

RDB is for the serialization of data, not for the serialization of execution, so I don't think it is suitable here.

@git-hulk
Member Author

How do we define a new event? E.g. for the ZSET data type, could you list all events for every write command here?

It generally contains: ADD|DELETE|SET. For example, the ZPOP command should generate a DELETE event.

If we store the "subkey" here, it is too "low level" to be a record of structured semantics. And could you list all "subkey"s for all ZSET commands?

What I mean by subkey here depends on the data type. For the Hash type, as we can see in the example, the subkey is the field f0. For the ZSet type, the subkey would be the member.

I don't quite understand the meaning of "low level" here. Could you elaborate a bit on what you mean?

Structured: The middle-layer operators must have clear semantics, retaining high-level data-structure information rather than fragmenting into RocksDB reads and writes.

The change stream event maps to what the command changed, rather than to RocksDB reads/writes; that is why we have the key and subkey.
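
For illustration (an invented literal in the same style as the HSET example above; the exact values are not settled), ZADD my_zset 1.5 m0 could emit:

{
        .event_type = "VALUE",
        .event = "ADD",
        .data_type = "ZSET",
        .key = "my_zset",
        .payload = ["m0", "1.5"]
}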

@git-hulk
Member Author

git-hulk commented Oct 13, 2024

As you can see, this is a complex task. I hope we can have a clearer design: for example, for each Redis data type, what is the detailed design of its middle-layer operators?

Yes, I agree we should take care of the design before starting the implementation. We can reach a consensus on the high-level proposal, and then go into the details once all of us feel good about it.

@PragmaTwice
Member

PragmaTwice commented Oct 13, 2024

It generally contains: ADD|DELETE|SET. For example, the ZPOP command should generate a DELETE event.

How about commands like ZINTERSTORE, ZMPOP, and ZPOPMIN? I'm still not sure whether the current design is general enough. From my point of view, it can easily become hard to model some complex scenarios.

E.g. for ZADD, the value should be a member and a score, but here you use just one std::string value: how is it encoded? We would then need to decode it to retrieve the score. IMHO that sounds like a weird design.

Also, I think it's better to generalize this to logical logging instead of limiting it to migrate/sync.

git-hulk reopened this Oct 13, 2024
@git-hulk
Member Author

@PragmaTwice I have changed value to payload, which is a union type.
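
For reference, a minimal sketch (the handler name is hypothetical) of how a consumer could branch on such a union payload with std::visit:

#include <string>
#include <type_traits>
#include <variant>
#include <vector>

using Payload = std::variant<std::string, double, std::vector<std::string>>;

// Sketch only: dispatch on whichever alternative the payload holds.
void HandlePayload(const Payload &p) {
  std::visit([](const auto &v) {
    using T = std::decay_t<decltype(v)>;
    if constexpr (std::is_same_v<T, std::string>) {
      // single-value events, e.g. SET on a string key
    } else if constexpr (std::is_same_v<T, double>) {
      // numeric events, e.g. a ZSET score
    } else {
      // multi-argument events, e.g. HSET field/value pairs
    }
  }, p);
}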

@PragmaTwice
Member

PragmaTwice commented Oct 13, 2024

My current point is: if we cannot use it to model all operations, e.g. ZREMRANGE..., we had better keep the current (Redis commands) form to make the implementation unified and easy for users to understand.

But we can continue to focus on this and figure out a better design.

@git-hulk
Member Author

Got your point. I agree that using the commands would make it easy to have a unified way to implement the change logs. My concern is that we would need to rewrite many commands, like HINCRBY to HSET, to simplify the downstream implementation.

@Genuineh

I am not very familiar with the internal structure of Kvrocks, nor am I very familiar with C++. However, I personally think it is only necessary to hook into RocksDB itself. Whether RESP is needed, or which of the data structures in https://kvrocks.apache.org/community/data-structure-on-rocksdb needs to be exposed, may depend on the user's needs (simply put, when performance matters, one will lean toward consuming the data-structure-on-rocksdb encoding or the raw RocksDB key-values directly). Only the parsing method for the business-level data structures needs to be provided to users.

In addition, Kvrocks may one day have its own protocol and data structures that Redis does not have; consuming the structures on RocksDB directly avoids that risk. And even if one only wants to use it as Redis, as long as this structure is fully compatible with Redis, it can be freely converted through a RESP parser.
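
For context, the existing extractor is (as far as I understand) built on RocksDB's write batch iteration; here is a minimal sketch of hooking RocksDB directly, with the handler name invented for illustration:

#include <rocksdb/slice.h>
#include <rocksdb/status.h>
#include <rocksdb/write_batch.h>

// Hypothetical handler: sees every key-value write in a batch; a parser
// for the Kvrocks data-structure-on-rocksdb encoding could sit on top.
class RawChangeHandler : public rocksdb::WriteBatch::Handler {
 public:
  rocksdb::Status PutCF(uint32_t cf_id, const rocksdb::Slice &key,
                        const rocksdb::Slice &value) override {
    // decode key/value according to the on-RocksDB encoding here
    return rocksdb::Status::OK();
  }
  rocksdb::Status DeleteCF(uint32_t cf_id, const rocksdb::Slice &key) override {
    return rocksdb::Status::OK();
  }
};

// usage: rocksdb::WriteBatch batch; RawChangeHandler h; auto s = batch.Iterate(&h);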
