Replies: 1 comment
-
Duplicate of #3480
-
Currently, the storage engine overwrites duplicate entries that share the same row key by default. We may also need to support an append mode, which never overwrites duplicate entries.
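To make the difference between the two modes concrete, here is a minimal sketch. `Entry`, `WriteMode`, and `read_entries` are hypothetical names for illustration only, not the storage engine's actual API, and it assumes "overwrite" means that the entry with the highest sequence wins for each row key.

```rust
/// A single stored row. Hypothetical type, not the engine's real one.
#[derive(Debug, Clone, PartialEq)]
pub struct Entry {
    pub row_key: String,
    pub sequence: u64,
    pub value: i64,
}

/// How duplicate row keys are treated. Hypothetical enum.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum WriteMode {
    /// Only the newest entry per row key is visible.
    Overwrite,
    /// Every entry is kept, even when row keys collide.
    Append,
}

/// Returns the entries a reader would see under the given mode.
pub fn read_entries(mut entries: Vec<Entry>, mode: WriteMode) -> Vec<Entry> {
    // Order by row key, then by sequence descending, so the newest
    // entry comes first within each run of equal row keys.
    entries.sort_by(|a, b| {
        a.row_key
            .cmp(&b.row_key)
            .then(b.sequence.cmp(&a.sequence))
    });
    if mode == WriteMode::Overwrite {
        // `dedup_by_key` keeps the first element of each run of
        // consecutive equal keys, which is the newest entry here.
        entries.dedup_by_key(|e| e.row_key.clone());
    }
    entries
}
```

With two entries for row key `a` (sequences 1 and 2) and one for `b`, `WriteMode::Overwrite` would return two entries and `WriteMode::Append` would return all three.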
For data in the memtable, the storage engine currently adds each row's index within its batch to the row key, because rows in the same batch share the same sequence and would otherwise be indistinguishable when they also share a row key. However, when flushing data from the memtable to an SST, we ignore that index, so an SST can contain entries with the same row key and sequence. For now, the merge reader can filter out entries with duplicate row keys and sequences in SSTs at read time, but to actually release the storage space we need a mechanism that removes duplicate entries from the SSTs, through compaction and by filtering duplicates during flush. I will share any new thoughts in this discussion thread.
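The flush problem described above can be sketched as follows. All names here (`MemtableKey`, `SstKey`, `flush_plain`, `flush_dedup`) are hypothetical, and keeping the row with the highest batch index is an assumption made for illustration; the real engine may choose a different rule.

```rust
use std::collections::BTreeMap;

/// Key inside the memtable: the batch index disambiguates rows that
/// share a row key and a sequence. Hypothetical layout.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct MemtableKey {
    pub row_key: String,
    pub sequence: u64,
    pub index_in_batch: u32,
}

/// Key as written to the SST: the batch index is dropped.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct SstKey {
    pub row_key: String,
    pub sequence: u64,
}

/// Flush without deduplication: rows that differed only by batch
/// index end up with identical SST keys.
pub fn flush_plain(memtable: &BTreeMap<MemtableKey, i64>) -> Vec<(SstKey, i64)> {
    memtable
        .iter()
        .map(|(k, v)| {
            let key = SstKey {
                row_key: k.row_key.clone(),
                sequence: k.sequence,
            };
            (key, *v)
        })
        .collect()
}

/// Flush with deduplication. The memtable iterates in key order, so
/// rows with the same (row_key, sequence) are adjacent and sorted by
/// batch index; a later row replaces the previous one (assumed rule).
pub fn flush_dedup(memtable: &BTreeMap<MemtableKey, i64>) -> Vec<(SstKey, i64)> {
    let mut out: Vec<(SstKey, i64)> = Vec::new();
    for (key, value) in flush_plain(memtable) {
        let is_dup = out.last().map_or(false, |last| last.0 == key);
        if is_dup {
            if let Some(last) = out.last_mut() {
                last.1 = value;
            }
        } else {
            out.push((key, value));
        }
    }
    out
}
```

The same adjacent-key comparison could in principle be reused by a compaction pass over already-written SSTs, since their entries are also sorted by key.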