Behavior on storing duplicate entries #547

evenyag · 2022-11-16T15:31:23Z

evenyag
Nov 16, 2022
Maintainer

Currently, the storage engine by default overwrites duplicate entries, that is, entries that share the same row key. We may also need to support an append mode, which never overwrites duplicate entries.
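To make the two behaviors concrete, here is a minimal sketch. It is only an illustration: `WriteMode`, `ToyTable`, `put`, and `get` are hypothetical names invented for this example, not the storage engine's API, and the in-memory map stands in for the real storage.

```rust
use std::collections::BTreeMap;

/// Hypothetical write modes; the names are illustrative, not the engine's API.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum WriteMode {
    /// A later entry with the same row key replaces the earlier one.
    Overwrite,
    /// Every entry is kept, even if the row key already exists.
    Append,
}

/// Toy table mapping a row key to the values stored under it.
pub struct ToyTable {
    mode: WriteMode,
    rows: BTreeMap<String, Vec<i64>>,
}

impl ToyTable {
    pub fn new(mode: WriteMode) -> Self {
        Self { mode, rows: BTreeMap::new() }
    }

    /// Stores `value` under `row_key` according to the table's write mode.
    pub fn put(&mut self, row_key: &str, value: i64) {
        let values = self.rows.entry(row_key.to_string()).or_default();
        if self.mode == WriteMode::Overwrite {
            // Overwrite mode: drop whatever was stored for this row key.
            values.clear();
        }
        values.push(value);
    }

    /// Returns all values currently visible for `row_key`.
    pub fn get(&self, row_key: &str) -> &[i64] {
        match self.rows.get(row_key) {
            Some(values) => values.as_slice(),
            None => &[],
        }
    }
}
```

Writing the same row key twice leaves one value in `Overwrite` mode and two values in `Append` mode.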

For data in the memtable, the storage engine currently adds the index in batch to the row key to distinguish rows with the same row key and sequence within a batch, since all rows in the same batch share the same sequence. But when flushing the memtable to an SST, we ignore that index, so an SST can contain rows with the same row key and sequence. At the moment, these duplicate row keys and sequences can be filtered by the merge reader, but to finally release the storage space we still need a mechanism that removes duplicate entries from the SSTs, via compaction and by filtering duplicates during flush.
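A rough sketch of the situation described above, under stated assumptions: `MemtableKey`, `SstEntry`, `write_batch`, `flush`, and `dedup_latest` are hypothetical names, the ascending key order is an assumption, and "keep the last entry per row key" is just one possible dedup rule rather than what the merge reader actually does.

```rust
use std::collections::BTreeMap;

/// Hypothetical memtable key: the index of the row inside its write batch
/// breaks ties between rows that share a row key and a sequence.
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct MemtableKey {
    pub row_key: String,
    pub sequence: u64,
    pub index_in_batch: u32,
}

/// Hypothetical SST entry: the batch index is not stored.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct SstEntry {
    pub row_key: String,
    pub sequence: u64,
    pub value: i64,
}

/// Inserts one batch; every row in the batch gets the same sequence.
pub fn write_batch(
    memtable: &mut BTreeMap<MemtableKey, i64>,
    sequence: u64,
    rows: &[(&str, i64)],
) {
    for (i, (row_key, value)) in rows.iter().enumerate() {
        let key = MemtableKey {
            row_key: row_key.to_string(),
            sequence,
            index_in_batch: i as u32,
        };
        memtable.insert(key, *value);
    }
}

/// Flush that ignores the batch index, so the output may contain
/// several entries with the same row key and sequence.
pub fn flush(memtable: &BTreeMap<MemtableKey, i64>) -> Vec<SstEntry> {
    memtable
        .iter()
        .map(|(key, value)| SstEntry {
            row_key: key.row_key.clone(),
            sequence: key.sequence,
            value: *value,
        })
        .collect()
}

/// Dedup over entries sorted by (row key, sequence), with later writes last:
/// keeps only the last entry for each row key (an assumed rule).
pub fn dedup_latest(sorted: &[SstEntry]) -> Vec<SstEntry> {
    let mut out: Vec<SstEntry> = Vec::new();
    for entry in sorted {
        if out.last().map_or(false, |last| last.row_key == entry.row_key) {
            out.pop();
        }
        out.push(entry.clone());
    }
    out
}
```

Applying something like `dedup_latest` at flush time or during compaction, rather than only at read time, is what would let the duplicate rows stop occupying space in the SSTs.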

I will share any new thoughts in this discussion thread.

evenyag · 2024-03-11T09:36:21Z

evenyag
Mar 11, 2024
Maintainer Author

Duplicate of #3480
