ClickHouse Cloud Uploads: Appending duplicate CSV file doesn't append new values #237
Comments
I think that's how deduplication works. If you create a table in CH Cloud like this:
CREATE TABLE foo
(
`i` UInt32,
`s` String
)
ENGINE = MergeTree
ORDER BY i;

And then insert the same thing two times:

The result will be:

SELECT * FROM foo FORMAT JSONEachRow;
{"i":1,"s":"q"}

I think this is related: https://clickhouse.com/blog/common-getting-started-issues-with-clickhouse#5-deduplication-at-insert-time
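A minimal sketch of the reproduction described above, for anyone who wants to try it. The exact INSERT statement is not shown in the thread, so the values below are an assumption inferred from the single row in the reported result:

```sql
-- Assumes the table `foo` from the comment above already exists in CH Cloud.
-- The inserted values are hypothetical, inferred from the reported output.
INSERT INTO foo VALUES (1, 'q');

-- Identical block inserted a second time; per the comment above,
-- insert-time deduplication drops it.
INSERT INTO foo VALUES (1, 'q');

SELECT * FROM foo FORMAT JSONEachRow;
```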
Got it. So to allow users to upload duplicate data, we'd have to create a unique column. We can do that with an auto-incrementing integer column or a UUID column. I now think this is a good reason to use UUIDs for the auto-generated primary key column.
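A rough sketch of the UUID idea, not the project's actual schema. The table name `foo_with_id` and the column name `id` are hypothetical. It assumes the default value is filled in for each row before the inserted block is hashed for deduplication, so that two uploads of the same file produce different blocks; that assumption is worth verifying before relying on it:

```sql
-- Hypothetical table: `id` is generated per row at insert time,
-- so two otherwise identical uploads should not yield identical blocks.
CREATE TABLE foo_with_id
(
    `id` UUID DEFAULT generateUUIDv4(),
    `i` UInt32,
    `s` String
)
ENGINE = MergeTree
ORDER BY i;

-- The uploaded data only supplies `i` and `s`; `id` comes from the default.
INSERT INTO foo_with_id (i, s) VALUES (1, 'q');
INSERT INTO foo_with_id (i, s) VALUES (1, 'q');

SELECT count() FROM foo_with_id;
```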
Is uploading duplicates within a particular (small) time window a popular use case? Additionally, you can, of course, try disabling deduplication through the replicated_deduplication_window setting: https://clickhouse.com/docs/en/operations/settings/merge-tree-settings#replicated-deduplication-window.
To enable duplicate inserts, if that is a popular use case and is necessary, you could do the following during table creation:
CREATE TABLE foo
(
`i` UInt32,
`s` String
)
ENGINE = MergeTree
ORDER BY i
SETTINGS replicated_deduplication_window=0
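For a table that already exists, two alternatives might be worth trying instead of recreating it. This is a sketch under the assumption that the Cloud service allows these settings to be changed; neither has been tested against this project:

```sql
-- Option 1: change the table-level setting on an existing table.
ALTER TABLE foo MODIFY SETTING replicated_deduplication_window = 0;

-- Option 2: leave the table alone and turn off deduplication
-- for a single insert via the query-level setting.
INSERT INTO foo SETTINGS insert_deduplicate = 0 VALUES (1, 'q');
```

Option 2 keeps deduplication in place for other writers, which may matter if retried inserts should still be idempotent elsewhere.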
Bug with #236
If you append a CSV file to a table after the same file has already been uploaded to that table, no new rows are inserted.
Steps to reproduce: