We don't currently support concurrent independent writes to the same table. We can move to an optimistic concurrency control model, as used by other table formats like Iceberg and Delta Lake. Under this model, each concurrent writer finishes writing its data files first, then attempts to commit its manifest file.
Part 1: Commits
The current commit process doesn't check whether a manifest for the current version already exists. We need this check to be atomic with the write operation. This can be accomplished immediately with most object stores other than S3, since they support put-if-not-exist or something similar.
We will have to create a bespoke solution for S3.
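The atomic check-and-write described above can be sketched on a local filesystem, which stands in here for an object store with put-if-not-exist. This is a minimal illustration, not the project's implementation: the `commit_manifest` function, the `CommitConflict` exception, and the `<version>.manifest` naming scheme are all assumptions made for the example.

```python
import os
import tempfile


class CommitConflict(Exception):
    """Raised when a manifest for the target version already exists."""


def commit_manifest(table_dir: str, version: int, manifest_bytes: bytes) -> str:
    # Hypothetical naming scheme: one manifest file per table version.
    final_path = os.path.join(table_dir, f"{version}.manifest")

    # Write the full manifest to a temporary file first, so a reader never
    # observes a partially written manifest under the final name.
    fd, tmp_path = tempfile.mkstemp(dir=table_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(manifest_bytes)
        try:
            # os.link fails if final_path already exists, so the existence
            # check and the publish happen as a single atomic step.
            os.link(tmp_path, final_path)
        except FileExistsError as exc:
            raise CommitConflict(f"version {version} already committed") from exc
    finally:
        # The temporary name is no longer needed, whether we won or lost.
        os.unlink(tmp_path)
    return final_path
```

A writer that receives `CommitConflict` knows another writer committed the same version first. As a first step it can simply fail at that point.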
Part 2: Conflict resolution
As a first step, we will fail if there is a concurrent write. Next, we need writers to decide what to do when another writer has committed before them. They must choose between:
1. Just increment the version and write the next manifest file. This is appropriate for appends, or if transactions overwrite disjoint partitions.
2. Rerun or partially rerun the query, and try committing again. For example, if we deleted data, but the winning write appends new data, we can rerun the delete for just the new data files and then try committing again, rather than having to rerun the full delete.
3. Fail. This can occur if the transactions are totally incompatible. For example, if we are updating a column that the winning transaction removed from the table.
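The choice between these three outcomes can be sketched as a small decision function. This is only an illustration of the cases named above. The `Op` and `Resolution` enums, the `resolve` function, and the `same_column` flag are hypothetical names, and anything not covered by the examples above falls back to failing, as a conservative assumption.

```python
from enum import Enum


class Op(Enum):
    APPEND = "append"
    DELETE = "delete"
    UPDATE_COLUMN = "update_column"
    DROP_COLUMN = "drop_column"


class Resolution(Enum):
    REBASE = "rebase"                # increment the version, write the next manifest
    PARTIAL_RERUN = "partial_rerun"  # redo the work only for the winner's new files
    FAIL = "fail"                    # transactions are incompatible


def resolve(ours: Op, winner: Op, same_column: bool = False) -> Resolution:
    """Decide what a losing writer does after `winner` committed first."""
    # Updating a column that the winning transaction removed cannot succeed.
    if ours is Op.UPDATE_COLUMN and winner is Op.DROP_COLUMN and same_column:
        return Resolution.FAIL
    # Appends do not depend on rows written by the winner, so just rebase.
    if ours is Op.APPEND and winner in (Op.APPEND, Op.DELETE):
        return Resolution.REBASE
    # A delete only needs to be rerun against the newly appended data files.
    if ours is Op.DELETE and winner is Op.APPEND:
        return Resolution.PARTIAL_RERUN
    # Assumption: any pairing not listed above is treated as a failure.
    return Resolution.FAIL
```

For example, `resolve(Op.DELETE, Op.APPEND)` returns `Resolution.PARTIAL_RERUN`, matching the delete-versus-append case described above.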
- Generate fragment IDs as UUIDs
- Record timestamps in UTC instead of system time
- Implement conflict resolution logic to allow certain concurrent writes (such as appends) to succeed
- Implement optimized retries for special cases of conflicting writes
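The first two tasks can be sketched as follows. UUIDs let concurrent writers name fragments without coordinating, and UTC timestamps keep commits comparable across machines in different time zones. The helper names `new_fragment_id` and `commit_timestamp` are hypothetical, and the string representations are an assumption made for the example.

```python
import uuid
from datetime import datetime, timezone


def new_fragment_id() -> str:
    # Random (version 4) UUID: two writers will not pick the same ID
    # in practice, even though they never coordinate.
    return str(uuid.uuid4())


def commit_timestamp() -> str:
    # Timezone-aware UTC timestamp in ISO 8601 form, independent of the
    # local time zone configured on the writer's machine.
    return datetime.now(timezone.utc).isoformat()
```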