Implement copy_if_not_exists for AmazonS3 using DynamoDB (#4880) #4918
Conversation
object_store/src/aws/copy.rs
Outdated
///
/// ## Limitations
///
/// Only conditional operations, e.g. `copy_if_not_exists` will be synchronized, and can
This language talks about conditional operations in general, as I intend to add a put_if_not_exists / create API call as part of #4879.
I intend to add a mechanism using etag for lock busting. Edit: nvm, will save this for a follow-up PR.
It seems like the documentation may also need an update: https://docs.rs/object_store/latest/object_store/trait.ObjectStore.html#tymethod.copy_if_not_exists
I will try and find time to review this over the next few days
I started reading through this PR and so far it looks very nice @tustvold -- thank you for the contribution.
However, I think it needs much more testing than it currently has -- specifically, for something as tricky as distributed locking there should be tests covering all the TTL / expiry logic and the handling of multiple writers and error conditions.
For example (the first case is sketched below):
- Have two tasks try to write concurrently, ensure one gets it
- Have a task that dies and leaves the lock locked
- Have a task actively writing, and then a second that is racing to update, etc
...
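A minimal sketch of the first case, assuming an integration `ObjectStore` instance is available; the helper shape and names are hypothetical, not the PR's actual tests:

```rust
use bytes::Bytes;
use object_store::{path::Path, ObjectStore};

/// Race two copies to the same destination against an integration store.
async fn two_writers_race(store: &dyn ObjectStore) {
    let from = Path::from("race_src");
    let to = Path::from("race_dst");
    store.put(&from, Bytes::from("data")).await.unwrap();

    let (a, b) = futures::join!(
        store.copy_if_not_exists(&from, &to),
        store.copy_if_not_exists(&from, &to),
    );

    // In the happy path exactly one task wins; the loser should observe
    // an AlreadyExists error rather than silently overwriting.
    assert!(a.is_ok() ^ b.is_ok());
}
```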
@@ -200,15 +198,14 @@ impl From<DeleteError> for Error {
#[derive(Debug)]
pub struct S3Config {
    pub region: String,
-   pub endpoint: String,
+   pub endpoint: Option<String>,
I double checked that this is not a (breaking) API change -- although it changes the type of a pub field, the containing module is not pub:
arrow-rs/object_store/src/aws/mod.rs
Line 66 in 95b015c
mod client;
object_store/src/aws/copy.rs
Outdated
///
/// The major changes are:
///
/// * Uses a monotonic generation count instead of a UUID rvn
A monotonic generation can have collisions between multiple client writers who are claiming expired locks, right? Why not use the UUID?
A UUID approach would still collide -- that's the whole purpose 😄
A monotonic generation count is beneficial as it can also act as a fencing token: a higher generation should win over a lower generation. Admittedly this isn't used here, but it is generally good practice. Other, less important, reasons are that a UUID is more expensive to generate, adds a dependency, and has a more expensive encoding.
I was thinking of collisions between concurrent writers (different processes).
DynamoDB will ensure only a single one succeeds at claiming the lock, as whichever wins will then increment the generation breaking the conditional expression of the other.
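An in-memory analogue of that conditional update, with the DynamoDB `generation` attribute modelled as an atomic counter and the condition expression as a compare-and-swap (illustrative only, not the PR's code):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// The DynamoDB `generation` attribute, modelled as an atomic counter.
struct LockRecord {
    generation: AtomicU64,
}

impl LockRecord {
    /// Claim an expired lock, conditional on the generation still holding
    /// the value we observed. Of any number of concurrent claimants exactly
    /// one CAS succeeds, and the bumped generation then fences out everyone
    /// still holding the old value.
    fn try_claim(&self, observed: u64) -> Result<u64, u64> {
        self.generation
            .compare_exchange(observed, observed + 1, Ordering::SeqCst, Ordering::SeqCst)
    }
}
```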
object_store/src/aws/copy.rs
Outdated
///
/// [TTL]: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/howitworks-ttl.html
/// [DynamoDB Lock Client]: https://aws.amazon.com/blogs/database/building-distributed-locks-with-the-dynamodb-lock-client/
Dynamo(String),
I recommend also adding parameters here for TTL and LEASE_EXPIRY, rather than hard-coding them or having to make backwards-incompatible changes in the future.
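Something along these lines, perhaps (the field names are hypothetical, not what the PR ultimately adopted):

```rust
use std::time::Duration;

// Hypothetical parameterised variant of the existing Dynamo(String).
enum S3CopyIfNotExists {
    Dynamo {
        table_name: String,
        /// How long before other writers may treat a lock as abandoned
        lease_expiry: Duration,
        /// How long records linger before the DynamoDB TTL sweep removes them
        ttl: Duration,
    },
}
```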
object_store/src/aws/dynamo.rs
Outdated
/// The TTL offset to encode in DynamoDB
///
/// This should be significantly larger than [`LEASE_EXPIRY`] to allow for clock skew
const LEASE_TTL: Duration = Duration::from_secs(60 * 60);
What is the rationale for such long timeouts?
The lease time-to-live is an hour, which seems like a long time, especially given that the lease expiry is 60 seconds and the actual operation being performed (a copy) likely takes much less.
I strongly suggest making these values configurable rather than hard-coding them.
Two reasons:
- DynamoDB TTL is best effort; I think the only guarantee is that it runs about once a day
- You want to avoid any risk of it kicking in early as a result of clock skew

I'll try to encode these in comments.
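For instance, the two timescales might be documented along these lines (a sketch; the values mirror the diff above but the wording is mine):

```rust
use std::time::Duration;

/// How long before another writer may treat a lock as abandoned and reclaim it
const LEASE_EXPIRY: Duration = Duration::from_secs(60);

/// When DynamoDB's best-effort TTL sweep may delete the record entirely.
///
/// Deliberately much larger than [`LEASE_EXPIRY`]: the sweep is only
/// guaranteed to run roughly once a day, and clock skew must never let it
/// fire while a lease could still be contested. This value only bounds
/// table growth; it plays no part in lock correctness.
const LEASE_TTL: Duration = Duration::from_secs(60 * 60);
```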
The problem with such a long timeout is that it will render the object effectively read-only for an hour in failure scenarios -- if this were used as a table provider or something similar, I think it would result in a substantial and prolonged outage.
Oh no, this is just to clean up afterwards; the lease timeout is 60 seconds, and that is the longest anyone would have to wait in the event of a failure where a writer fails to persist after creating the lock.
> failure where a writer fails to persist following creating the lock

Right, exactly -- this is the scenario in which the whole table would be locked. It requires a failure at the "right" time, but it would be very bad.
> this is the scenario when the whole table would be locked

For 60 seconds, and then the other writer will detect a stale lock and reclaim it? Is that very bad?
> For 60 seconds and then the other writer will detect a stale lock, and reclaim it? Is that very bad?

No, I seem to be confused -- I was talking about whatever the 1 hour timeout is for.
The 1 hour timeout is just a garbage-collection process that cleans out records that are no longer needed -- think of it like a GC. It could be removed, and the only impact would be a slightly larger DynamoDB storage bill. I reworded things in the latest PR to hopefully make this a bit clearer.
Thank you for the clarification -- perhaps this could be encoded in comments.
The new wording makes it clearer
max_clock_skew_rate: u32,
/// The length of time a record will be retained in DynamoDB before being cleaned up
///
/// This is purely an optimisation to avoid indefinite growth of the DynamoDB table
👍
@@ -58,30 +58,12 @@ const VERSION_HEADER: &str = "x-amz-version-id";
#[derive(Debug, Snafu)]
#[allow(missing_docs)]
pub(crate) enum Error {
-   #[snafu(display("Error performing get request {}: {}", path, source))]
These variants simply acted as an indirection to source.error, see e4af137#diff-0feb099a0931110c214fa9b9c8c5528512f38ebf22448772a815baee5973b8c4L147
It is simpler to skip this indirection
@@ -112,6 +112,7 @@ jobs:
    AWS_SECRET_ACCESS_KEY: test
    AWS_ENDPOINT: http://localhost:4566
    AWS_ALLOW_HTTP: true
+   AWS_COPY_IF_NOT_EXISTS: dynamo:test-table:2000
I originally wanted to use a much lower timeout, e.g. 100ms, but in CI this resulted in timeouts. Likely GitHub are significantly oversubscribing CPU on these instances, resulting in very poor predictability.
I think this is ready for review. I intend to follow this up with put_opts support and some more testing, but I want to avoid this PR getting too large.
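For reference, a hedged sketch of how a `dynamo:<table>:<timeout_ms>` value like the one in the CI config above might be parsed; the actual parsing logic and default timeout may differ:

```rust
use std::time::Duration;

/// Parse a `dynamo:<table>:<timeout_ms>` configuration value.
fn parse_dynamo_config(value: &str) -> Option<(String, Duration)> {
    let rest = value.strip_prefix("dynamo:")?;
    let (table, timeout) = match rest.split_once(':') {
        Some((table, ms)) => (table, Duration::from_millis(ms.parse().ok()?)),
        None => (rest, Duration::from_secs(60)), // assumed default lease expiry
    };
    Some((table.to_string(), timeout))
}

// parse_dynamo_config("dynamo:test-table:2000")
//     == Some(("test-table".to_string(), Duration::from_millis(2000)))
```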
};

Some(Lease {
    acquire: Instant::now(),
This instant should be measured BEFORE the dynamo HTTP request, not after. Otherwise your "clock skew" kinda includes HTTP roundtrip times.
This method is for extracting someone else's lease, i.e. for detecting lock timeouts, and should therefore be pessimistic: start measuring the timeout from after the response.
When acquiring a lease it uses an instant from before the request - https://github.com/apache/arrow-rs/pull/4918/files/d9aae852961a346181b2fbf26696cc9ae29185f0#diff-5a8585cdad662383c911426918d89de6d4b7ed13b975767c96ecac30d6e0c227R285
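A sketch of the two timing rules being described, with hypothetical stand-ins for the DynamoDB round trips:

```rust
use std::time::{Duration, Instant};

struct Lease {
    acquire: Instant,
    timeout: Duration,
}

/// Our own lease: stamp *before* the request, under-estimating how much of
/// the lease remains so we can never overrun it.
fn acquire_own_lease(timeout: Duration) -> Lease {
    let acquire = Instant::now();
    put_item_round_trip();
    Lease { acquire, timeout }
}

/// Someone else's lease: stamp *after* the response, over-estimating the
/// remaining time so we never reclaim a lock that may still be live.
fn observe_foreign_lease(timeout: Duration) -> Lease {
    get_item_round_trip();
    Lease { acquire: Instant::now(), timeout }
}

fn put_item_round_trip() { /* hypothetical DynamoDB PutItem call */ }
fn get_item_round_trip() { /* hypothetical DynamoDB GetItem call */ }
```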
("key", AttributeValue::String(key)),
("generation", AttributeValue::Number(next_gen)),
("timeout", AttributeValue::Number(self.timeout)),
("ttl", AttributeValue::Number(ttl as _)),
Isn't this attribute name a user setting? According to https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/time-to-live-ttl-how-to.html the user has to set this for the table. If you expect the user to set this up (e.g. using the console), then you should probably document this on a higher level. Another option would be to set this up using the client (e.g. when the client is created or first used) via https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_UpdateTimeToLive.html.
Err(e) => match parse_error_response(&e) {
    Some(e) if e.error.ends_with(CONFLICT) => match extract_lease(&e.item) {
        Some(lease) => Ok(TryLockResult::Conflict(lease)),
        // ReturnValuesOnConditionCheckFailure is a relatively recent addition
An update here is that dynamodb-local does finally support this -- https://aws.amazon.com/about-aws/whats-new/2023/12/amazon-dynamodb-local-two-api-features/
However, it would appear localstack still does not.
@roeap perhaps you might have time to give this a once-over?
/// * A numeric attribute named `"generation"`
/// * A numeric attribute named `"timeout"`
///
/// To perform a conditional operation on an object with a given `path` and `etag` (if exists),
The etag functionality will be utilised in a follow-up PR to add conditional PUT.
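For orientation, a rough sketch of the lock record implied by the attributes listed above; the key layout and the role of `etag` here are assumptions, not the PR's exact schema:

```rust
/// The item stored per in-flight conditional operation (schema assumed).
struct DynamoLockRecord {
    path: String,    // the object the conditional operation targets
    etag: String,    // expected etag, reserved for the follow-up conditional PUT
    generation: u64, // numeric attribute, bumped on each successful claim
    timeout: u64,    // numeric attribute: lease length in milliseconds
}
```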
Whenever we did copy_if_not_exists, conditional PUT is the thing we actually wanted to do 😆
LGTM!
I did not have too much time to dive very deep, but there were some very nice previous comments already, which gave some confidence.
The critical parts -- i.e. limitations and potential races -- seem to all be elaborated; at least I could not find more ...
> whenever we did copy_if_not_exists, conditional PUT is the thing we actually wanted to do 😆

I plan to merge this in a few days unless anybody objects.
Which issue does this PR close?
Closes #4880
Rationale for this change
This crate aims to provide a uniform interface across a variety of different storage backends, facilitating the development of cloud-agnostic applications. Unfortunately, the lack of support for conditional operations in S3 limits the scope of applications for which this holds, especially given how prevalent S3 is. It should be noted that I am not aware of an S3-compatible store, other than AWS S3 itself, that doesn't natively support conditional operations.
This PR therefore adds a simple DynamoDB-based synchronisation mechanism, which avoids the lowest-common-denominator problem we currently have. Whilst perhaps not optimal for all use-cases, I think having an out-of-the-box mechanism to support this will open up some exciting new possibilities for applications built on top of object_store.
What changes are included in this PR?
Are there any user-facing changes?
No