Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement copy_if_not_exist for AmazonS3 using DynamoDB (#4880) #4918

Merged
merged 12 commits into from
Dec 26, 2023

Conversation

tustvold
Copy link
Contributor

Which issue does this PR close?

Closes #4880

Rationale for this change

This crate aims to provide a uniform interface across a variety of different storage backends, facilitating the development of cloud-agnostic applications. Unfortunately the lack of support for conditional operations in S3 limits the scope of applications that this holds for, especially given how prevalent S3 is. It should be noted I am not aware of an S3-compatible store that doesn't natively support conditional operations

This PR therefore adds a simple DynamoDB-based synchronisation mechanism, which avoids the lowest-denominator problem we currently have. Whilst perhaps not optimal for all use-cases, having an out-of-the-box mechanism to support this I think will open up some exciting new possibilities for applications built on top of object_store.

What changes are included in this PR?

Are there any user-facing changes?

No

@github-actions github-actions bot added the object-store Object Store Interface label Oct 11, 2023
///
/// ## Limitations
///
/// Only conditional operations, e.g. `copy_if_not_exists` will be synchronized, and can
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This language talks about conditional operations in general, as I intend to add a put_if_not_exists / create API call as part of #4879

@tustvold tustvold marked this pull request as draft October 11, 2023 15:14
@tustvold
Copy link
Contributor Author

tustvold commented Oct 11, 2023

I intend to add a mechanism using etag for lock busting

Edit: nvm will save this for a follow up PR

@tustvold tustvold marked this pull request as ready for review October 11, 2023 15:26
@alamb alamb changed the title DynamoDB Coordination for AmazonS3 (#4880) Implement copy_if_not_exist for S3 using DynamoDB Coordination for AmazonS3 (#4880) Oct 11, 2023
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like the documentation may also need an update: https://docs.rs/object_store/latest/object_store/trait.ObjectStore.html#tymethod.copy_if_not_exists

I will try and find time to review this over the next few days

@alamb alamb changed the title Implement copy_if_not_exist for S3 using DynamoDB Coordination for AmazonS3 (#4880) Implement copy_if_not_exist for AmazonS3 using DynamoDB (#4880) Oct 11, 2023
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started reading through this PR and so far it looks very nice @tustvold -- thank you for the contribution.

However, I think it needs much more testing than it currently has -- specifically for something a tricky as distributed locking, there should be tests covering all the TTL / expiry logic and the handling of multiple writers and error conditions

For example:

  1. Have two tasks try to write concurrently, ensure one gets it
  2. Have a task that dies and leaves the lock locked
  3. Have a task actively writing, and then a second that is racing to update, etc
    ...

@@ -200,15 +198,14 @@ impl From<DeleteError> for Error {
#[derive(Debug)]
pub struct S3Config {
pub region: String,
pub endpoint: String,
pub endpoint: Option<String>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I double checked that this is not a (breaking) API change, as it changes the type of a pub field because the module is not pub:

https://docs.rs/object_store/latest/object_store/struct.ObjectMeta.html?search=S3Config#structfield.e_tag

///
/// The major changes are:
///
/// * Uses a monotonic generation count instead of a UUID rvn
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A monotonic generation can have collisions between multiple client writers who are claiming expired locks, right? Why not use the UUID?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A UUID approach would still collide, that's the whole purpose 😄

A monotonic generation count is beneficial as it can also act as a fencing token, as a higher generation should win over a lower generation, admittedly this isn't used here but is generally good practice. Other less good reasons are a UUID is more expensive to generate, adds a dependency, and has a more expensive encoding.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking collision of concurrent writers (different processes)

Copy link
Contributor Author

@tustvold tustvold Oct 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DynamoDB will ensure only a single one succeeds at claiming the lock, as whichever wins will then increment the generation breaking the conditional expression of the other.

///
/// [TTL]: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/howitworks-ttl.html
/// [DynamoDB Lock Client]: https://aws.amazon.com/blogs/database/building-distributed-locks-with-the-dynamodb-lock-client/
Dynamo(String),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend also adding in the parameters here for TTL, and LEASE_EXPIRY rather than hard coding it or having to make backwards incompatible changes in the future

/// The TTL offset to encode in DynamoDB
///
/// This should be significantly larger than [`LEASE_EXPIRY`] to allow for clock skew
const LEASE_TTL: Duration = Duration::from_secs(60 * 60);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the rationale for such long timeouts?

The lease time to live is an hour which seems like a long time, especially given the lease expiry is for 60 seconds but the actual operation being performed (copy) likely takes much less.

I strongly suggest these values can be specified via configuration values rather than hard coded

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two reasons:

  • DynamoDB TTL is best effort, I think the only guarantee is it runs about once a day
  • You want to avoid any risk of it kicking in early as a result of clock skew

I'll try to encode these in comments

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with such a long timeout is it will render the object effectively read only for an hour on failure scenarios -- if this was used as a table provider or something similar, I think it would result in a substantial and prolonged outage

Copy link
Contributor Author

@tustvold tustvold Oct 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh no, this is just to clean up afterwards, the lease timeout is 60 seconds and is the longest anyone would have to wait in the event of a failure where a writer fails to persist following creating the lock

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

failure where a writer fails to persist following creating the lock

Right, exactly -- this is the scenario when the whole table would be locked -- it requires a failure at the "right" time but it would be very bad

Copy link
Contributor Author

@tustvold tustvold Oct 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the scenario when the whole table would be locked

For 60 seconds and then the other writer will detect a stale lock, and reclaim it? Is that very bad?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For 60 seconds and then the other writer will detect a stale lock, and reclaim it? Is that very bad?

No. I seem to be confused. was talking about whatever the 1 hour timeout is for.

Copy link
Contributor Author

@tustvold tustvold Oct 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 1 hour timeout is just a garbage collection process that cleans out records that are no longer needed, think of it like a GC, it could be removed and the only impact would be a slightly larger DynamoDb storage bill. I reworded things in the latest PR to hopefully make this a bit clearer

Copy link
Contributor

@alamb alamb Oct 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the clarification -- perhaps this could be encoded in comments.

The new wording makes it clearer

@tustvold tustvold marked this pull request as draft October 17, 2023 16:55
max_clock_skew_rate: u32,
/// The length of time a record will be retained in DynamoDB before being cleaned up
///
/// This is purely an optimisation to avoid indefinite growth of the DynamoDB table
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -58,30 +58,12 @@ const VERSION_HEADER: &str = "x-amz-version-id";
#[derive(Debug, Snafu)]
#[allow(missing_docs)]
pub(crate) enum Error {
#[snafu(display("Error performing get request {}: {}", path, source))]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These variants simply acted as an indirection to source.error, see e4af137#diff-0feb099a0931110c214fa9b9c8c5528512f38ebf22448772a815baee5973b8c4L147

It is simpler to skip this indirection

@tustvold tustvold force-pushed the dynamodb-lock branch 3 times, most recently from 3950d02 to f390863 Compare October 29, 2023 22:17
@@ -112,6 +112,7 @@ jobs:
AWS_SECRET_ACCESS_KEY: test
AWS_ENDPOINT: http://localhost:4566
AWS_ALLOW_HTTP: true
AWS_COPY_IF_NOT_EXISTS: dynamo:test-table:2000
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I originally wanted to use a much lower timeout, e.g. 100ms, but in CI this resulted in timeouts. Likely Github are significantly over-provisioning CPU on these instances, resulting in very poor predictability

@tustvold tustvold marked this pull request as ready for review October 30, 2023 13:00
@tustvold
Copy link
Contributor Author

I think this is ready for review, I intend to follow this up with put_opts support and some more testing, but I want to avoid this PR getting too large

@tustvold tustvold mentioned this pull request Nov 1, 2023
2 tasks
};

Some(Lease {
acquire: Instant::now(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This instant should be measured BEFORE the dynamo HTTP request, not after. Otherwise your "clock skew" kinda includes HTTP roundtrip times.

Copy link
Contributor Author

@tustvold tustvold Nov 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is for extracting someone else's lease, i.e. for detecting lock timeouts, and therefore should be pessimistic, i.e. start measuring the timeout from after the response.

When acquiring a lease it uses an instant from before the request - https://github.com/apache/arrow-rs/pull/4918/files/d9aae852961a346181b2fbf26696cc9ae29185f0#diff-5a8585cdad662383c911426918d89de6d4b7ed13b975767c96ecac30d6e0c227R285

("key", AttributeValue::String(key)),
("generation", AttributeValue::Number(next_gen)),
("timeout", AttributeValue::Number(self.timeout)),
("ttl", AttributeValue::Number(ttl as _)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this attribute name a user setting? According to https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/time-to-live-ttl-how-to.html the user has to set this for the table. If you expect the user to set this up (e.g. using the console), then you should probably document this on a higher level. Another option would be to set this up using the client (e.g. when the client is created or first used) via https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_UpdateTimeToLive.html.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Err(e) => match parse_error_response(&e) {
Some(e) if e.error.ends_with(CONFLICT) => match extract_lease(&e.item) {
Some(lease) => Ok(TryLockResult::Conflict(lease)),
// ReturnValuesOnConditionCheckFailure is a relatively recent addition
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An update here is that dynamodb-local does finally support this - https://aws.amazon.com/about-aws/whats-new/2023/12/amazon-dynamodb-local-two-api-features/

However, it would appear localstack still does not

@tustvold
Copy link
Contributor Author

@roeap perhaps you might have time to give this a once over?

/// * A numeric attribute named `"generation"`
/// * A numeric attribute named `"timeout"`
///
/// To perform a conditional operation on an object with a given `path` and `etag` (if exists),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The etag functionality will be utilised in a follow up PR to add condition PUT

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whenever we did copy_if_not exists, conditional PUT is the thing we actually wanted to do 😆

Copy link
Contributor

@roeap roeap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Did not have too much time to dive very deep, but there were some very nice previous comments already, which gave some confidence.

The critical parts - i.e. limitations and protential races seem to be all elaborated, at least I could not find more ...

/// * A numeric attribute named `"generation"`
/// * A numeric attribute named `"timeout"`
///
/// To perform a conditional operation on an object with a given `path` and `etag` (if exists),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whenever we did copy_if_not exists, conditional PUT is the thing we actually wanted to do 😆

@tustvold
Copy link
Contributor Author

I plan to merge this in a few days unless anybody objects

@tustvold tustvold merged commit 844b851 into apache:master Dec 26, 2023
31 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
object-store Object Store Interface
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add copy_if_not_exists support for AmazonS3 via DynamoDB Lock Support
4 participants