Add datasegment copier interface and s3 impl #17430

Open
wants to merge 1 commit into
base: master

jtuglu-netflix
Contributor

jtuglu-netflix commented Oct 28, 2024

This PR creates a DataSegmentCopier interface and a corresponding S3DataSegmentCopier implementation. The goal is to provide an alternative for those who want to move data segments between clusters. These classes are used in a CLI tool for copying datasources between clusters, similar to the older, now-deprecated migration tool. We plan to release that tool as open source soon as well.

This also adds the ability for these transfer tools to move segments larger than 5GB using an S3 Transfer Manager.
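
For context on the 5GB point: S3 caps a single CopyObject request at 5 GB, so a larger object has to be copied in parts, which is the work a transfer manager automates. The sketch below only illustrates how the byte ranges for such a multipart copy could be planned. `MultipartCopyPlanner`, `ByteRange`, and the choice to treat 5 GB as 5 GiB are assumptions made for illustration, not code or constants from this PR.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of this PR): plans byte ranges for a multipart copy. */
final class MultipartCopyPlanner
{
  // Assumption: the single-request copy limit is treated as 5 GiB here.
  static final long MAX_SINGLE_COPY_BYTES = 5L * 1024 * 1024 * 1024;

  /** Inclusive byte range [first, last] covered by one part. */
  static final class ByteRange
  {
    final long first;
    final long last;

    ByteRange(long first, long last)
    {
      this.first = first;
      this.last = last;
    }
  }

  /** True when the object is too large for a single copy request. */
  static boolean needsMultipart(long objectSize)
  {
    return objectSize > MAX_SINGLE_COPY_BYTES;
  }

  /** Splits [0, objectSize) into consecutive ranges of at most partSize bytes. */
  static List<ByteRange> planParts(long objectSize, long partSize)
  {
    if (objectSize <= 0 || partSize <= 0) {
      throw new IllegalArgumentException("objectSize and partSize must be positive");
    }
    List<ByteRange> parts = new ArrayList<>();
    for (long start = 0; start < objectSize; start += partSize) {
      // The final part may be shorter than partSize.
      long last = Math.min(start + partSize, objectSize) - 1;
      parts.add(new ByteRange(start, last));
    }
    return parts;
  }
}
```

Each planned range would correspond to one part-copy request, with the parts assembled once all of them have completed.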

Description

Currently, Druid can only move a data segment from one deep storage location to another, which deletes it from the source. This PR adds the flexibility to copy instead, and refactors the code common to S3DataSegmentMover and S3DataSegmentCopier into a shared S3DataSegmentTransferUtility.
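
The relationship between the two operations can be sketched roughly as follows: a move is a copy followed by a delete of the source, so both can share one transfer routine. This is a minimal in-memory illustration only. `InMemoryStore` and `TransferSketch` are hypothetical names and do not reflect the actual signatures of DataSegmentCopier or S3DataSegmentTransferUtility.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for deep storage: object key -> object bytes. */
final class InMemoryStore
{
  final Map<String, byte[]> objects = new HashMap<>();
}

/** Hypothetical shared helper used by both the copy and the move path. */
final class TransferSketch
{
  /** Copies the object to targetKey and leaves the source in place. */
  static void copy(InMemoryStore store, String sourceKey, String targetKey)
  {
    byte[] data = store.objects.get(sourceKey);
    if (data == null) {
      throw new IllegalStateException("Missing source object: " + sourceKey);
    }
    store.objects.put(targetKey, data.clone());
  }

  /** Moves the object: same transfer as copy, then the source is deleted. */
  static void move(InMemoryStore store, String sourceKey, String targetKey)
  {
    copy(store, sourceKey, targetKey);
    // Guard against deleting the object when source and target are the same key.
    if (!sourceKey.equals(targetKey)) {
      store.objects.remove(sourceKey);
    }
  }
}
```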

Release note

  • Allows data segments larger than 5 GB to be copied in S3.
  • Adds an S3DataSegmentCopier class for ad-hoc use.

Key changed/added classes in this PR
  • extensions-core/s3-extensions/src/main/java/org/apache/druid/storage/s3/S3DataSegmentCopier.java
  • extensions-core/s3-extensions/src/main/java/org/apache/druid/storage/s3/S3DataSegmentMover.java
  • extensions-core/s3-extensions/src/main/java/org/apache/druid/storage/s3/S3DataSegmentTransferUtility.java
  • extensions-core/s3-extensions/src/test/java/org/apache/druid/storage/s3/S3DataSegmentCopierTest.java
  • processing/src/main/java/org/apache/druid/segment/loading/DataSegmentCopier.java

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@jtuglu-netflix jtuglu-netflix force-pushed the add-datasegment-copier-interface-and-s3-impl branch 2 times, most recently from a2534fb to 49c7941 Compare October 28, 2024 22:14
@jtuglu-netflix jtuglu-netflix marked this pull request as ready for review October 28, 2024 22:14
@kfaraz
Contributor

kfaraz commented Oct 29, 2024

Thanks for the PR @jtuglu-netflix !
Could you share some details on how you plan to use this feature?

);
}
catch (Exception e) {
Throwables.propagateIfInstanceOf(e, AmazonServiceException.class);

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation Note

Invoking Throwables.propagateIfInstanceOf should be avoided because it has been deprecated.
}
catch (Exception e) {
Throwables.propagateIfInstanceOf(e, AmazonServiceException.class);
Throwables.propagateIfInstanceOf(e, SegmentLoadingException.class);

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation Note

Invoking Throwables.propagateIfInstanceOf should be avoided because it has been deprecated.
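
One possible way to address these notices: Guava documents Throwables.throwIfInstanceOf as the replacement for propagateIfInstanceOf, and the same effect can be written with plain instanceof checks. The sketch below uses only the JDK. `SketchLoadingException` is a hypothetical stand-in for SegmentLoadingException, and wrapping all other exceptions in a RuntimeException is an assumption, since the rest of the catch block is not shown in the snippets above.

```java
/** Hypothetical checked exception standing in for SegmentLoadingException. */
final class SketchLoadingException extends Exception
{
  SketchLoadingException(String message)
  {
    super(message);
  }
}

final class RethrowSketch
{
  /** Hypothetical unit of work that may fail with any exception. */
  interface Task
  {
    void run() throws Exception;
  }

  static void runTransfer(Task task) throws SketchLoadingException
  {
    try {
      task.run();
    }
    catch (Exception e) {
      // Rethrow the known checked type as-is, without the deprecated helper.
      if (e instanceof SketchLoadingException) {
        throw (SketchLoadingException) e;
      }
      // Unchecked exceptions (AmazonServiceException is one) also pass through unchanged.
      if (e instanceof RuntimeException) {
        throw (RuntimeException) e;
      }
      // Assumption: anything else is wrapped so the original cause is preserved.
      throw new RuntimeException(e);
    }
  }
}
```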

MockAmazonS3Client()
{
super(new AmazonS3Client(), new NoopServerSideEncryption());

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation Note test

Invoking AmazonS3Client.AmazonS3Client should be avoided because it has been deprecated.
@jtuglu-netflix jtuglu-netflix force-pushed the add-datasegment-copier-interface-and-s3-impl branch from 1effb4e to d46076f Compare November 20, 2024 07:52
@jtuglu-netflix jtuglu-netflix marked this pull request as ready for review November 20, 2024 22:41
@jtuglu-netflix jtuglu-netflix force-pushed the add-datasegment-copier-interface-and-s3-impl branch from f9a693e to 24aa7c8 Compare November 20, 2024 22:57
Additionally allow S3DataSegmentMover and S3DataSegmentCopier to copy segments over 5GB.
@jtuglu-netflix jtuglu-netflix force-pushed the add-datasegment-copier-interface-and-s3-impl branch from 24aa7c8 to 22baf57 Compare November 21, 2024 00:18
2 participants