Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validation optimization - reducing the number of HEAD calls during cp and mv operations #4710

Merged
merged 10 commits into from
Oct 12, 2023

Conversation

zveinn
Copy link
Contributor

@zveinn zveinn commented Oct 10, 2023

Community Contribution License

All community contributions in this pull request are licensed to the project maintainers
under the terms of the [Apache 2 license] (https://www.apache.org/licenses/LICENSE-2.0).
By creating this pull request I represent that I have the right to license the
contributions to the project maintainers under the Apache 2 license.

Description

This pull request provides performance optimizations for the mv and cp commands along with some minor cleanup and validation refactor.

Related Issues

#3149

Motivation and Context

During some cp and mv operations the mc client makes too many HEAD calls. These head calls were being made on multiple levels of execution so in order to reduce the overhead a sizable cleanup was required.

The Validation layer

The main offender here was the checkCopySyntax function. On Line 61 the function validates object existance, however, the objects existance is validate numerous times through out the process making this redundant. The next offender start on line 111, this is the initial syntax validation. This validation method calls for HEAD multiple times to validate the syntax.

Merging URL generation and Syntax Validation

The simplest way to reduce the number of HEAD calls is to no repeat that operation multiple times for the same object. This means passing already known information down the execution path. The simplest way to achieve this goal was to merge the validation and URL generation layer since the URL generation layer has it's own requirement on HEAD checking. It would not have been wise to merge the URL generation layer up into the validation layer since the logic behind the URL generation is much more complex.

This is an example trace of a file being moved from a local filesystem to a bucket

2023-10-10T08:20:35.958 [200 OK] s3.GetBucketLocation 192.168.2.10:9000/bb1/?location=  192.168.2.10     176µs       ⇣  165.972µs  ↑ 77 B ↓ 128 B
2023-10-10T08:20:35.959 [200 OK] s3.HeadObject 192.168.2.10:9000/bb1/m 192.168.2.10     6.663ms      ⇣  0s         ↑ 77 B ↓ 0 B
2023-10-10T08:20:35.967 [200 OK] s3.HeadObject 192.168.2.10:9000/bb1/m 192.168.2.10     200µs       ⇣  0s         ↑ 77 B ↓ 0 B
2023-10-10T08:20:35.967 [200 OK] s3.HeadObject 192.168.2.10:9000/bb1/m 192.168.2.10     209µs       ⇣  0s         ↑ 77 B ↓ 0 B
2023-10-10T08:20:35.968 [200 OK] s3.HeadObject 192.168.2.10:9000/bb1/m 192.168.2.10     190µs       ⇣  0s         ↑ 77 B ↓ 0 B
2023-10-10T08:20:35.969 [200 OK] s3.HeadObject 192.168.2.10:9000/bb1/m 192.168.2.10     174µs       ⇣  0s         ↑ 77 B ↓ 0 B
2023-10-10T08:20:35.970 [200 OK] s3.HeadObject 192.168.2.10:9000/bb1/m 192.168.2.10     174µs       ⇣  0s         ↑ 77 B ↓ 0 B
2023-10-10T08:20:35.971 [200 OK] s3.GetObject 192.168.2.10:9000/bb1/m 192.168.2.10     24.561ms     ⇣  1.507702ms  ↑ 86 B ↓ 25 MiB

Example of the same operation post refactor

2023-10-10T08:40:18.971 [200 OK] s3.GetBucketLocation 192.168.2.10:9000/bb1/?location=  192.168.2.10     241µs       ⇣  224.181µs  ↑ 77 B ↓ 128 B
2023-10-10T08:40:18.971 [200 OK] s3.HeadObject 192.168.2.10:9000/bb1/m 192.168.2.10     306µs       ⇣  0s         ↑ 77 B ↓ 0 B
2023-10-10T08:40:18.972 [200 OK] s3.HeadObject 192.168.2.10:9000/bb1/m 192.168.2.10     280µs       ⇣  0s         ↑ 77 B ↓ 0 B
2023-10-10T08:40:18.973 [200 OK] s3.GetObject 192.168.2.10:9000/bb1/m 192.168.2.10     24.231ms     ⇣  1.320942ms  ↑ 86 B ↓ 25 MiB

How to test this PR?

There is a functional-tests.sh script in the root of the project.

NOTE: The test_admin_users functional test seems to be broken and I could not figure out why it's not working. It has nothing to do with my current changes and I suspect it's been broken for a while. I did refactor the test_admin_users test a bit and I opened a discussion on slack about it.

$ ./functional-tests.sh

Types of changes

  • Optimization (provides speedup with no functional changes)

Checklist:

  • Unit tests added/updated

Copy link
Contributor

@klauspost klauspost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some minor things.

cmd/cp-main.go Show resolved Hide resolved
cmd/cp-url.go Outdated Show resolved Hide resolved
cmd/cp-main.go Show resolved Hide resolved
cmd/cp-url.go Outdated Show resolved Hide resolved
cmd/cp-url.go Outdated Show resolved Hide resolved
functional-tests.sh Outdated Show resolved Hide resolved
Copy link
Contributor

@klauspost klauspost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zveinn zveinn changed the title Validation optimization Validation optimization - reducing the number of HEAD calls during mc operations Oct 12, 2023
@zveinn zveinn changed the title Validation optimization - reducing the number of HEAD calls during mc operations Validation optimization - reducing the number of HEAD calls during cp and mv operations Oct 12, 2023
@kannappanr kannappanr merged commit 87d6cd2 into minio:master Oct 12, 2023
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants