
Copying large files on GCP gives an error due to missing md5 hash #412

Open
bgorissen opened this issue May 22, 2024 · 0 comments
bgorissen commented May 22, 2024

When copying data from drs to gs, drs.copy() copies files larger than 100 MB in chunks via _copy_multipart_passthrough. That function performs an md5 check:

if dst_blob.md5 != src_blob.md5:

For GCP, the md5 property is retrieved via:

    @property
    def md5(self) -> str:
        gs_md5 = self._get_native_blob().md5_hash
        return base64.b64decode(gs_md5).hex()
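The failure can be reproduced without touching GCS at all: for a composite object, md5_hash is None, so the property body effectively calls base64.b64decode(None). A minimal sketch (gs_md5 stands in for what _get_native_blob().md5_hash returns for a composite object):

```python
import base64

# For a composite GCS object, the native blob's md5_hash is None,
# so the md5 property effectively runs base64.b64decode(None),
# which raises a TypeError.
gs_md5 = None
try:
    base64.b64decode(gs_md5).hex()
except TypeError as exc:
    print(f"TypeError: {exc}")
```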

However, composite objects never populate md5_hash; the GCS documentation states:

Composite objects do not have an MD5 hash metadata field.

Passing None to base64.b64decode raises a TypeError, so the copy fails for any composite (i.e. multipart-uploaded) object. GCS does, however, still provide a crc32c checksum for composite objects, which could be used for the integrity check instead.
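Since GCS records a crc32c checksum even for composite objects, the comparison could fall back to it whenever md5 is unavailable. A minimal sketch of such a fallback (the helper name and argument shapes are hypothetical, not part of the library; arguments are the raw base64-encoded md5_hash / crc32c strings from blob metadata, where md5 may be None):

```python
import base64

def blob_checksums_match(src_md5, dst_md5, src_crc32c, dst_crc32c):
    """Compare two blobs' checksums, preferring md5 when both sides have it.

    Composite objects report md5_hash = None, so for those we fall back
    to the crc32c checksum, which GCS provides for every object.
    """
    if src_md5 is not None and dst_md5 is not None:
        # Decode before comparing so base64 padding/encoding quirks
        # cannot cause a false mismatch.
        return base64.b64decode(src_md5) == base64.b64decode(dst_md5)
    # At least one side is a composite object: compare crc32c instead.
    return src_crc32c == dst_crc32c
```

With this guard in place, _copy_multipart_passthrough's integrity check would no longer crash on composite objects, at the cost of using the weaker crc32c comparison for them.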
