EarthbeamDAG.upload_to_s3 does not check for naming conflicts #70

Open
jayckaiser opened this issue Sep 6, 2024 · 1 comment

@jayckaiser
Collaborator

In the (rare) case where multiple input files are uploaded to S3 with the same name, they will overwrite one another. This can occur when processing Parquet files, where each file ends up with a name like part.N.parquet.

We need to update the code to check whether files are identically named and to append additional metadata to their names to prevent collisions in this instance.
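
As a rough illustration of the check described above, here is a minimal sketch that detects repeated basenames in a batch and appends a short hash of the full source path only to the colliding ones. The function name `dedupe_s3_keys`, its parameters, and the choice of a path hash as the "additional metadata" are all assumptions for illustration; this is not the existing `EarthbeamDAG.upload_to_s3` code, nor necessarily what the hotfix branch does.

```python
import hashlib
from collections import Counter
from pathlib import PurePosixPath


def dedupe_s3_keys(local_paths, s3_prefix):
    """Map each local path to an S3 key, disambiguating repeated basenames.

    Hypothetical helper: names and behavior are illustrative only.
    """
    # Count how often each basename appears in this batch.
    name_counts = Counter(PurePosixPath(p).name for p in local_paths)

    keys = {}
    for path in local_paths:
        p = PurePosixPath(path)
        name = p.name
        if name_counts[name] > 1:
            # Collision: insert a short hash of the full source path
            # before the final extension, e.g. part.0.<hash>.parquet.
            digest = hashlib.sha1(path.encode("utf-8")).hexdigest()[:8]
            name = f"{p.stem}.{digest}{p.suffix}"
        keys[path] = f"{s3_prefix.rstrip('/')}/{name}"
    return keys


if __name__ == "__main__":
    paths = ["a/part.0.parquet", "b/part.0.parquet", "a/other.csv"]
    for src, key in dedupe_s3_keys(paths, "some/prefix").items():
        print(src, "->", key)
```

Files with unique names keep their original names, so only the rare colliding case changes downstream paths. Hashing the source path keeps the result deterministic across reruns, though a parent directory name or a run-level identifier could serve the same purpose.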

@jayckaiser
Collaborator Author

I've created branch hotfix/include_env_var_in_s3_filepaths as a first pass at this change.
