Unable to tar large files #14

Open
huw0 opened this issue Jun 14, 2022 · 0 comments
huw0 commented Jun 14, 2022

Hi,

Firstly, thanks! S3-tar has been really useful for archiving some of our buckets.

We have a number of buckets that contain a mix of small files and files that are larger than 50% of available RAM. When a large file is encountered, the process is killed with an out-of-memory error. It would be really great to resolve this.

From a cursory look at the code, it seems the underlying cause is the use of io.BytesIO() for in-memory processing, both when downloading from S3 and when creating parts of the tar, meaning that any file being processed requires available RAM greater than twice its size (fileSize * 2).
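
For illustration, a minimal sketch of the pattern I mean (a simplification, not the library's actual code): the whole object is buffered in one io.BytesIO() and then copied into a second io.BytesIO() holding the tar part, so peak memory is roughly twice the object size.

```python
import io
import tarfile

import boto3

s3 = boto3.client("s3")

def tar_object_in_memory(bucket, key):
    # Hypothetical simplification: the whole object is first buffered in RAM...
    src = io.BytesIO()
    s3.download_fileobj(bucket, key, src)
    src.seek(0)

    # ...and then copied into a second in-memory buffer holding the tar part,
    # so peak memory usage is roughly 2x the object size.
    part = io.BytesIO()
    with tarfile.open(fileobj=part, mode="w") as tar:
        info = tarfile.TarInfo(name=key)
        info.size = src.getbuffer().nbytes
        tar.addfile(info, src)
    return part
```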

I think multiple algorithms are required depending on file size. For small files, the current process makes sense, as in-memory caching reduces the total time needed.

However, when a large file is encountered, it is probably necessary to pipe directly from the S3 download stream through tar and back to S3. This would mean only one part can be uploaded at a time.
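
A rough sketch of the kind of streaming I have in mind, assuming boto3 (the function name, part size, and threshold are placeholders, not anything from s3-tar): write the tar header for the member, read the source object's body in chunks, and push full-sized chunks to a multipart upload so the whole file is never held in memory.

```python
import tarfile

import boto3

s3 = boto3.client("s3")
PART_SIZE = 64 * 1024 * 1024  # multipart parts must be at least 5 MiB

def stream_object_into_tar(src_bucket, src_key, dst_bucket, dst_key):
    """Hypothetical sketch: stream one object into a tar member on S3
    without buffering the whole file in memory."""
    size = s3.head_object(Bucket=src_bucket, Key=src_key)["ContentLength"]

    # Build the 512-byte tar header for this member up front.
    info = tarfile.TarInfo(name=src_key)
    info.size = size
    header = info.tobuf(format=tarfile.GNU_FORMAT)

    body = s3.get_object(Bucket=src_bucket, Key=src_key)["Body"]
    upload = s3.create_multipart_upload(Bucket=dst_bucket, Key=dst_key)
    parts, part_no, buf = [], 1, header

    def flush(data, part_no):
        resp = s3.upload_part(Bucket=dst_bucket, Key=dst_key,
                              UploadId=upload["UploadId"],
                              PartNumber=part_no, Body=data)
        parts.append({"ETag": resp["ETag"], "PartNumber": part_no})

    # Read the source in chunks; upload whenever we have a full part.
    for chunk in body.iter_chunks(chunk_size=8 * 1024 * 1024):
        buf += chunk
        while len(buf) >= PART_SIZE:
            flush(buf[:PART_SIZE], part_no)
            buf, part_no = buf[PART_SIZE:], part_no + 1

    # Pad the member to a 512-byte boundary as the tar format requires,
    # then upload whatever is left as the final (possibly small) part.
    if size % 512:
        buf += b"\0" * (512 - size % 512)
    flush(buf, part_no)

    s3.complete_multipart_upload(Bucket=dst_bucket, Key=dst_key,
                                 UploadId=upload["UploadId"],
                                 MultipartUpload={"Parts": parts})
```

A real implementation would also need to append the two 512-byte zero blocks that terminate a tar archive and handle multiple members, but the memory profile stays flat regardless of file size.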

Alternatively, some intelligent spooling to disk could be used, although this has a similar limitation: the maximum supported file size would be bounded by available disk space.
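
For the spooling option, Python's tempfile.SpooledTemporaryFile could perhaps be dropped in where io.BytesIO() is used today; small files stay in memory and large ones transparently roll over to disk (the threshold below is just an assumption, not a tuned value).

```python
import tempfile

import boto3

s3 = boto3.client("s3")

# Keep up to 64 MiB in memory, then roll over to a temporary file on disk.
SPOOL_MAX = 64 * 1024 * 1024

def download_with_spool(bucket, key):
    buf = tempfile.SpooledTemporaryFile(max_size=SPOOL_MAX)
    s3.download_fileobj(bucket, key, buf)
    buf.seek(0)
    return buf  # file-like object usable wherever io.BytesIO() is used now
```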
