Unable to tar large files #14

Open
huw0 opened this issue Jun 14, 2022 · 0 comments
huw0 commented Jun 14, 2022

Hi,

Firstly, thanks! S3-tar has been really useful for archiving some of our buckets.

We have a number of buckets that contain a mix of small files and files that are larger than 50% of available RAM. When a large file is encountered, the process is killed with an out-of-memory error. It would be really great to resolve this.

From a cursory look at the code, it seems the underlying cause is the use of io.BytesIO() for in-memory processing, both when downloading from S3 and when creating parts of the tar, meaning that any file being processed requires available RAM greater than twice its size (fileSize * 2).
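
For illustration, a minimal sketch of the pattern I mean (a simplification, not the library's actual code): the whole object is buffered in one io.BytesIO() and then copied into a second io.BytesIO() holding the tar part, so peak memory is roughly twice the object size.

```python
import io
import tarfile

import boto3

s3 = boto3.client("s3")

def tar_object_in_memory(bucket, key):
    # Hypothetical simplification: the whole object is first buffered in RAM...
    src = io.BytesIO()
    s3.download_fileobj(bucket, key, src)
    src.seek(0)

    # ...and then copied into a second in-memory buffer holding the tar part,
    # so peak memory usage is roughly 2x the object size.
    part = io.BytesIO()
    with tarfile.open(fileobj=part, mode="w") as tar:
        info = tarfile.TarInfo(name=key)
        info.size = src.getbuffer().nbytes
        tar.addfile(info, src)
    return part
```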

I think multiple algorithms are required depending on file size. For small files, the current process makes sense, as in-memory caching reduces the total time needed.

However, when a large file is encountered, it is probably necessary to pipe directly from the S3 download stream through tar and back to S3. This would mean only one part can be uploaded at a time.
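
A rough sketch of the kind of streaming I have in mind, assuming boto3 (the function name, part size, and threshold are placeholders, not anything from s3-tar): write the tar header for the member, read the source object's body in chunks, and push full-sized chunks to a multipart upload so the whole file is never held in memory.

```python
import tarfile

import boto3

s3 = boto3.client("s3")
PART_SIZE = 64 * 1024 * 1024  # multipart parts must be at least 5 MiB

def stream_object_into_tar(src_bucket, src_key, dst_bucket, dst_key):
    """Hypothetical sketch: stream one object into a tar member on S3
    without buffering the whole file in memory."""
    size = s3.head_object(Bucket=src_bucket, Key=src_key)["ContentLength"]

    # Build the 512-byte tar header for this member up front.
    info = tarfile.TarInfo(name=src_key)
    info.size = size
    header = info.tobuf(format=tarfile.GNU_FORMAT)

    body = s3.get_object(Bucket=src_bucket, Key=src_key)["Body"]
    upload = s3.create_multipart_upload(Bucket=dst_bucket, Key=dst_key)
    parts, part_no, buf = [], 1, header

    def flush(data, part_no):
        resp = s3.upload_part(Bucket=dst_bucket, Key=dst_key,
                              UploadId=upload["UploadId"],
                              PartNumber=part_no, Body=data)
        parts.append({"ETag": resp["ETag"], "PartNumber": part_no})

    # Read the source in chunks; upload whenever we have a full part.
    for chunk in body.iter_chunks(chunk_size=8 * 1024 * 1024):
        buf += chunk
        while len(buf) >= PART_SIZE:
            flush(buf[:PART_SIZE], part_no)
            buf, part_no = buf[PART_SIZE:], part_no + 1

    # Pad the member to a 512-byte boundary as the tar format requires,
    # then upload whatever is left as the final (possibly small) part.
    if size % 512:
        buf += b"\0" * (512 - size % 512)
    flush(buf, part_no)

    s3.complete_multipart_upload(Bucket=dst_bucket, Key=dst_key,
                                 UploadId=upload["UploadId"],
                                 MultipartUpload={"Parts": parts})
```

A real implementation would also need to append the two 512-byte zero blocks that terminate a tar archive and handle multiple members, but the memory profile stays flat regardless of file size.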

Alternatively, some intelligent spooling to disk could be used, although this has a similar limitation: the maximum supported file size would be bounded by available disk space.
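
For the spooling option, Python's tempfile.SpooledTemporaryFile could perhaps be dropped in where io.BytesIO() is used today; small files stay in memory and large ones transparently roll over to disk (the threshold below is just an assumption, not a tuned value).

```python
import tempfile

import boto3

s3 = boto3.client("s3")

# Keep up to 64 MiB in memory, then roll over to a temporary file on disk.
SPOOL_MAX = 64 * 1024 * 1024

def download_with_spool(bucket, key):
    buf = tempfile.SpooledTemporaryFile(max_size=SPOOL_MAX)
    s3.download_fileobj(bucket, key, buf)
    buf.seek(0)
    return buf  # file-like object usable wherever io.BytesIO() is used now
```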
