Replies: 7 comments
-
I don't have a solution, but I'd like to leave a note. A zip archive is a non-solid compression archive: each file is compressed into its own blocks. A 7z archive is a solid archive by default: all data blocks are concatenated and compressed into a single solid block (https://en.wikipedia.org/wiki/Solid_compression). If you want a file object targeting an archived file, py7zr needs to extract everything stored before the specified file in order to reach the state just before extracting the target file.
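For context, a minimal sketch of what getting a file object looks like with py7zr's regular API, assuming `SevenZipFile.read(targets=...)` returns a dict mapping archived names to `io.BytesIO` objects (the archive path and member name below are placeholders). Because of the solid block, this still decompresses everything stored before the requested member:

```python
import py7zr

# Placeholder archive path and member name.
with py7zr.SevenZipFile("test.7z", "r") as z:
    extracted = z.read(targets=["docs/readme.txt"])
    for name, bio in extracted.items():
        data = bio.read()  # the member is fully decompressed into memory
        print(name, len(data), "bytes")
```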
-
Thank you for the answer 😃
-
You are right.
-
Thank you. That's all I wanted to know. 🤓
-
I wanted to read a huge 7z file as well. The following seems to work if you can process the data in chunks:

```python
import py7zr

path = "test.7z"

with py7zr.SevenZipFile(path, "r") as z:
    for f in z.files:
        if f.is_directory:
            continue
        folder = f.folder
        decompressor = folder.get_decompressor(f.compressed)
        remaining = f.uncompressed
        while remaining > 0:
            chunk = decompressor.decompress(z.fp, remaining)
            # Do something with "chunk" of "f.filename" here,
            # like send it to a server or something.
            remaining -= len(chunk)
            if remaining <= 0:
                break
            else:
                print(f"decompressing: {remaining * 1e-6:16.6f} MB of {f.filename} remaining")
```

Alternatively, many APIs are happy with a file object that only supports `read()`:

```python
import os
import shutil

import py7zr

path = "test.7z"


class ReadOnlyFile:
    def __init__(self, f, z):
        folder = f.folder
        self.decompressor = folder.get_decompressor(f.compressed)
        self.remaining = f.uncompressed
        self.z = z
        self.chunk = b""
        self.offset = 0

    def read(self, size=-1):
        if self.remaining <= 0:
            return b""
        # Buffer a new chunk if the current chunk is exhausted.
        if self.offset >= len(self.chunk):
            self.chunk = self.decompressor.decompress(self.z.fp, self.remaining)
            self.offset = 0
        if size < 0:
            # Return the rest of the buffered chunk if the caller does not care about size.
            chunk = self.chunk[self.offset:]
            self.chunk = b""
            self.offset = 0
            self.remaining -= len(chunk)
        else:
            # Return as much of the chunk as is available.
            available = min(size, len(self.chunk) - self.offset)
            chunk = self.chunk[self.offset : self.offset + available]
            self.offset += available
            self.remaining -= available
        return chunk


# Example: unpack all files in a 7z archive.
with py7zr.SevenZipFile(path, "r") as z:
    for f in z.files:
        if f.is_directory:
            continue
        src = ReadOnlyFile(f, z)
        # Guard against path traversal in archived file names.
        assert ".." not in f.filename
        d = os.path.dirname(f.filename)
        if d:
            os.makedirs(d, exist_ok=True)
        # Do something with the src file object, e.g. copy it to an output file.
        with open(f.filename, "wb") as dst:
            shutil.copyfileobj(src, dst)
```
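As another illustration of the "only supports `read()`" point, a small sketch (assuming the `ReadOnlyFile` class defined above) that hashes each archived member without ever materialising it on disk:

```python
import hashlib

import py7zr

with py7zr.SevenZipFile("test.7z", "r") as z:
    for f in z.files:
        if f.is_directory:
            continue
        src = ReadOnlyFile(f, z)
        digest = hashlib.sha256()
        while True:
            chunk = src.read(1 << 20)  # read roughly 1 MiB at a time
            if not chunk:
                break
            digest.update(chunk)
        print(f.filename, digest.hexdigest())
```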
-
@99991 Unfortunately, the 7-zip file format is designed to place its compression metadata at the end of the file. Nobody can know the chunk positions, i.e. where an LZMA block starts and ends, before downloading all of the file data.
-
@miurahr I think we are talking about different problems here. My goal was to extract the files from an already downloaded 7z archive in a streaming manner, since I still had enough space to store the compressed archive, but not the uncompressed one. Your description sounds like decompressing the data while it is being downloaded. That is a harder problem, but I think it would still be possible in some cases, since many HTTP servers support range requests. This way, you can specify that you only want to download a certain range of bytes; just the end, for example. I implemented something like that for Zip files, which also store their offsets at the end of the file.
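A minimal sketch of such a range request using the `requests` library (the URL and byte count are placeholders, and the server has to honour range requests for this to work):

```python
import requests

url = "https://example.com/archive.zip"  # placeholder URL

# Ask the server for only the last 64 KiB of the file, where zip
# (and 7z) archives keep their metadata. Servers that honour range
# requests reply with status 206 Partial Content.
response = requests.get(url, headers={"Range": "bytes=-65536"})
if response.status_code == 206:
    tail = response.content  # last 65536 bytes of the remote file
    print(f"fetched {len(tail)} bytes of metadata")
else:
    print("server ignored the Range header and sent the full file")
```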
-
Hey guys 🤓
This is my solution for the `.zip` format. Want to do the same for `7z`. I just need a way to get a fileobj.
Why? To avoid RAM or storage overheads and unpack straight into S3.
Any possibility to do so?
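Not an official py7zr API, but a rough sketch of how the `ReadOnlyFile` wrapper from the earlier comment could be pointed straight at S3 with boto3's `upload_fileobj` (the bucket name is a placeholder, and this assumes `ReadOnlyFile` is defined exactly as in that comment):

```python
import boto3
import py7zr

s3 = boto3.client("s3")
bucket = "my-bucket"  # placeholder bucket name

with py7zr.SevenZipFile("test.7z", "r") as z:
    for f in z.files:
        if f.is_directory:
            continue
        # ReadOnlyFile only implements read(), which is all that
        # upload_fileobj needs; nothing is written to local disk.
        src = ReadOnlyFile(f, z)
        s3.upload_fileobj(src, bucket, f.filename)
```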