Is there a reason to use an archive/tar-like API instead of archive/zip? #4

Open
ajnavarro opened this issue Jun 9, 2020 · 2 comments

Comments

@ajnavarro

First of all, congratulations and thanks for this library.

As far as I know, 7zip uses an index, like zip files.

Going through the code, we can see that we are loading all entries when we initialize the 7zip reader, so when we call Next(), we are just iterating through a slice to get the FileInfo:

go7z/reader.go

Lines 188 to 197 in 9c09b6b

func (sz *Reader) nextFileInfo() *headers.FileInfo {
	var fileInfo *headers.FileInfo
	if sz.fileIndex < len(sz.header.FilesInfo) {
		fileInfo = sz.header.FilesInfo[sz.fileIndex]
		sz.fileIndex++
		return fileInfo
	}
	return nil
}

My question is: why do we need an iterator-like API if we know the entries beforehand? Maybe an archive/zip-style API would be a better fit for this use case:

zr, _ := go7z.NewReader(readerAt, size)
for _, f := range zr.Files {
    info := f.FileInfo()
    name := f.Name
    reader, _ := f.Open()
    ...
}

Sorry in advance if I missed some obvious problem here that makes this impossible. If you think it's a good idea, I'll be happy to help with the implementation.

@saracen
Owner

saracen commented Jun 9, 2020

Hey,

I wrote this some time ago, so my memory around it is a little fuzzy. I still find the 7z archive format confusing.

I think the original rationale for the tar-like interface was that my main use case was full archive extraction. Although the FileInfo is accessible, a file's data is typically written to a solid block (a folder in 7z terminology), where it is compressed alongside other files. So random access to a file's content isn't as easy as with zip. If you try to extract a specific file and it happens to be at the end of a solid block, the whole block needs decompressing. The zip interface also allows you to Open() multiple files concurrently. With 7z, if you were to open several files within the same solid block, making sure you decompress them efficiently might be difficult.
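To illustrate the solid-block cost, here is a minimal sketch. It is not go7z's code: it stands in compress/flate for the 7z codecs, and the names entry, buildSolidBlock and extract are hypothetical. Several files are compressed as one stream, so reading a later file means decompressing and discarding everything stored before it.

```go
package main

import (
	"bytes"
	"compress/flate"
	"errors"
	"fmt"
	"io"
)

// entry is a hypothetical index record: a file name and its uncompressed size.
type entry struct {
	Name string
	Size int64
}

// buildSolidBlock compresses all contents as ONE stream, which is roughly
// what a solid block does (flate is only a stand-in for the real codecs).
func buildSolidBlock(names []string, contents [][]byte) ([]byte, []entry, error) {
	if len(names) != len(contents) {
		return nil, nil, errors.New("names and contents differ in length")
	}
	var buf bytes.Buffer
	w, err := flate.NewWriter(&buf, flate.DefaultCompression)
	if err != nil {
		return nil, nil, err
	}
	index := make([]entry, 0, len(names))
	for i, name := range names {
		if _, err := w.Write(contents[i]); err != nil {
			return nil, nil, err
		}
		index = append(index, entry{Name: name, Size: int64(len(contents[i]))})
	}
	if err := w.Close(); err != nil {
		return nil, nil, err
	}
	return buf.Bytes(), index, nil
}

// extract returns the content of index[target] plus the number of bytes
// that had to be decompressed and thrown away to reach it.
func extract(block []byte, index []entry, target int) ([]byte, int64, error) {
	if target < 0 || target >= len(index) {
		return nil, 0, errors.New("target out of range")
	}
	r := flate.NewReader(bytes.NewReader(block))
	defer r.Close()

	var skipped int64
	for _, e := range index[:target] {
		// There is no way to seek: earlier files must be decompressed.
		n, err := io.CopyN(io.Discard, r, e.Size)
		skipped += n
		if err != nil {
			return nil, skipped, err
		}
	}
	data := make([]byte, index[target].Size)
	if _, err := io.ReadFull(r, data); err != nil {
		return nil, skipped, err
	}
	return data, skipped, nil
}

func main() {
	names := []string{"a.txt", "b.txt", "c.txt"}
	contents := [][]byte{[]byte("aa"), []byte("bbb"), []byte("cccc")}
	block, index, err := buildSolidBlock(names, contents)
	if err != nil {
		panic(err)
	}
	data, skipped, err := extract(block, index, 2)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s: %q (discarded %d bytes first)\n", index[2].Name, data, skipped)
}
```

In this sketch, opening two files from the same block concurrently would mean either two independent decompression passes or some shared, cached decoder, which is the efficiency concern mentioned above.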

Having said that, the current interface is somewhat broken. You're supposed to be able to "skip" a file (either by jumping to the next solid block, or by seeking within the current solid block, decompressing and discarding the preceding data), but for some archives this doesn't work.
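For reference, the intended skip behaviour within one solid block could look roughly like the sketch below. It is an assumption-laden illustration, not the library's implementation: solidReader, entry and demoReader are hypothetical names, and a plain strings.Reader stands in for the already-decompressed block stream. Next() discards whatever the caller left unread before moving on to the following entry.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// entry is a hypothetical index record: a file name and its uncompressed size.
type entry struct {
	Name string
	Size int64
}

// solidReader is a tar-like iterator over one decompressed solid block.
type solidReader struct {
	stream    io.Reader // decompressed block data, forward-only
	index     []entry   // entries in the order they are stored
	next      int       // position of the entry Next will return
	remaining int64     // unread bytes of the current entry
}

// Next advances to the following entry. Unread bytes of the current
// entry are read and discarded, because the stream cannot seek.
func (s *solidReader) Next() (*entry, error) {
	if s.remaining > 0 {
		if _, err := io.CopyN(io.Discard, s.stream, s.remaining); err != nil {
			return nil, err
		}
		s.remaining = 0
	}
	if s.next >= len(s.index) {
		return nil, io.EOF
	}
	e := &s.index[s.next]
	s.next++
	s.remaining = e.Size
	return e, nil
}

// Read reads from the current entry only and reports io.EOF at its end.
func (s *solidReader) Read(p []byte) (int, error) {
	if s.remaining == 0 {
		return 0, io.EOF
	}
	if int64(len(p)) > s.remaining {
		p = p[:s.remaining]
	}
	n, err := s.stream.Read(p)
	s.remaining -= int64(n)
	if err == io.EOF && s.remaining > 0 {
		err = io.ErrUnexpectedEOF // block ended before the entry did
	}
	return n, err
}

// demoReader builds a reader over three tiny files stored back to back.
func demoReader() *solidReader {
	return &solidReader{
		stream: strings.NewReader("aabbbcccc"),
		index:  []entry{{"a.txt", 2}, {"b.txt", 3}, {"c.txt", 4}},
	}
}

func main() {
	r := demoReader()
	for {
		e, err := r.Next()
		if err == io.EOF {
			break
		}
		if err != nil {
			panic(err)
		}
		if e.Name == "b.txt" {
			continue // skipped: Next will discard its 3 bytes
		}
		data, err := io.ReadAll(r)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s: %q\n", e.Name, data)
	}
}
```

A real reader would additionally jump straight to the next solid block when the remaining entries of the current one are not needed, which this sketch does not attempt.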

A fresh pair of eyes on the code, interface and that bug would be great if you're interested in helping out!

@ajnavarro
Author

Thanks a lot for the explanation.

Just to get more familiar with the codebase, I tried to fix the skip-file problem (just a workaround, so I can iterate from there with some specific tests).

From here, I can have a look at the 7zip folder format and check how other libraries handle skipping files that are inside 7zip folders.

Again, thanks a lot for your time!
