Is there a reason to use an `archive/tar`-like API instead of `archive/zip`? #4

ajnavarro · 2020-06-09T09:30:29Z

First of all, congratulations and thanks for this library.

As far as I know, 7zip uses an index, like zip files.

Going through the code, I can see that all entries are loaded when the 7zip reader is initialized, so calling Next() just iterates over a slice to get the FileInfo:

go7z/reader.go

Lines 188 to 197 in 9c09b6b

    
    func (sz *Reader) nextFileInfo() *headers.FileInfo {
    	var fileInfo *headers.FileInfo
    	if sz.fileIndex < len(sz.header.FilesInfo) {
    		fileInfo = sz.header.FilesInfo[sz.fileIndex]
    		sz.fileIndex++
    		return fileInfo
    	}
    	return nil
    }

My question is: why do we need an iterator-like API if we know the entries beforehand? Maybe an archive/zip-style API would be a better fit for this use case:

zr, _ := go7z.NewReader(readerAt, size)
for _, f := range zr.Files {
    info := f.FileInfo()
    name := f.Name
    reader, _ := f.Open()
    ...
}
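
For comparison, this is roughly how the standard library's archive/zip exposes its index: the reader lists every entry up front and each one can be opened on its own. The sketch below uses only the standard library; `buildZip` and `readEntry` are helper names made up for illustration, and nothing here is go7z code.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
)

// buildZip creates a small in-memory zip archive (illustration helper).
func buildZip(files map[string]string) ([]byte, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	for name, content := range files {
		w, err := zw.Create(name)
		if err != nil {
			return nil, err
		}
		if _, err := io.WriteString(w, content); err != nil {
			return nil, err
		}
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// readEntry opens a single named entry through the index,
// without reading the data of any other entry.
func readEntry(archive []byte, name string) (string, error) {
	zr, err := zip.NewReader(bytes.NewReader(archive), int64(len(archive)))
	if err != nil {
		return "", err
	}
	for _, f := range zr.File {
		if f.Name != name {
			continue
		}
		rc, err := f.Open()
		if err != nil {
			return "", err
		}
		defer rc.Close()
		data, err := io.ReadAll(rc)
		if err != nil {
			return "", err
		}
		return string(data), nil
	}
	return "", fmt.Errorf("entry %q not found", name)
}

func main() {
	archive, err := buildZip(map[string]string{"a.txt": "alpha", "b.txt": "beta"})
	if err != nil {
		panic(err)
	}
	content, err := readEntry(archive, "b.txt")
	if err != nil {
		panic(err)
	}
	fmt.Println(content)
}
```

In a zip file each entry is compressed independently, which is what makes the per-file `Open()` cheap.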

Sorry in advance if I missed some obvious problem here that makes this impossible. If you think it's a good idea, I'll be happy to help with the implementation.


saracen · 2020-06-09T23:46:27Z

Hey,

I wrote this some time ago, so my memory around it is a little fuzzy. I still find the 7z archive format confusing.

I think the original rationale for the tar-like interface was that my main use case was full archive extraction. Although the FileInfo is accessible, the file data is typically written to a solid block (a folder in 7z terminology), compressed alongside other files. So random access to a file's content isn't as easy as with zip. If you try to extract a specific file, and it happens to be at the end of a solid block, the whole block needs decompressing. The zip interface also allows you to Open() multiple files concurrently. With 7z, if you were to open several files within the same solid block, making sure you decompress them efficiently might be difficult.

Having said that, the current interface is somewhat broken. You're supposed to be able to "skip" a file (either by jumping to the next solid block, or by seeking within the current solid block and decompressing and discarding the preceding data), but for some archives this doesn't work.

A fresh pair of eyes on the code, interface and that bug would be great if you're interested in helping out!

ajnavarro · 2020-06-10T11:12:53Z

Thanks a lot for the explanation.

Just to get more familiar with the codebase, I tried to fix the skip-file problem (just a workaround, so that I can iterate from there with some specific tests).

From here, I can have a look at the 7zip folder format and check how other libraries handle skipping files that are inside 7zip folders.

Again, thanks a lot for your time!
