Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ Read files content lazily [joelanford edits] #16

Closed

Conversation

joelanford
Copy link
Contributor

Based somewhat on #15 (comment), this extra commit is what I was thinking.

Benchmarks before:

$ go test -bench=.
goos: darwin
goarch: arm64
pkg: github.com/nlepage/go-tarfs/benchmarks
BenchmarkOpenTarThenCopyFile_ManySmallFiles-10                        20          57586710 ns/op
BenchmarkOpenTarThenCopyFile_ManySmallFiles_InMemory-10               21          50357206 ns/op
BenchmarkCopyFile_ManySmallFiles-10                                  123           9690994 ns/op
BenchmarkCopyFile_ManySmallFiles_InMemory-10                     4774060               253.8 ns/op
PASS
ok      github.com/nlepage/go-tarfs/benchmarks  6.694s

Benchmarks after:

$ go test -bench=.                
goos: darwin
goarch: arm64
pkg: github.com/nlepage/go-tarfs/benchmarks
BenchmarkOpenTarThenCopyFile_ManySmallFiles-10                        25          44993760 ns/op
BenchmarkOpenTarThenCopyFile_ManySmallFiles_InMemory-10               21          51123639 ns/op
BenchmarkCopyFile_ManySmallFiles-10                              1411468               855.9 ns/op
BenchmarkCopyFile_ManySmallFiles_InMemory-10                     4746981               249.9 ns/op
PASS
ok      github.com/nlepage/go-tarfs/benchmarks  6.612s

@joelanford joelanford changed the title ✨ Read files content lazily [jlanford edits] ✨ Read files content lazily [joelanford edits] Jul 22, 2023
@joelanford
Copy link
Contributor Author

Added another commit, which simplifies code further and seems to make some small gains in the benchmark:

$ go test -bench=.
goos: darwin
goarch: arm64
pkg: github.com/nlepage/go-tarfs/benchmarks
BenchmarkOpenTarThenCopyFile_ManySmallFiles-10                        25          44698328 ns/op
BenchmarkOpenTarThenCopyFile_ManySmallFiles_InMemory-10               28          40716345 ns/op
BenchmarkCopyFile_ManySmallFiles-10                              1400550               851.6 ns/op
BenchmarkCopyFile_ManySmallFiles_InMemory-10                     5445105               223.0 ns/op
PASS
ok      github.com/nlepage/go-tarfs/benchmarks  7.500s

@nlepage
Copy link
Owner

nlepage commented Jul 23, 2023

Hi @joelanford
This looks promising!
I'll have look tomorrow.

@nlepage
Copy link
Owner

nlepage commented Jul 24, 2023

I've had the time to read this, it's awesome!

Is it OK with you if I merge your commits in the upstream PR, and then add some tests and perform some minor refactorings?

@joelanford
Copy link
Contributor Author

Thanks! Yeah go for it. And feel free to squash my commits down, rewrite messages, or whatever make sense and is to your liking!

@nlepage nlepage force-pushed the feature/lazy-read branch 6 times, most recently from 0c608b0 to 407e63e Compare July 27, 2023 14:02
nlepage and others added 8 commits July 27, 2023 16:14
use the offset to avoid iterating tar file when opening each file.

Signed-off-by: Joe Lanford <[email protected]>
after we've read the non-readerAt reader into a buffer, we now have a
readerAt either way, which simplifies code and no longer requires
separate buffers per file entry.

Signed-off-by: Joe Lanford <[email protected]>
Reading file content straight from the underlying reader was too straightforward.

Instead read from a new tar.Reader with an underlying reader pointing straight to the tar entry we want to read.
Should improve initial read of the tar file.
@joelanford
Copy link
Contributor Author

joelanford commented Aug 4, 2023

Updated before and after benchmarks:

Before:

$ go test ./benchmarks -bench=. -benchmem
goos: darwin
goarch: arm64
pkg: github.com/nlepage/go-tarfs/benchmarks
BenchmarkOpenTarThenReadFile_ManySmallFiles-10                22          48194477 ns/op         8479395 B/op     163485 allocs/op
BenchmarkOpenTarThenReadFile_FewLargeFiles-10                261           4552260 ns/op        45079848 B/op        185 allocs/op
BenchmarkReadFile_ManySmallFiles-10                       259962              4424 ns/op            9338 B/op         18 allocs/op
BenchmarkReadFile_FewLargeFiles-10                           265           4455064 ns/op        45071809 B/op         21 allocs/op
PASS
ok      github.com/nlepage/go-tarfs/benchmarks  7.300s

After:

$ go test ./... -bench=. -benchmem     
PASS
ok      github.com/nlepage/go-tarfs     0.182s
goos: darwin
goarch: arm64
pkg: github.com/nlepage/go-tarfs/benchmarks
BenchmarkOpenTarThenReadFile_ManySmallFiles-10                        24          49196785 ns/op         8486693 B/op     163489 allocs/op
BenchmarkOpenTarThenReadFile_ManySmallFiles_DisableSeek-10            24          48368255 ns/op         8483226 B/op     163492 allocs/op
BenchmarkOpenTarThenReadFile_FewLargeFiles-10                        261           4588587 ns/op        45079274 B/op        186 allocs/op
BenchmarkOpenTarThenReadFile_FewLargeFiles_DisableSeek-10            255           4602195 ns/op        45078945 B/op        186 allocs/op
BenchmarkReadFile_ManySmallFiles-10                               251312              4671 ns/op            9338 B/op         18 allocs/op
BenchmarkReadFile_ManySmallFiles_DisableSeek-10                   245274              4458 ns/op            9339 B/op         18 allocs/op
BenchmarkReadFile_FewLargeFiles-10                                   268           4477002 ns/op        45071588 B/op         21 allocs/op
BenchmarkReadFile_FewLargeFiles_DisableSeek-10                       256           4564893 ns/op        45072093 B/op         21 allocs/op
BenchmarkOpenReadAndCloseFile_ManySmallFiles-10                   173931              6636 ns/op           17662 B/op         21 allocs/op
BenchmarkOpenReadAndCloseFile_ManySmallFiles_DisableSeek-10       251323              4706 ns/op            9339 B/op         18 allocs/op
BenchmarkOpenReadAndCloseFile_FewLargeFiles-10                       168           7026883 ns/op        90137151 B/op         23 allocs/op
BenchmarkOpenReadAndCloseFile_FewLargeFiles_DisableSeek-10           266           4411056 ns/op        45070941 B/op         21 allocs/op
BenchmarkOpenAndCloseFile_ManySmallFiles-10                       253801              4750 ns/op            9453 B/op         20 allocs/op
BenchmarkOpenAndCloseFile_ManySmallFiles_DisableSeek-10           498770              2330 ns/op            1123 B/op         17 allocs/op
BenchmarkOpenAndCloseFile_FewLargeFiles-10                           267           4450177 ns/op        45072038 B/op         23 allocs/op
BenchmarkOpenAndCloseFile_FewLargeFiles_DisableSeek-10            497299              2317 ns/op            1170 B/op         17 allocs/op
PASS
ok      github.com/nlepage/go-tarfs/benchmarks  25.324s

Note the new BenchmarkOpenReadAndCloseFile and BenchmarkOpenAndCloseFile benchmarks which use tfs.Open() and other fs.File operations rather than call fs.ReadFile. When reading, you can see that the DisableSeek variant uses half the memory and is generally faster as well! And when just opening and closing, the memory allocations are essentially constant

fs.go Outdated
Comment on lines 28 to 32
func DisableSeek(disable bool) func(*options) {
return func(o *options) {
o.disableSeek = disable
}
}
Copy link
Contributor Author

@joelanford joelanford Aug 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I sorta feel like non-seekable files should be the default because it is more performant, but that would be a breaking change for anyone using the library with http.FS.

Since this library is at v1, I decided to make this a non-breaking change, which means callers would have to specifically opt in to the more performant behavior if they have no need to seek. But maybe it is worth considering bumping to v2?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants