implementation for archiving a pack file #138

Open
wants to merge 13 commits into
base: main


zhubonan
Collaborator

Replaces #133

Archived pack files are essentially ZIP archives. Reading from them is also supported, using the offset/length stored in the SQLite database. Because archived packs are never written to, they can be stored on different file systems or at networked locations. Using ZIP archives also makes it possible to recover the data if the SQLite database is damaged.
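A minimal sketch of how an object can be read back from a ZIP-based archive with only an offset and a length. It assumes, for illustration, that the stored offset points at the first byte of a member's compressed data and the length is its compressed size; the PR may define them differently. The same index can be rebuilt from the ZIP's own directory, which is what allows recovery without the SQLite database. `data_offset`, `build_index` and `read_object` are hypothetical helper names.

```python
import io
import struct
import zipfile
import zlib

LOCAL_HEADER_SIG = 0x04034B50  # "PK\x03\x04"
LOCAL_HEADER_LEN = 30          # fixed part of a ZIP local file header


def data_offset(fh, info: zipfile.ZipInfo) -> int:
    """Return the absolute offset of a member's compressed data."""
    fh.seek(info.header_offset)
    header = fh.read(LOCAL_HEADER_LEN)
    (signature,) = struct.unpack("<I", header[:4])
    if signature != LOCAL_HEADER_SIG:
        raise ValueError(f"bad local header for {info.filename}")
    # The local header's name/extra lengths can differ from the central directory's,
    # so they are read from the local header itself.
    name_len, extra_len = struct.unpack("<HH", header[26:30])
    return info.header_offset + LOCAL_HEADER_LEN + name_len + extra_len


def build_index(fh) -> dict:
    """Rebuild {name: (offset, length)} from the ZIP alone, without any database."""
    with zipfile.ZipFile(fh) as zf:  # a passed-in file object is not closed on exit
        infos = zf.infolist()
    return {info.filename: (data_offset(fh, info), info.compress_size) for info in infos}


def read_object(fh, offset: int, length: int) -> bytes:
    """Read one raw-DEFLATE member given only its offset and compressed length."""
    fh.seek(offset)
    return zlib.decompress(fh.read(length), -15)  # wbits=-15: no zlib header/trailer


# Usage: write two objects into an in-memory archive, then read one back by offset.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("obj-a", b"hello " * 100)
    zf.writestr("obj-b", b"world " * 50)

index = build_index(buf)
offset, length = index["obj-a"]
print(read_object(buf, offset, length)[:12])  # b'hello hello '
```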

The main differences between an archived pack and a normal pack are:

  1. An archived pack file is always compressed.
  2. Both use DEFLATE compression, but the streams differ slightly. In a normal pack file, each compressed stream contains zlib's header and trailer (WBITS=15, the default), while the streams in a ZIP file are "raw" and contain no header or trailer (WBITS=-15).
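The framing difference in point 2 can be checked with Python's standard `zlib` module: with the same compression settings, the zlib-wrapped stream is the raw DEFLATE stream plus a 2-byte header and a 4-byte Adler-32 trailer. This only illustrates the two formats; the compression level the PR uses is not stated, so level 6 below is an assumption.

```python
import struct
import zlib


def deflate(data: bytes, wbits: int, level: int = 6) -> bytes:
    """Compress with DEFLATE; wbits=15 adds zlib framing, wbits=-15 emits a raw stream."""
    compressor = zlib.compressobj(level, zlib.DEFLATED, wbits)
    return compressor.compress(data) + compressor.flush()


data = b"some object content " * 20
pack_stream = deflate(data, 15)  # as in a normal pack file
zip_stream = deflate(data, -15)  # as stored inside a ZIP member

# zlib framing = 2-byte header + raw DEFLATE body + 4-byte big-endian Adler-32 checksum
assert pack_stream[2:-4] == zip_stream
assert pack_stream[-4:] == struct.pack(">I", zlib.adler32(data))

# Each stream must be decompressed with the matching wbits value.
assert zlib.decompress(pack_stream, 15) == data
assert zlib.decompress(zip_stream, -15) == data
```

ZIP entries also carry a CRC-32 of the uncompressed data rather than the Adler-32 found in the zlib trailer, so moving an object from a normal pack into an archive cannot be a plain byte copy.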

Creating an archive is slow and should ideally be done while the container is not in active use (similar to repacking). However, I think it should still be possible to do it on a live container, as long as the pack file being archived is not being written to at the same time.
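A rough sketch of the archiving step, under stated assumptions: the normal pack's index is a plain dict mapping a hash key to `(offset, length, compressed)`, compressed objects use zlib framing, and each object becomes one ZIP member named after its hash key. `archive_pack`, `_data_offset` and this index layout are hypothetical, not the PR's actual code. The sketch also shows why archiving is slow: every compressed object is decompressed and recompressed.

```python
import io
import struct
import zipfile
import zlib


def _data_offset(fh, info: zipfile.ZipInfo) -> int:
    # Local header: 30 fixed bytes, with file-name and extra-field lengths at bytes 26-29.
    fh.seek(info.header_offset + 26)
    name_len, extra_len = struct.unpack("<HH", fh.read(4))
    return info.header_offset + 30 + name_len + extra_len


def archive_pack(pack, index: dict, out) -> dict:
    """Copy every object of a normal pack into a ZIP archive and return the new index.

    `index` maps hashkey -> (offset, length, compressed) in `pack` (hypothetical layout).
    The result maps hashkey -> (offset, length) of each member's raw-DEFLATE data in `out`.
    """
    with zipfile.ZipFile(out, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for hashkey, (offset, length, compressed) in index.items():
            pack.seek(offset)
            data = pack.read(length)
            if compressed:
                # Normal packs store zlib-framed streams; ZIP members need raw DEFLATE and a
                # CRC-32 of the plain data, so the object is decompressed and recompressed.
                data = zlib.decompress(data)
            zf.writestr(hashkey, data)  # archived objects are always compressed
    with zipfile.ZipFile(out) as zf:
        infos = zf.infolist()
    return {info.filename: (_data_offset(out, info), info.compress_size) for info in infos}


# Usage: a tiny "normal pack" with one compressed and one uncompressed object.
objects = {"aaa": b"first object " * 10, "bbb": b"second object"}
pack, index = io.BytesIO(), {}
for key, compress in (("aaa", True), ("bbb", False)):
    payload = zlib.compress(objects[key]) if compress else objects[key]
    index[key] = (pack.tell(), len(payload), compress)
    pack.write(payload)

archive = io.BytesIO()
new_index = archive_pack(pack, index, archive)
```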

A new table is needed in the SQLite database to store the status of each pack file, with two extra columns: state and location. The former is changed to Archived when the pack is archived; the latter stores an explicit location for the archived pack file, if one is set.
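A minimal sketch of such a table with Python's built-in `sqlite3` module. The table name `pack_status`, the column types, the `Active` default state and the example path are assumptions for illustration; only the `state` and `location` columns and the `Archived` value come from the description above. `CREATE TABLE IF NOT EXISTS` keeps the statement safe to rerun against an existing container.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS pack_status (
    pack_id  INTEGER PRIMARY KEY,           -- hypothetical: same id as the pack file
    state    TEXT NOT NULL DEFAULT 'Active', -- hypothetical default; 'Archived' once archived
    location TEXT                            -- NULL means the default location in the container
)
"""

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.execute(SCHEMA)  # rerunning is harmless thanks to IF NOT EXISTS
conn.executemany("INSERT INTO pack_status (pack_id) VALUES (?)", [(0,), (1,)])

# Mark pack 0 as archived and record an explicit (hypothetical) location for it.
conn.execute(
    "UPDATE pack_status SET state = 'Archived', location = ? WHERE pack_id = ?",
    ("/mnt/archive/0.zip", 0),
)
conn.commit()

rows = conn.execute("SELECT pack_id, state, location FROM pack_status ORDER BY pack_id").fetchall()
conn.close()
print(rows)  # [(0, 'Archived', '/mnt/archive/0.zip'), (1, 'Active', None)]
```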

A CLI is provided to list archived pack files and update their locations.

Archived packs need special handling:
1. They should not be selected for writing.
2. They can reside in different locations.
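The two rules above could look roughly like this. `PackInfo`, `pick_pack_for_writing`, `pack_path`, the `packs/` subfolder, the state strings and the size limit are all hypothetical names and values chosen for this sketch.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable, Optional


@dataclass
class PackInfo:
    pack_id: int
    state: str               # assumed values: "Active" or "Archived"
    location: Optional[str]  # explicit location, or None for the default one
    size: int                # current size in bytes


def pick_pack_for_writing(packs: Iterable[PackInfo], size_limit: int) -> int:
    """Return the id of the pack to append to, never choosing an archived pack."""
    packs = list(packs)
    candidates = [p for p in packs if p.state != "Archived" and p.size < size_limit]
    if candidates:
        return max(p.pack_id for p in candidates)
    # No writable pack left: start a new one after the highest existing id.
    return max((p.pack_id for p in packs), default=-1) + 1


def pack_path(container: Path, pack: PackInfo) -> Path:
    """Resolve where a pack lives; an explicit location overrides the default one."""
    if pack.location is not None:
        return Path(pack.location)
    return container / "packs" / str(pack.pack_id)


packs = [
    PackInfo(0, "Archived", "/mnt/archive/0.zip", 4_000_000),
    PackInfo(1, "Active", None, 1_000),
]
print(pick_pack_for_writing(packs, size_limit=4_000_000))  # 1
```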
This allows an archived pack to be imported into other containers
without needing the SQLite database file.
Sometimes it is a string, sometimes it is an int. SQLite does
not seem to care, but matching inside Python differentiates the two.
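Assuming the value in question is a pack ID stored in an `INTEGER` column (the commit message does not say which value it is), this is SQLite's type affinity at work: a text value such as `'1'` is converted to an integer when stored, and a text parameter compared against the column is converted too, whereas Python's `==` never treats `'1'` and `1` as equal. The table `obj` and the helper `normalize_pack_id` are hypothetical.

```python
import sqlite3


def normalize_pack_id(value) -> int:
    """Coerce a pack id that may arrive as str or int to a single type (int)."""
    return int(value)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obj (hashkey TEXT, pack_id INTEGER)")  # hypothetical schema
conn.execute("INSERT INTO obj VALUES (?, ?)", ("abc", "1"))  # pack id passed as a string

# INTEGER affinity stored '1' as the integer 1, and a string parameter still matches in SQL.
(stored,) = conn.execute("SELECT pack_id FROM obj").fetchone()
(matches,) = conn.execute("SELECT COUNT(*) FROM obj WHERE pack_id = ?", ("1",)).fetchone()
conn.close()

print(type(stored).__name__, matches)    # int 1
print("1" == stored)                     # False: Python never equates str and int
print(normalize_pack_id("1") == stored)  # True
```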
This allows archive files to exist outside of the container folder.
Allow archive locations to be set/updated; also add a CLI.
@codecov

codecov bot commented May 30, 2022

Codecov Report

Merging #138 (bc62f1e) into develop (16e6ff9) will decrease coverage by 2.44%.
The diff coverage is 86.72%.

❗ Current head bc62f1e differs from pull request most recent head c7af5af. Consider uploading reports for the commit c7af5af to get more accurate results

@@             Coverage Diff             @@
##           develop     #138      +/-   ##
===========================================
- Coverage    99.52%   97.07%   -2.45%     
===========================================
  Files            8        8              
  Lines         1676     1881     +205     
===========================================
+ Hits          1668     1826     +158     
- Misses           8       55      +47     
Impacted Files                   Coverage Δ
disk_objectstore/cli.py          83.96% <56.75%> (-14.59%) ⬇️
disk_objectstore/utils.py        96.51% <91.66%> (-3.09%) ⬇️
disk_objectstore/container.py    97.95% <92.76%> (-1.45%) ⬇️
disk_objectstore/database.py     100.00% <100.00%> (ø)


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 16e6ff9...c7af5af.

@zhubonan
Collaborator Author

Hi @chrisjsewell @giovannipizzi, could you please take a look at this?

Some problems still to be solved:

  1. The SQLite database of existing containers needs to be migrated. I guess this should be done with Alembic?
  2. On Windows, the tests fail with errors when trying to delete a file that is still open (the SQLite database file). Is there any obvious thing I can try to track this down?
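One common cause worth checking (a guess, not a confirmed diagnosis for this PR): in Python's `sqlite3`, using a connection as a context manager only commits or rolls back the transaction; it does not close the connection, so the database file stays open and Windows refuses to delete it. Wrapping the connection in `contextlib.closing` closes it deterministically. If SQLAlchemy is involved, sessions that are never closed, and depending on the pool configuration engines that are never disposed with `engine.dispose()`, can keep the file open in the same way. The file name below is hypothetical.

```python
import contextlib
import os
import sqlite3
import tempfile

tmpdir = tempfile.mkdtemp()
db_path = os.path.join(tmpdir, "packs.idx")  # hypothetical file name

# `with sqlite3.connect(...) as conn:` alone would only commit; the file would stay open.
with contextlib.closing(sqlite3.connect(db_path)) as conn:
    with conn:  # transaction scope: commit on success, roll back on error
        conn.execute("CREATE TABLE t (x INTEGER)")
        conn.execute("INSERT INTO t VALUES (1)")

# The connection is closed here, so deleting the file also works on Windows.
os.remove(db_path)
os.rmdir(tmpdir)
```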

Allow CompressMode.YES and CompressMode.NO to be used when repacking.
Previously, there was no way to change the compression of an object
once it was stored in a pack file.
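The per-object decision during a repack could be sketched as below. The enum is a stand-in containing only the two members named above, not the actual `CompressMode` definition, and `recompress` is a hypothetical helper; zlib framing is assumed for compressed objects in normal packs.

```python
import zlib
from enum import Enum
from typing import Tuple


class CompressMode(Enum):
    """Stand-in with only the two members named above (not the real definition)."""
    NO = "no"
    YES = "yes"


def recompress(data: bytes, is_compressed: bool, mode: CompressMode) -> Tuple[bytes, bool]:
    """Return (bytes to write into the new pack, whether those bytes are compressed)."""
    if mode is CompressMode.YES and not is_compressed:
        return zlib.compress(data), True
    if mode is CompressMode.NO and is_compressed:
        return zlib.decompress(data), False
    return data, is_compressed  # already in the requested form: copy as-is


raw = b"payload " * 32
packed, packed_flag = recompress(raw, False, CompressMode.YES)
unpacked, unpacked_flag = recompress(packed, True, CompressMode.NO)
```

Copying objects that are already in the requested form unchanged avoids a needless decompress/compress round trip, which matters because a repack touches every object.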