Merge pull request #2 from duedil-ltd/feature/docs
Documentation
tarnfeld committed Dec 28, 2013
2 parents c577614 + da7a00b commit 8e54b8b
Showing 3 changed files with 45 additions and 2 deletions.
44 changes: 43 additions & 1 deletion README.md
@@ -3,4 +3,46 @@ python-lzo-indexer

![](https://travis-ci.org/duedil-ltd/python-lzo-indexer.png)

-Python library for indexing block offsets within LZO compressed files.
+Python library for indexing block offsets within LZO compressed files. The implementation is largely based on that of the [Hadoop Library](https://github.com/twitter/hadoop-lzo). Index files are used to allow Hadoop to split a single file compressed with LZO into several chunks for parallel processing.

Since LZO is a block-based compression algorithm, we can split the file along block boundaries and decompress each block on its own. The index is a file containing the byte offset of each block in the original LZO file.
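
To make the idea of an offset index concrete, here is a minimal sketch of reading one back. It assumes the index uses the hadoop-lzo layout of consecutive 8-byte big-endian integers, one per block; that layout and the helper name `read_block_offsets` are assumptions for illustration, not part of this library's documented API.

```python
import io
import struct


def read_block_offsets(index_file):
    """Return the block offsets stored in an open, binary-mode index file.

    Assumes consecutive 8-byte big-endian signed integers (a Java long),
    which is the layout hadoop-lzo is understood to write.
    """
    data = index_file.read()
    if len(data) % 8 != 0:
        raise ValueError("index length is not a multiple of 8 bytes")
    count = len(data) // 8
    return list(struct.unpack(">%dq" % count, data))


# Build a tiny in-memory index with three made-up offsets and read it back.
sample_index = io.BytesIO(struct.pack(">3q", 38, 4134, 9021))
offsets = read_block_offsets(sample_index)
print(offsets)
```

Each offset marks a position where a reader could begin decompressing independently, which is what lets a job scheduler cut the file into chunks.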


Example
-------

The Python code below shows how to index an LZO file. The library can also index a string, and it provides a method that returns the individual block offsets should you need to write an index in your own format.

```python
import lzo_indexer

# Read the compressed file and write the index in binary mode.
with open("my-file.lzo", "rb") as f:
    with open("my-file.lzo.index", "wb") as index:
        lzo_indexer.index_lzo_file(f, index)
```


Command-line Utility
--------------------

This library also includes a command-line utility for indexing multiple LZO files using the Python indexer. It is a much faster alternative to the command-line utility built into the hadoop-lzo library because it avoids starting a JVM.

```
$ bin/lzo-indexer --help
usage: lzo-indexer [-h] [--verbose] [--force] lzo_files [lzo_files ...]

positional arguments:
  lzo_files      List of LZO files to index

optional arguments:
  -h, --help     show this help message and exit
  --verbose, -v  Enable verbose logging
  --force, -f    Force re-creation of an index even if it exists
```
```
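
The behaviour described by the help text can be approximated from Python. The sketch below is an illustration only and the real `bin/lzo-indexer` script may work differently: `index_files` is a hypothetical helper, the `.index` suffix follows the example above, and the indexing function is passed in as `index_fn` so the sketch does not depend on any particular library call.

```python
import os


def index_files(paths, index_fn, force=False):
    """Index each LZO file in `paths`; return the index paths written.

    `index_fn(lzo_file, index_file)` is expected to read the first file
    object and write the index to the second. Existing indexes are
    skipped unless `force` is true, mirroring the --force flag.
    """
    written = []
    for path in paths:
        index_path = path + ".index"
        if os.path.exists(index_path) and not force:
            continue  # an index already exists and --force was not given
        with open(path, "rb") as lzo_file, open(index_path, "wb") as index_file:
            index_fn(lzo_file, index_file)
        written.append(index_path)
    return written
```

With the library installed, `index_files(["a.lzo", "b.lzo"], lzo_indexer.index_lzo_file)` would be one way to call it, using the function shown in the example above.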


Contributions
-------------

I welcome any contributions, though I request that any pull requests come with test coverage.
2 changes: 1 addition & 1 deletion bin/lzo-indexer
@@ -17,7 +17,7 @@ def parse_args(argv):
     parser.add_argument("--verbose", "-v", default=False, action="store_true",
                         help="Enable verbose logging")
     parser.add_argument("--force", "-f", default=False, action="store_true",
-                        help="Force re-creation of an index even if it exsts")
+                        help="Force re-creation of an index even if it exists")
     parser.add_argument("lzo_files", type=str, nargs="+",
                         help="List of LZO files to index")

1 change: 1 addition & 0 deletions setup.py
@@ -18,5 +18,6 @@ def read(filename):
     download_url="https://github.com/duedil-ltd/python-lzo-indexer/archive/release-0.0.1.zip",
     license=read("LICENSE"),
     packages=["lzo_indexer"],
+    scripts=["bin/lzo-indexer"],
     test_suite="tests.test_indexer",
 )
