Lossless File Compressor

C Implementation of a lossless file compression suite, using the Lempel-Ziv-Welch (LZW) algorithm, with on-par or better compression performance than the native UNIX compress utility.

Benchmarks (See Testing) suggest that for intermediate code sizes (MAXBITS = 11 to 13), this implementation achieves about 7-12% higher compression ratios at the expense of a significant increase in compression time.

For smaller or much larger code sizes, the compression ratios are roughly on par as compress, but still have a significant compute time penalty. To guard against this time penalty, string table pruning only occurs for MAXBITS > 10, which reduces computation time in half. Based on this testing, the recommended (and default) setting for MAXBITS is 12.

Lempel-Ziv-Welch (LZW) Algorithm

The LZW algorithm is a universal lossless compression algorithm that with very high throughput. It is used by the UNIX utility compress and uncompress as well as in GIFs. The algorithm works by constructing a string table of (code, prefix, and character) triples, which allows a long substring to be represented using a smaller code. Because the string table is conducted on the fly in both compression and decompression, the compressed file doesn't need to contain these mappings, allowing it to achieve high compression ratios.

Usage

From the root of the repository, run make with gcc installed to compile the source code into the executable binaries.

./compress [-m MAXBITS] < input > output\n
./decompress < input > output

MAXBITS is the largest number of bits a code can be represented with when compressing (defaults to 12). The string table can therefore never be larger than $2^{MAXBITS}$ entries. When decompressing, the MAXBITS flag isn't passed as the compressed file stores its value.

When the DBG environment variable is set to 1, compress and decompress will dump human readable versions of the final string tables to DBG.compress and DBG.decompress, respectively.

Example Usage

./compress < "foo.txt" > "compressed_foo.txt"
./decompress < "compressed_foo.txt" > "foo_copy.txt"

Testing

The repository includes the testing script as well as the files used for testing the compressor. Note that the testing script requires the ncompress utility to be installed, which may or not be natively installed. To run the test yourself, execute the following command from the root of the repository:

./tests/test_script.sh [[-m MAXBITS] [-d] [-b]]

The -d flag sets DBG=1 and the -b flag stops further tests after the first error.

The testing files come courtesy of the testing data for Snappy, a compressor/decompressor from Google. The files come from a variety of sources including the Canterbury Corpus.

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
tests		tests
.gitignore		.gitignore
Makefile		Makefile
README.md		README.md
binaryIO.c		binaryIO.c
binaryIO.h		binaryIO.h
compress.c		compress.c
compress.h		compress.h
decompress.c		decompress.c
decompress.h		decompress.h
main.c		main.c
stack.c		stack.c
stack.h		stack.h
string_table.c		string_table.c
string_table.h		string_table.h

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Lossless File Compressor

Lempel-Ziv-Welch (LZW) Algorithm

Usage

Example Usage

Testing

About

Releases

Packages

Languages

vmittal27/file-compressor

Folders and files

Latest commit

History

Repository files navigation

Lossless File Compressor

Lempel-Ziv-Welch (LZW) Algorithm

Usage

Example Usage

Testing

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages