# Boundary package-merge algorithm: Common Lisp implementation

The boundary package-merge algorithm builds length-limited Huffman codes [1]. For an alphabet of n symbols and a maximum codeword length of L, it has O(nL) time complexity and O(L^2) space complexity.

This improves on the package-merge algorithm on which it is based [2], which runs in O(nL) time and O(n) space: the time bound is the same, but the working space drops from O(n) to O(L^2), a significant gain since L is typically much smaller than n.

I made some significant improvements to the algorithm that cut the multiplicative constants in both space and time. I am still looking for a better algorithm in the literature.

## Usage

Let's say you want to encode n = 6 symbols with frequencies (1 1 5 7 10 14) (the frequencies must be sorted in increasing order), with a maximum codeword length of L = 4.

```lisp
(defparameter *probs* #(1 1 5 7 10 14))
(defparameter *L* 4)
(defparameter *a* (bpm:encode-limited *probs* *L*))
;;=> #(2 3 6 6)
(defparameter *lengths* (bpm:a-to-l *a*))
;;=> #(4 4 3 2 2 2)
```

*a* is a compressed representation of the length vector (see [1]). *lengths* is the vector of codeword lengths for the symbols: the first symbol is encoded on 4 bits, the second on 4, the third on 3, and so on. A Huffman code can be built trivially from *lengths*.
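For completeness, here is a minimal sketch of how one could turn such a length vector into canonical Huffman codewords. It is not part of the bpm package, and the function name `canonical-codes` is made up for this example; it assumes *lengths* is a vector as shown above. Symbols are processed from shortest to longest length, each codeword being the previous one plus one, left-shifted whenever the length increases.

```lisp
(defun canonical-codes (lengths)
  "Return a vector of (code . length) pairs, one per symbol, built
canonically from the vector of codeword LENGTHS."
  (let* ((n (length lengths))
         ;; Symbol indices ordered by increasing code length.
         (order (stable-sort (loop for i below n collect i)
                             #'< :key (lambda (i) (aref lengths i))))
         (codes (make-array n))
         (code 0)
         (prev-len (aref lengths (first order))))
    (dolist (i order codes)
      ;; Widen the current code when the length increases, assign, increment.
      (setf code (ash code (- (aref lengths i) prev-len))
            prev-len (aref lengths i)
            (aref codes i) (cons code prev-len))
      (incf code))))

;; (canonical-codes #(4 4 3 2 2 2))
;; ;=> #((14 . 4) (15 . 4) (6 . 3) (0 . 2) (1 . 2) (2 . 2))
```

For the example above this yields the codewords 1110, 1111, 110, 00, 01 and 10, which form a valid prefix code.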

## Recommended reading

The full proof of why the boundary package-merge algorithm works can be understood by reading just [2] and [1], preferably in that order. This assumes the reader is already familiar with the Huffman coding problem, for which many resources exist online (sadly often accompanied by the naive, inefficient implementation from the original paper!).

Historically, algorithms for length-limited Huffman coding existed, but with much worse complexities, some of them iterative in nature. You can find some of them among the references of [1, 2].

Basically, the problem of Huffman coding with a codeword length limit can be reduced to the binary coin collector's problem. The coin collector's problem was itself introduced in [2] in order to present the package-merge algorithm.
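To make the reduction concrete, here is a minimal, illustrative sketch of the plain (non-boundary) package-merge algorithm. It is not the implementation in this repository and does not use the bpm package; `package-merge-lengths` is a name made up for this sketch. Items carry the list of symbol indices they contain; after L - 1 package-and-merge passes, the 2n - 2 cheapest items are kept and each occurrence of a symbol adds one bit to its code length.

```lisp
(defun package-merge-lengths (weights limit)
  "Plain package-merge: return a vector of code lengths for WEIGHTS
(a vector sorted in increasing order), none exceeding LIMIT bits."
  (let* ((n (length weights))
         ;; A leaf item is (weight . list-of-symbol-indices).
         (leaves (loop for w across weights for i from 0
                       collect (cons w (list i))))
         (current leaves))
    ;; LIMIT - 1 package-and-merge passes.
    (loop repeat (1- limit) do
      (let ((packages
              ;; Pair adjacent items; a trailing unpaired item is dropped.
              (loop for (a b) on current by #'cddr
                    while b
                    collect (cons (+ (car a) (car b))
                                  (append (cdr a) (cdr b))))))
        (setf current (sort (append leaves packages) #'< :key #'car))))
    ;; Keep the 2n - 2 cheapest items; each occurrence of a symbol index
    ;; in a kept item adds one bit to that symbol's code length.
    (let ((lengths (make-array n :initial-element 0)))
      (loop repeat (- (* 2 n) 2)
            for item in current
            do (dolist (i (cdr item)) (incf (aref lengths i))))
      lengths)))

;; Example from the Usage section:
;; (package-merge-lengths #(1 1 5 7 10 14) 4) ;=> #(4 4 3 2 2 2)
```

Note that this naive version keeps explicit index lists for every item, so it uses far more memory than the boundary variant; it only serves to illustrate the packaging and merging steps.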

The package-merge algorithm was then improved in [1] by very clever analysis and new ways of looking at the problem.

## Algorithm presentation

Understanding the full mathematical proof is not required to understand the boundary package-merge algorithm.

The example from [1], with more colors and details, is:

Boundary package-merge diagram

Please read [1] for the full explanation. It is not an easy read, but I don't think I can do a better job of explaining the algorithm.

## Improvements to the algorithm

The boundary package-merge algorithm can be improved in minor ways to increase performance. To quantify the improvements, I took as a benchmark the case of encoding 100k symbols with random weights. None of these changes alters the asymptotic complexity of the algorithm.

  1. Compute chain pairs in the sets preceding j only when needed. A literal implementation of the algorithm in [1] asks for nodes that may never be considered for insertion as packages. For our test case, this change nearly halves the execution time. Implementing it seems to require an auxiliary boolean array of length L - 1 to keep track of whether new pairs are needed; this cost is negligible.
  2. Keeping the weights around is actually unnecessary. We only need one number per set j: the last weight of each set, replaced by the sum s of the weights of the two last elements that are about to become a package.
  3. Only one chain in two is actually needed: the second chain of each package. This requires tracking the last count and last tail of each set j. A possible representation of the resulting per-set state is sketched after this list.
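As an illustration only, the per-set bookkeeping after these changes could look like the sketch below. The structure and slot names are hypothetical and are not the identifiers used in this code base.

```lisp
;; Hypothetical sketch of the per-set state after improvements 1-3.
(defstruct chain
  (count 0 :type fixnum) ; number of leaves counted by this chain
  (tail  nil))           ; second chain of the corresponding package in set j-1

(defstruct set-state
  (weight     0)    ; improvement 2: sum s of the two weights about to be packaged
  (last-count 0)    ; improvement 3: count of the last chain created in this set
  (last-tail  nil)  ; improvement 3: tail of the last chain created in this set
  (needs-pair nil)) ; improvement 1: must the preceding set produce a new pair?
```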

## Memory/Time impact

We plot the memory usage of the Lisp image across the different improvements. Except for the last series, this represents the total cumulative memory usage, since by default SBCL's garbage collector only runs when memory is actually full.

memory-time plots

  - asking pairs: performance after improvement #1 above.
  - half chains: same, after improvements #2 and #3.
  - garbage collected: a manual garbage collection (chains-gc) is implemented and called at every package linked to the last set of chains.
  - closures: an adequate lexical environment is created for the recursion to run in; execution time is almost cut in half.
  - aggressive gc: (sb-ext:gc :full t) is called aggressively throughout the execution to check that the memory is actually freed in the Lisp image.

## Dependencies

None for the library itself. We use https://github.com/thomashoullier/mem-monitor only for plotting the memory usage in the benchmarks above.

## References

  1. Katajainen J., Moffat A., Turpin A. (1995) A fast and space-economical algorithm for length-limited coding. In: Staples J., Eades P., Katoh N., Moffat A. (eds) Algorithms and Computations. ISAAC 1995. Lecture Notes in Computer Science, vol 1004. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0015404
  2. Larmore, Lawrence L., and Daniel S. Hirschberg. "A fast algorithm for optimal length-limited Huffman codes." Journal of the ACM (JACM) 37.3 (1990): 464-473. https://doi.org/10.1145/79147.79150