Extend Craystack API allowing adaptive codecs, add Multiset codec, and extra example. #13

dsevero · 2021-08-11T01:42:14Z

Adaptive codecs

As mentioned in the title, this would allow adaptive codecs to be implemented in Craystack.

Signatures of push and pop are extended to allow for a *context variable.

    push: (ans_state, symbol, *context) -> (ans_state, *context)
    pop: (ans_state, *context) -> (ans_state, symbol, *context)

Note that context is passed via unpacking (i.e. *context), so it is effectively optional. Previous codecs are therefore still compliant with the new signature.

However, push now always returns a tuple, at minimum (ans_state,), so call sites have been adapted accordingly. This means that instead of

    message = codec.push(message, symbol)

we now have

    message, = codec.push(message, symbol)
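
To make the extended signatures concrete, here is a minimal sketch of a context-free codec next to an adaptive one. It is an illustration only: the "ANS state" is a plain tuple used as a stack rather than a real Craystack message, and the function names and the Counter-based context are assumptions made for this example, not part of the PR.

```python
from collections import Counter

def plain_push(message, symbol):
    # A codec without context still returns a tuple: (ans_state,).
    return (message + (symbol,),)

def plain_pop(message):
    # pop: (ans_state,) -> (ans_state, symbol)
    return message[:-1], message[-1]

def adaptive_push(message, symbol, counts):
    # push: (ans_state, symbol, *context) -> (ans_state, *context)
    # The context (symbol counts) is updated in a way that pop can undo.
    return message + (symbol,), counts + Counter([symbol])

def adaptive_pop(message, counts):
    # pop: (ans_state, *context) -> (ans_state, symbol, *context)
    symbol = message[-1]
    return message[:-1], symbol, counts - Counter([symbol])

# Context-free call site: note the trailing comma when unpacking.
message, = plain_push((), 'a')

# Adaptive call site: the context is threaded through push and pop.
state, counts = adaptive_push((), 'a', Counter())
state, counts = adaptive_push(state, 'b', counts)
state, last_symbol, counts = adaptive_pop(state, counts)
```

In a real adaptive codec the context would determine the distribution used to encode each symbol; here it is only carried along to show how the call sites change.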

Multiset codec

The multiset codec from https://github.com/facebookresearch/multiset-compression was ported to craystack/codecs/multiset.py.

Extra example

An extra example, showing how to use the multiset codec for BMNIST with BB-ANS, was added to craystack/examples.

dsevero · 2021-08-11T01:50:38Z

craystack/codecs_test.py

@@ -166,7 +166,7 @@ def test_categorical_new():
    rng = np.random.RandomState(2)
    precision = 4
    shape = (20, 3, 5)
-    weights = rng.random((np.prod(shape), 4))
+    weights = rng.random((np.prod(shape), 4)) + 1


The lack of the + 1 was sometimes causing the test to fail, as some weights would be quantized to 0 due to low precision.
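
The failure mode can be reproduced with a small sketch. The quantizer below is a deliberately naive stand-in (normalise, scale by 2**precision, truncate) and is an assumption for illustration; it is not Craystack's actual quantization routine.

```python
def naive_quantize(weights, precision):
    # Scale normalised weights to integers that sum to at most 2**precision.
    # Hypothetical helper, used only to illustrate the rounding problem.
    total = sum(weights)
    return [int(w / total * (1 << precision)) for w in weights]

precision = 4
small = [0.01, 0.9, 0.8, 0.7]          # weights drawn from [0, 1)
shifted = [w + 1 for w in small]       # the same weights after "+ 1"

small_q = naive_quantize(small, precision)
shifted_q = naive_quantize(shifted, precision)
```

With only 16 buckets, the weight 0.01 receives no mass at all, so a symbol with that weight cannot be encoded. After the shift every weight lies in [1, 2), so no weight is much smaller than the others and each keeps a non-zero bucket.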

dsevero · 2021-08-11T01:51:32Z

examples/binary_mnist_vae.py

@@ -85,4 +81,3 @@ def vae_view(head):
 print('All decoded in {:.2f}s.'.format(decode_t))

 np.testing.assert_equal(images, images_)
-np.testing.assert_equal(message, init_message)


I removed the message equality check, as it is incompatible with empty pop.

dsevero · 2021-08-11T01:52:09Z

examples/binary_mnist_vae.py

@@ -16,7 +17,7 @@

 num_images = 10000
 num_pixels = num_images * 784
-batch_size = 10


Changed so that the results are comparable with the multiset codec example, although this is not strictly necessary.

j-towns · 2021-08-11T10:13:03Z

Very nice work. I'm actually wondering if we shouldn't go a bit further with the API change, and allow push and pop to be any inverse pair, i.e. instead of

    push: (ans_state, symbol, *context) |-> (ans_state, *context)
    pop: (ans_state, *context) |-> (ans_state, symbol, *context)

we go for

    push: before |-> after
    pop: after |-> before

There's a trade-off here: instead of having to unpack results, as in

    message, = codec.push(message, symbol)

the 'inverses' approach would require packing inputs, as in

    message = codec.push((message, symbol))

because each push/pop must have exactly one argument.
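
A minimal sketch of what the 'inverses' convention could look like, under loud assumptions: the Codec namedtuple, the tuple-as-stack state, and the invert helper are all hypothetical names introduced for this illustration and are not taken from the PR or the linked gist.

```python
from collections import namedtuple

# Hypothetical stand-in for a codec: a pair of mutually inverse functions.
Codec = namedtuple('Codec', ['push', 'pop'])

def toy_symbol_codec():
    # The state is a tuple used as a stack; a real codec would use an ANS state.
    def push(before):
        # Single packed argument: before = (message, symbol).
        message, symbol = before
        return message + (symbol,)

    def pop(after):
        # Returns exactly what push consumed, so pop(push(x)) == x.
        return after[:-1], after[-1]
    return Codec(push, pop)

def invert(codec):
    # Because push and pop are plain inverses, swapping them gives a codec too.
    return Codec(codec.pop, codec.push)

codec = toy_symbol_codec()
before = (('a',), 'b')
after = codec.push(before)
round_trip = codec.pop(after)
```

Since each function takes one value and returns one value, the round-trip property is simply `codec.pop(codec.push(x)) == x`, with no unpacking at the call site.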

An advantage of the 'inverses' approach is that the codec combinators (things like serial and parallel) might be simpler, and it might reveal some more general forms. For example, what we now call repeat, might be replaced by an 'invertible reduce':

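    from collections import namedtuple
    from functools import reduce

    # Stand-in for the codec type: a pair of mutually inverse functions.
    Codec = namedtuple('Codec', ['push', 'pop'])

    def ireduce(codec, length):
        # Assumes that
        #   codec.push: (a, b) |-> a
        #   codec.pop: a |-> (a, b)
        # and returns a codec for lists of `length` elements of type b:
        #   push: (a, bs) |-> a
        #   pop: a |-> (a, bs)
        def push(a, bs):
            # Push in reverse order so that pop, which is last-in-first-out,
            # recovers the elements in their original order.
            return reduce(codec.push, reversed(bs), a)

        def pop(a):
            bs = []
            for _ in range(length):
                a, b = codec.pop(a)
                bs.append(b)
            # Return the updated state together with the decoded list.
            return a, bs
        return Codec(push, pop)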

This is a change we could make in a separate PR or series of PRs.
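
The invertible-reduce idea can be checked end to end with a toy codec. The sketch below writes it in the strict single-argument form, so that pop is literally the inverse of push. Everything here is an assumption for illustration: the Codec namedtuple, the tuple-as-stack state, and the choice to push elements in reverse order so that the last-in-first-out pop returns them in their original order.

```python
from collections import namedtuple
from functools import reduce

# Hypothetical stand-in for a codec: a pair of mutually inverse functions.
Codec = namedtuple('Codec', ['push', 'pop'])

def toy_codec():
    # State is a tuple used as a stack, standing in for an ANS state.
    def push(before):
        state, symbol = before
        return state + (symbol,)

    def pop(after):
        return after[:-1], after[-1]
    return Codec(push, pop)

def ireduce(codec, length):
    def push(before):
        a, bs = before
        assert len(bs) == length
        # Fold the elements into the state, last element first.
        return reduce(lambda acc, b: codec.push((acc, b)), reversed(bs), a)

    def pop(a):
        bs = []
        for _ in range(length):
            a, b = codec.pop(a)
            bs.append(b)
        return a, bs
    return Codec(push, pop)

list_codec = ireduce(toy_codec(), 3)
before = ((), ['x', 'y', 'z'])
after = list_codec.push(before)
restored = list_codec.pop(after)
```

Here `restored` equals `before`, which is the property any combinator in the 'inverses' style would need to preserve.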

j-towns · 2021-08-11T11:14:22Z

Here's another sketch: https://gist.github.com/j-towns/d8deee8d2bbfa4bb5ab93bc2573ee81d

dsevero · 2021-08-11T13:42:01Z

Maybe that could be a lower layer, and the codecs could be implemented on top? I'm worried that it might be a bit much for the regular user that just wants to do compression.

dsevero and others added 6 commits August 10, 2021 13:27

Fix CategoricalNew test (fc91b87)

Create tests.yml (f7ee82c)

Update README.md (f6c9606)

Add initial support for context-adaptible codecs (1f2d592)

Add multiset-codec (237a64c)

Add binary mnist as multiset experiment (90fae31)

dsevero changed the title ~~Extend Craystack API to allow adaptive codecs~~ Extend Craystack API allowing adaptive codecs, add Multiset codec, and extra example. Aug 11, 2021

j-towns reviewed Aug 11, 2021

README.md (outdated)

j-towns reviewed Aug 11, 2021

craystack/codecs/multiset.py

dsevero and others added 2 commits August 11, 2021 10:29

Update README.md (a2d10e1)

Update license and remove initial bits codec from examples (e7bbb9b)

j-towns merged commit 98f5803 into j-towns:master Aug 11, 2021
