Extend Craystack API allowing adaptive codecs, add Multiset codec, and extra example. #13

Merged: 8 commits merged on Aug 11, 2021.

@dsevero (Contributor) commented Aug 11, 2021

Adaptive codecs

As the title says, this PR makes it possible to implement adaptive codecs in Craystack.

The signatures of push and pop are extended to accept a variable-length *context argument:

    push: (ans_state, symbol, *context) -> (ans_state, *context)
    pop: (ans_state, *context) -> (ans_state, symbol, *context)

Note that, since context is passed via unpacking (i.e. *context), it is effectively optional. Existing codecs therefore still comply with the new signature.
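A minimal sketch of why existing codecs stay compliant. It assumes a toy model that is not Craystack's implementation: the message is a single Python int (an infinite-precision stand-in for the ANS state), and `Codec` and `uniform_codec` are hypothetical names. The codec ignores any context and passes it straight through, so it can be called with or without one.

```python
from collections import namedtuple

# Hypothetical push/pop container, used only for this illustration.
Codec = namedtuple('Codec', ['push', 'pop'])


def uniform_codec(n):
    """Hypothetical non-adaptive codec for symbols in range(n).

    The message is a Python int used as an infinite-precision stack:
    push multiplies by n and adds the symbol, pop takes it back off.
    Any *context is returned unchanged.
    """
    def push(message, symbol, *context):
        assert 0 <= symbol < n
        return (message * n + symbol, *context)

    def pop(message, *context):
        message, symbol = divmod(message, n)
        return (message, symbol, *context)

    return Codec(push, pop)


codec = uniform_codec(10)
message, = codec.push(0, 7)            # no context: push returns a 1-tuple
message, = codec.push(message, 3)
message, symbol = codec.pop(message)   # no context: pop returns (message, symbol)
```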

However, push now always returns a tuple with at least one element, (ans_state,), so call sites have been updated accordingly. Instead of

    message = codec.push(message, symbol)

we now have

    message, = codec.push(message, symbol)
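To show what the extra context enables, here is a sketch of an adaptive codec under stated assumptions: the message is modelled as a Python int, each step is an exact (infinite-precision) rANS step rather than Craystack's finite-precision one, and the context is a list of add-one symbol counts. `adaptive_push`, `adaptive_pop`, `encode` and `decode` are hypothetical helpers, not Craystack functions. pop codes with the current counts and then increments the decoded symbol's count; push decrements first, so the two remain exact inverses.

```python
def adaptive_push(message, symbol, counts):
    # push: (message, symbol, counts) -> (message, counts)
    # Undo the count update that adaptive_pop makes after decoding `symbol`.
    counts = list(counts)
    counts[symbol] -= 1
    freq = counts[symbol]            # >= 1 as long as counts started at 1 (add-one)
    start = sum(counts[:symbol])     # cumulative frequency of earlier symbols
    total = sum(counts)
    # Exact rANS push.
    message = (message // freq) * total + start + message % freq
    return message, counts


def adaptive_pop(message, counts):
    # pop: (message, counts) -> (message, symbol, counts)
    total = sum(counts)
    slot = message % total
    # Find the symbol whose interval [start, start + freq) contains slot.
    start = 0
    for symbol, freq in enumerate(counts):
        if slot < start + freq:
            break
        start += freq
    # Exact rANS pop: inverse of the push formula above.
    message = freq * (message // total) + slot - start
    counts = list(counts)
    counts[symbol] += 1              # adapt: the decoded symbol becomes likelier
    return message, symbol, counts


def encode(symbols, init_counts):
    # The encoder starts from the counts the decoder will end with ...
    counts = list(init_counts)
    for s in symbols:
        counts[s] += 1
    message = 0
    # ... and pushes in reverse, because ANS is last-in-first-out.
    for s in reversed(symbols):
        message, counts = adaptive_push(message, s, counts)
    assert counts == list(init_counts)
    return message


def decode(message, length, init_counts):
    counts = list(init_counts)
    symbols = []
    for _ in range(length):
        message, s, counts = adaptive_pop(message, counts)
        symbols.append(s)
    return symbols


symbols = [0, 0, 1, 0, 2, 0, 0, 1]
message = encode(symbols, [1, 1, 1])
decoded = decode(message, len(symbols), [1, 1, 1])
```

After the last push the encoder arrives back at the initial counts, which is exactly the context the decoder starts from; the context is what keeps the two sides synchronized.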

Multiset codec

The multiset codec from https://github.com/facebookresearch/multiset-compression was ported to craystack/codecs/multiset.py.

Extra example

An extra example, showing how to use the multiset codec for BMNIST with BB-ANS, was added to craystack/examples.

@dsevero changed the title from "Extend Craystack API to allow adaptive codecs" to "Extend Craystack API allowing adaptive codecs, add Multiset codec, and extra example." on Aug 11, 2021.
    @@ -166,7 +166,7 @@ def test_categorical_new():
     rng = np.random.RandomState(2)
     precision = 4
     shape = (20, 3, 5)
    -weights = rng.random((np.prod(shape), 4))
    +weights = rng.random((np.prod(shape), 4)) + 1
@dsevero (Contributor Author) commented:

Without the + 1, the test sometimes failed: with such a low precision (precision = 4), some weights were quantized to 0.
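A minimal sketch of the failure mode, assuming a naive quantizer that rounds each normalized probability to the nearest multiple of 2**-precision. `naive_quantize` is a hypothetical stand-in, not Craystack's actual quantization routine. With precision = 4 there are only 16 slots, so any probability below 1/32 rounds to zero; shifting the weights into [1, 2) makes every probability greater than 1/7, i.e. more than 2.28 slots before rounding.

```python
import numpy as np


def naive_quantize(weights, precision):
    # Hypothetical quantizer (illustration only): normalize each row and
    # round its probabilities to integer counts out of 2**precision.
    probs = weights / weights.sum(axis=-1, keepdims=True)
    return np.round(probs * 2 ** precision).astype(int)


rng = np.random.RandomState(2)
precision = 4
shape = (20, 3, 5)

weights_old = rng.random((np.prod(shape), 4))   # values in [0, 1): some are tiny
weights_new = weights_old + 1                   # values in [1, 2)

zeros_old = int((naive_quantize(weights_old, precision) == 0).sum())
zeros_new = int((naive_quantize(weights_new, precision) == 0).sum())
print(zeros_old, zeros_new)
```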

    @@ -85,4 +81,3 @@ def vae_view(head):
     print('All decoded in {:.2f}s.'.format(decode_t))
     
     np.testing.assert_equal(images, images_)
    -np.testing.assert_equal(message, init_message)
@dsevero (Contributor Author) commented:

I removed the final message equality check (message vs. init_message), as it is incompatible with empty pop.

    @@ -16,7 +17,7 @@

    num_images = 10000
    num_pixels = num_images * 784
    batch_size = 10
@dsevero (Contributor Author) commented:

Changed so that the results are comparable with the multiset codec example, though the change is not strictly necessary.

@j-towns (Owner) commented Aug 11, 2021

Very nice work. I'm wondering whether we should go a bit further with the API change and allow push and pop to be any pair of mutually inverse functions, i.e. instead of

    push: (ans_state, symbol, *context) |-> (ans_state, *context)
    pop: (ans_state, *context) |-> (ans_state, symbol, *context)

we go for

    push: before |-> after
    pop: after |-> before

There's a trade-off here: instead of having to unpack results, as in

    message, = codec.push(message, symbol)

the 'inverses' approach would require packing inputs, as in

    message = codec.push((message, symbol))

because each push/pop must take exactly one argument.
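A minimal sketch of the 'inverses' convention, assuming a toy codec over a Python int message; `Codec` and `uniform` are hypothetical names, not part of Craystack. Each of push and pop takes a single argument, so the message and symbol are packed into one tuple on the way in and come back as one tuple on the way out.

```python
from collections import namedtuple

# Hypothetical push/pop container, used only for this illustration.
Codec = namedtuple('Codec', ['push', 'pop'])


def uniform(n):
    """Hypothetical codec in the 'inverses' style for symbols in range(n).

    push: (message, symbol) |-> message
    pop:  message |-> (message, symbol)
    """
    def push(message_symbol):
        message, symbol = message_symbol
        return message * n + symbol

    def pop(message):
        return divmod(message, n)    # (message, symbol)

    return Codec(push, pop)


codec = uniform(10)
message = codec.push((0, 4))         # inputs packed into a single tuple
message = codec.push((message, 2))
before = codec.pop(message)          # pop(push(x)) == x
```

Since push and pop are plain inverse functions, `codec.push(codec.pop(m)) == m` holds for any message `m`, which is the property combinators can rely on.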

An advantage of the 'inverses' approach is that the codec combinators (things like serial and parallel) might be simpler, and it might reveal some more general forms. For example, what we now call repeat might be replaced by an 'invertible reduce':

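    from collections import namedtuple
    from functools import reduce

    # Minimal push/pop container (a stand-in; Craystack has its own Codec type).
    Codec = namedtuple('Codec', ['push', 'pop'])

    def ireduce(codec, length):
        # Assumes that
        #   codec.push: (a, b) |-> a
        #   codec.pop: a |-> (a, b)
        # and returns a codec for lists containing type b elements:
        #   push: (a, bs) |-> a
        #   pop: a |-> (a, bs)
        def push(a_bs):
            a, bs = a_bs
            # Push in reverse so that pop (last-in-first-out) returns bs in order.
            return reduce(lambda a, b: codec.push((a, b)), reversed(bs), a)

        def pop(a):
            bs = []
            for _ in range(length):
                a, b = codec.pop(a)
                bs.append(b)
            return a, bs
        return Codec(push, pop)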
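To make the point about combinators concrete, here is a sketch of a serial-style combinator in the 'inverses' convention. It is hypothetical (not Craystack's existing `serial`) and again models the message as a Python int with a toy `uniform` codec. Because every codec is a one-argument inverse pair, the combinator is just a sequence of pushes, undone by popping in the reverse order.

```python
from collections import namedtuple

# Hypothetical push/pop container, used only for this illustration.
Codec = namedtuple('Codec', ['push', 'pop'])


def uniform(n):
    # Hypothetical codec: push: (m, s) |-> m, pop: m |-> (m, s), s in range(n).
    return Codec(lambda ms: ms[0] * n + ms[1], lambda m: divmod(m, n))


def serial(codec_b, codec_c):
    """Hypothetical combinator for pairs (b, c) in the 'inverses' style.

    push: (a, (b, c)) |-> a
    pop:  a |-> (a, (b, c))
    """
    def push(a_bc):
        a, (b, c) = a_bc
        a = codec_c.push((a, c))     # push c first ...
        return codec_b.push((a, b))  # ... so that b ends up on top

    def pop(a):
        a, b = codec_b.pop(a)        # pop in the reverse order of push
        a, c = codec_c.pop(a)
        return a, (b, c)

    return Codec(push, pop)


pair = serial(uniform(3), uniform(5))
message = pair.push((0, (2, 4)))
restored = pair.pop(message)
```

The result is itself a one-argument inverse pair, so it can be fed back into `serial` (or an invertible reduce) without any special handling of extra arguments.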

This is a change we could make in a separate PR or series of PRs.

@j-towns (Owner) commented Aug 11, 2021

@dsevero (Contributor Author) commented Aug 11, 2021

Maybe that could be a lower layer, with the codecs implemented on top of it? I'm worried it might be a bit much for a regular user who just wants to do compression.
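One way such layering could look, as a hypothetical sketch: codecs in the lower layer are one-argument inverse pairs, and a thin adapter (`user_facing`, an illustrative name) exposes the push(message, symbol, *context) and pop(message, *context) signatures introduced in this PR. The message is again a toy Python int, and none of these names are existing Craystack functions.

```python
from collections import namedtuple

# Hypothetical push/pop container, used only for this illustration.
Codec = namedtuple('Codec', ['push', 'pop'])


def uniform_inverse(n):
    # Lower layer (hypothetical): one-argument inverse pair.
    #   push: (message, symbol) |-> message
    #   pop:  message |-> (message, symbol)
    return Codec(lambda ms: ms[0] * n + ms[1], lambda m: divmod(m, n))


def user_facing(inverse_codec):
    """Hypothetical adapter exposing the user-level signatures:

        push: (message, symbol, *context) -> (message, *context)
        pop:  (message, *context) -> (message, symbol, *context)

    The context is passed through untouched because this lower-layer
    codec is not adaptive.
    """
    def push(message, symbol, *context):
        return (inverse_codec.push((message, symbol)), *context)

    def pop(message, *context):
        message, symbol = inverse_codec.pop(message)
        return (message, symbol, *context)

    return Codec(push, pop)


codec = user_facing(uniform_inverse(10))
message, = codec.push(0, 7)
message, symbol = codec.pop(message)
```

Users would only see the familiar push/pop calls, while combinators could be written once against the simpler inverse-pair layer.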

@j-towns merged commit 98f5803 into j-towns:master on Aug 11, 2021.