-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extend Craystack API allowing adaptive codecs, add Multiset codec, and extra example. #13
Conversation
@@ -166,7 +166,7 @@ def test_categorical_new(): | |||
rng = np.random.RandomState(2) | |||
precision = 4 | |||
shape = (20, 3, 5) | |||
weights = rng.random((np.prod(shape), 4)) | |||
weights = rng.random((np.prod(shape), 4)) + 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The lack of the + 1
was sometimes causing the test to fail, as some weights would be quantized to 0
due to low precision.
@@ -85,4 +81,3 @@ def vae_view(head): | |||
print('All decoded in {:.2f}s.'.format(decode_t)) | |||
|
|||
np.testing.assert_equal(images, images_) | |||
np.testing.assert_equal(message, init_message) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the message equality check, as it is incompatible with empty pop.
@@ -16,7 +17,7 @@ | |||
|
|||
num_images = 10000 | |||
num_pixels = num_images * 784 | |||
batch_size = 10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to be comparable with the multiset codec. However, not really necessary.
Very nice work. I'm actually wondering if we shouldn't go a bit further with the API change, and allow push and pop to be any inverse pair, i.e. instead of
we go for
There's a trade-off here — instead of having to unpack results, as in message, = codec.push(message, symbol) the 'inverses' approach would require packing inputs, as in message = codec.push((message, symbol)) because each push/pop must have exactly one argument. An advantage of the 'inverses' approach is that the codec combinators (things like from functools import reduce
def ireduce(codec, length):
# Assumes that
# codec.push: (a, b) |-> a
# codec.pop: a |-> (a, b)
# and returns a codec for lists containing type b elements:
# push: (a, bs) |-> a
# pop: a |-> (a, bs)
def push(a, bs):
return reduce(codec.push, bs, a)
def pop(a):
bs = []
for _ in range(length):
a, b = codec.pop(a)
bs.append(b)
del b
return bs
return codec(push, pop) This is a change we could make in a separate pr/series of prs. |
Here's another sketch: https://gist.github.com/j-towns/d8deee8d2bbfa4bb5ab93bc2573ee81d |
Maybe that could be a lower layer, and the codecs could be implemented on top? I'm worried that it might be a bit much for the regular user that just wants to do compression. |
Adaptive codecs
As mentioned in the title, this would allow adaptive codecs to be implemented in Craystack.
Signatures of push and pop are extended to allow for a
*context
variable.Note that, since context is passed via unpacking (i.e. *context), then it is essentially optional. Therefore, previous codecs are still compliant with this new signature.
However, the return of
push
will be at least(ans_state,)
, so function calls have been adapted accordingly. This means that instead ofwe now have
Multiset codec
The multiset codec from https://github.com/facebookresearch/multiset-compression was ported to
craystack/codecs/multiset.py
Extra example
An extra example, showing how to use the multiset codec for BMNIST with BB-ANS, was added to
craystack/examples
.