
UUID support #1

Open
whilo opened this issue Apr 6, 2014 · 9 comments

@whilo

whilo commented Apr 6, 2014

Nice work! I have implemented platform-neutral cryptographic hashing for my distributed repository system and have just tried your compression. It is nice that Unicode works out of the box, but UUIDs are not supported. While UUIDs compress very badly, a lot of my message data contains them, and it would be nice if I could just compress the whole message. Maybe you can add it, or I might do it if I find the time...

Btw, have you compared shannon's effectiveness against gzip compression of pr-str'ed EDN?

@hadronzoo
Owner

Hello @ghubber, thanks! I would definitely like to support UUID types. I'm in the process of rewriting the library to be adaptive so the compressor learns your data structures as it sees them—therefore greatly improving compression rates. Until it is adaptive, you'll probably find that its compression is good for very small messages, but for larger messages gzip will perform much better. Alternatively, if you know the distribution of your data beforehand, you can use the library to construct a compressor that is within two bits of optimal given that distribution.
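The "within two bits of optimal" claim is the classic arithmetic-coding guarantee: given a known symbol distribution, the coder emits at most the message's information content plus roughly two bits. A small Python sketch of the bound (illustrative only, with a made-up distribution; this is not the library's API):

```python
import math

def entropy_bits(dist):
    """Shannon entropy in bits per symbol of a known distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Hypothetical symbol distribution known ahead of time.
dist = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
message = "aabacabda"

# Information content of the message under this distribution.
info = sum(-math.log2(dist[s]) for s in message)

# An arithmetic coder built for this distribution emits at most about
# info + 2 bits for the whole message: within two bits of optimal.
bound = math.ceil(info) + 1
print(entropy_bits(dist), info, bound)  # 1.75 bits/symbol, 15.0 bits, 16 bits
```

The per-message (rather than per-symbol) overhead is why such a coder beats gzip on very small messages: gzip's headers alone cost more than the two-bit slack.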

@hadronzoo
Owner

By the way, we might be working on similar projects. I originally created this library to fit ClojureScript's immutable data structures into a persistent KV store like Local Storage for distributed synchronization.

@whilo
Author

whilo commented Apr 13, 2014

Interesting, you are right: besides compressing communication, using it in my storage layer would make sense as well. Local Storage does not fit my requirements very well (the syncing repository keeps all history, and 5 MiB will be too little for many cases, though not all), so maybe we could cooperate on an async storage protocol? For the most part I try to leverage immutability to break out values and share them wherever I can, and I use clojure.data/diff as deltas on the single metadata place per repo. Still, compression will help keep the communication overhead of synchronisation down by at least some constant factor, so it will be a valuable optimisation once the layer works reliably.

Automatic syncing now runs on top of konserve (IndexedDB is not yet deployed, but tested at the REPL in Chromium and Firefox; WebSQL would be nice for Android and Apple browsers next) for a Reddit-like bookmarking app. It still lacks proper authentication and automatic load balancing on (merge) divergence to be safe for real-world usage, but it seems to work fairly well. User: "eve", password: "lisp"; the console shows syncing debug messages. It was already easier to prototype with than a custom server-side connection; the server is a generic peer at the moment and learns about new repos and values online through auto-subscription.

Any feedback or help would be appreciated, since this is a real stretch of my capabilities. One year and a few months in Clojure, and it still amazes me what I am able to build with limited experience XD.

@whilo
Author

whilo commented Apr 17, 2014

I found out that proxies blocked the WebSockets; it now runs at: https://shelf.polyc0l0r.net/

@whilo
Author

whilo commented Apr 27, 2014

Adaptive coding would be nice. As a test, I compressed a JPEG as a byte-stream vector with the bytes coder, and it came out a bit larger than the original. The JPEG is already compressed, so the coding doesn't help here (it just yields a binary format), but having this properly encoded would bring my messaging overhead due to EDN down in all cases (even these edge ones).
Since WebSockets can transfer binary data, this would fit in nicely. The storage layer can also compress and decompress on the fly if the overhead is worth it.
That way I could just carelessly commit all data in EDN with the repository while still having close-to-optimal bandwidth overhead, despite using EDN string serialisation.
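The slight expansion on the JPEG is expected for any lossless coder: already-compressed data is close to maximum entropy, so it cannot shrink further and the coder's framing adds a few bytes. A quick Python illustration using zlib as a stand-in compressor and random bytes as a stand-in for compressed data (neither is the library under discussion):

```python
import os
import zlib

# Random bytes stand in for already-compressed data such as a JPEG:
# they are near maximum entropy, so a general-purpose compressor
# cannot shrink them and typically adds a few bytes of overhead.
random_payload = os.urandom(4096)
compressed = zlib.compress(random_payload, 9)
print(len(random_payload), len(compressed))  # compressed is not smaller

# Structured text (like pr-str'ed EDN) is highly redundant and shrinks well.
edn_like = b'{:id #uuid "00000000-0000-0000-0000-000000000000"}' * 100
print(len(edn_like), len(zlib.compress(edn_like, 9)))
```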

Do you think it is a lot of work to get adaptive compression working for arbitrary EDN data structures? What is missing?

@hadronzoo
Owner

Yes, I think adaptive compression of arbitrary EDN, along with the ability to store encoder state, would be very cool. If you can reload your compressor state from a previous checkpoint, you could immediately start encoding close to entropy, making good compression for small, stateless messages possible.

I've been able to implement the first part, which is O(log n) cumulative frequency tables. Next, I need to implement a way for subcomponents to update global encoder state, probably using something similar to Om's cursors. Finally, I need to come up with a good set of adaptive scalar encoders. I've been playing around with using Gaussians, but lossless float encoding is a bit tricky. I'd like to use distributions that are their own conjugate priors if possible.
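An O(log n) cumulative frequency table is typically a Fenwick (binary indexed) tree: an adaptive coder needs the cumulative count below a symbol to form its coding interval, plus a count update after every symbol, both in O(log n). A Python sketch of the idea (not the library's actual implementation; names are made up):

```python
class FenwickFreqTable:
    """Cumulative frequency table with O(log n) update and query,
    the structure an adaptive arithmetic coder touches once per symbol."""

    def __init__(self, n_symbols):
        self.n = n_symbols
        self.tree = [0] * (n_symbols + 1)  # 1-indexed Fenwick tree

    def add(self, symbol, delta=1):
        """Increase the count of `symbol` (0-based index) by `delta`."""
        i = symbol + 1
        while i <= self.n:
            self.tree[i] += delta
            i += i & (-i)  # step to the next node covering this index

    def cumfreq(self, k):
        """Total count of the first k symbols (indices 0..k-1)."""
        total = 0
        while k > 0:
            total += self.tree[k]
            k -= k & (-k)  # strip the lowest set bit
        return total

# Learn byte frequencies adaptively, one symbol at a time.
t = FenwickFreqTable(256)
for b in b"abracadabra":
    t.add(b)
# Count of 'a' = cumulative range it occupies in the coding interval.
print(t.cumfreq(ord("a") + 1) - t.cumfreq(ord("a")))  # → 5
```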

BTW: I apologize for the delayed response. I attempted to log in to your lambda-shelf prototype but received a 504 Gateway Timeout error.

@whilo
Author

whilo commented May 30, 2014

Yes, stupidly, we had a DB corruption issue and then kept developing instead of fixing the prototype first, and now it does not seem worth the effort compared to bringing the new one online (which is not good, imo). Hopefully the new and more declarative prototype (with DataScript) will work soon.

OK, that sounds reasonable for float encoding; I still need to watch mathematicalmonk's videos on floats. Any further recommendations? I am working on neural networks and Gibbs sampling at the moment, so I am exposed to some statistics, and the information theory fits in nicely.

Btw, you can implement general readers in EDN, so you could extend your compression to tagged literals as well.

@whilo
Author

whilo commented May 30, 2014

Btw, are you on freenode #clojure? My handle is whilo.

@whilo
Author

whilo commented Jul 20, 2014

The site is now up again and quite a bit improved (mostly under the hood): https://topiq.es/. It is still slow and I need to refactor it at the moment, but compression could then just be added as a middleware, like in Ring.

Using a conjugate prior sounds reasonable, but I guess you would need a way to find an appropriate distribution for, e.g., a vector of floats, or the encoding would be pretty bad. I need to watch more of your information theory videos. My statistics knowledge is improving, though, and I find close-to-entropy (lossless, or especially lossy) encoding of models (e.g. autoencoders like attractor networks) and information fairly interesting.
What is your status?
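One family that is its own conjugate prior is the Gaussian (for the mean, with known variance): the posterior after each observed float is again Gaussian, so an adaptive scalar model stays in one family and updates in O(1) per value. A hedged Python sketch of the idea; the class and parameter names are hypothetical and not the library's API:

```python
import math

class GaussianMeanModel:
    """Adaptive model for a stream of floats: Gaussian likelihood with
    assumed-known variance, Gaussian (self-conjugate) prior on the mean."""

    def __init__(self, mu0=0.0, tau2=1.0, sigma2=1.0):
        self.mu, self.tau2 = mu0, tau2   # prior mean and variance of the mean
        self.sigma2 = sigma2             # assumed known observation variance

    def update(self, x):
        """Posterior after observing x is again Gaussian (conjugacy)."""
        precision = 1.0 / self.tau2 + 1.0 / self.sigma2
        self.mu = (self.mu / self.tau2 + x / self.sigma2) / precision
        self.tau2 = 1.0 / precision

    def predictive_nll_bits(self, x):
        """Ideal code length of x in bits under the posterior predictive
        Normal(mu, tau2 + sigma2), up to discretisation of the float."""
        var = self.tau2 + self.sigma2
        return (0.5 * math.log(2 * math.pi * var)
                + (x - self.mu) ** 2 / (2 * var)) / math.log(2)

m = GaussianMeanModel()
for x in [0.9, 1.1, 1.0, 0.95]:
    m.update(x)
print(round(m.mu, 3))  # posterior mean, pulled from the 0.0 prior toward the data
```

Lossless float coding would still need to discretise the predictive density into cumulative frequencies, which is the tricky part mentioned above.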
