UUID support #1
Comments
Hello @ghubber, thanks! I would definitely like to support UUID types. I'm in the process of rewriting the library to be adaptive, so that the compressor learns your data structures as it sees them and compression rates improve greatly. Until it is adaptive, you'll probably find that its compression is good for very small messages, but that gzip performs much better on larger ones. Alternatively, if you know the distribution of your data beforehand, you can use the library to construct a compressor that is within two bits of optimal given that distribution.
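A minimal sketch of what "within two bits of optimal" can mean, assuming it refers to the usual arithmetic coding bound (code length below the message's information content plus two bits). The function names and the example distribution are hypothetical and are not part of the library's API.

```python
import math


def ideal_code_length_bits(message, probabilities):
    """Information content of the message: the sum of -log2 p(symbol)."""
    return sum(-math.log2(probabilities[symbol]) for symbol in message)


def arithmetic_coding_upper_bound_bits(message, probabilities):
    """Assumed bound: an arithmetic coder needs fewer than ideal + 2 bits."""
    return ideal_code_length_bits(message, probabilities) + 2


# Hypothetical, known-in-advance symbol distribution.
distribution = {"a": 0.5, "b": 0.25, "c": 0.25}
ideal = ideal_code_length_bits("aab", distribution)
bound = arithmetic_coding_upper_bound_bits("aab", distribution)
```

For the message "aab" the ideal length is 1 + 1 + 2 = 4 bits, so under this assumption the encoded message would stay below 6 bits.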
By the way, we might be working on similar projects. I originally created this library to fit ClojureScript's immutable data structures into a persistent KV store like Local Storage for distributed synchronization.
Interesting, you are right: besides compressing communication, using it in my storage layer would make sense as well. Local Storage does not fit my requirements very well (the syncing repository keeps all history, and 5 MiB will be too little for many cases, though not all). Maybe we could cooperate on an async storage protocol? For the most part I try to leverage immutability to break out values and share them wherever I can, and I use clojure.data/diff for deltas on the single metadata place per repo. Still, compression will help keep the communication overhead of synchronisation lower by at least some constant factor, so it will be a valuable optimization once the layer works reliably. Automatic syncing now runs on top of If you can give any feedback or help, it would be appreciated, since this is a real stretch of my capabilities. One year plus some months in Clojure, and it still amazes me what I am able to build with limited experience XD.
Proxies blocked WebSockets, I found out; it now runs on: https://shelf.polyc0l0r.net/
Adaptive coding would be nice, I have Do you think it would be a lot of work to get adaptive compression working for arbitrary edn data structures? What is missing?
Yes, I think adaptive compression of arbitrary EDN, along with the ability to store encoder state, would be very cool. If you can reload your compressor state from a previous checkpoint, you could immediately start encoding close to entropy, making good compression for small, stateless messages possible. I've been able to implement the first part, which is O(log n) Cumulative Frequency Tables. Next, I need to implement a way for subcomponents to update global encoder state, probably using something similar to Om's cursors. Finally, I need to come up with a good set of adaptive scalar encoders. I've been playing around with using Gaussians, but lossless float encoding is a bit tricky. I'd like to use distributions that are their own conjugate priors if possible. BTW: I apologize for the delayed response. I attempted to log in to your lambda-shelf prototype but received a 504 gateway timeout error.
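One common way to get O(log n) cumulative frequency tables is a Fenwick (binary indexed) tree. The sketch below assumes that structure and integer symbol indices; it is not taken from the library, and the class and method names are hypothetical.

```python
class CumulativeFrequencyTable:
    """Fenwick tree over symbol counts: O(log n) update and prefix sum."""

    def __init__(self, size):
        self.size = size
        self.tree = [0] * (size + 1)  # 1-based internal array

    def add(self, symbol, delta=1):
        """Increase the count of `symbol` (0-based index) by `delta`."""
        i = symbol + 1
        while i <= self.size:
            self.tree[i] += delta
            i += i & -i  # move to the next node covering this index

    def cumulative(self, symbol):
        """Total count of all symbols with index strictly below `symbol`."""
        i = symbol
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i  # strip the lowest set bit
        return total

    def interval(self, symbol):
        """(low, high, total) counts an arithmetic coder would narrow to."""
        low = self.cumulative(symbol)
        high = self.cumulative(symbol + 1)
        return low, high, self.cumulative(self.size)


table = CumulativeFrequencyTable(4)
table.add(0, 3)
table.add(1)
table.add(3, 2)
low, high, total = table.interval(3)
```

An adaptive coder would call `add` after each encoded symbol, so both the model update and the interval lookup stay logarithmic in the alphabet size.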
Yes, stupid: we had a DB corruption issue and then kept developing instead of first fixing the prototype, and now it does not seem worth the effort compared to bringing the new one online (which is not good, imo). Hopefully the new and more declarative prototype (with datascript) will work soon. Ok, sounds reasonable for float encoding; I still need to watch mathematicalmonk's relevant videos on floats. Any further recommendations? I am working on neural networks and Gibbs sampling atm., so I am exposed to some statistics, and the information theory fits in nicely. Btw., you can implement general readers in edn, so you can extend your compression to tagged literals as well.
Are you on freenode #clojure btw.? My handle is
The site is now up again and quite a bit improved (mostly under the hood): https://topiq.es/. It is still slow and I need to refactor it atm., but compression could then just be added in a middleware like in Using a conjugate prior sounds reasonable, but I guess you would need a way to find an appropriate distribution for e.g. a vector of floats, or the encoding would be pretty bad. I need to watch more of your information theory videos. My statistics knowledge is improving though, and I find close-to-entropy (lossless or especially lossy) encoding of models (e.g. autoencoders like attractor networks) and of information fairly interesting.
Nice work! I have implemented platform-neutral cryptographic hashing for my distributed repository system and have just tried your compression. Nice that Unicode works out of the box, but UUID is not supported. While UUIDs compress very badly, a lot of my message data contains them, and it would be nice if I could just compress the whole message. Maybe you can add it (?), or I might do so if I find the time...
Btw., have you compared shannon's effectiveness versus gzip compression of pr-str edn?
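To illustrate why UUIDs compress badly, here is a rough sketch that uses Python's standard `uuid` and `zlib` modules as a stand-in for gzip (both use DEFLATE; zlib only has a smaller header). The helper name and the sample UUID are hypothetical, and nothing here measures the library itself.

```python
import uuid
import zlib


def uuid_sizes(u):
    """Byte sizes of a UUID as text, as raw bytes, and raw after zlib."""
    text = str(u).encode("ascii")  # 36-character canonical form
    raw = u.bytes                  # the underlying 128 bits
    return {
        "text": len(text),
        "raw": len(raw),
        "raw_zlib": len(zlib.compress(raw, 9)),
    }


# Fixed sample value so the result is reproducible.
sizes = uuid_sizes(uuid.UUID("123e4567-e89b-12d3-a456-426614174000"))
```

The raw form is already 16 bytes of mostly random data, so a general-purpose compressor only adds framing overhead. Under that assumption, the main saving for a UUID type would come from storing the 128 bits directly instead of the 36-character string.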