-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
A few changes to support complex pickle files #22
base: master
Are you sure you want to change the base?
Conversation
The final object should be based on the current instance, what the old code calls the "standin". Using the state to update this object should not touch `.ext`. As the FIXME notes, this probably still wrong as we should be updating the memoized instance by reference, not updating a cloned version of it. However, this is at least "more correct" and I need a testcase for that first.
Thanks for the PR! I'll have to delay the complete review for a bit, but I like the idea of compatibility with more complex pickles. One nitpick from first glance: the version bump should be in the minor version, since new APIs are added. |
Hey, thanks for taking a look! I don't like that test, it checks that something fails which now succeeds. Should people be relying on failure as part of the API? The code already has a "do the best we can" attitude, so I would say no, but let me know what you think. The version bump is just a quick hack. I noticed that cargo will fail to use a source override sometimes if the override and the crates.io one have the same version. There is one thing which I was not quite sure about, so I want to make sure it's clear. The new code tries to detect the type of an object by looking at the operations performed on it. But if it e.g. a subclass of This seems like a bit of a fundamental problem with the I actually have a followup PR which allows to specify the way to handle various types ahead of time in the options passed to the library. This avoids the ambiguity of auto-detection, but there are more complications and I'm not sure it's worth it. |
This PR's been open for a while and while I haven't tested these changes, support for recursive structures would be very useful for a data format I'm parsing. Edit: I actually just found that there must be a bug somewhere with the memo ref count. I took @rvolgers branch and did some experimenting -- I no longer get errors relating to recursive objects when I disable this condition (always keep the item in the memo): Line 603 in 13c0cd6
|
I was trying to load some pretty non-Rust-friendly pickle files with this crate (using the
Value
API) and, while I've mostly come to the conclusion that this crate is not really built for that, I've made some improvements which might be worth upstreaming. Consider this a request for comments on whether any of this is useful. I can split it up or make changes as needed.This adds options which enable loading pickle files with recursive data structures (through an option to return a
None
when recursing instead of causing an error) and to support custom subclasses of dict, list, and set.Custom subclasses of dict, list, and set are detected by looking for dict, list or set operations performed on an object. These objects are returned as a tuple
(object_attributes, superclass_data)
or, if you set an option, the object attributes will be discarded and only the superclass data will be returned. This option makes it possible to mostly treat (e.g.) subclasses of list as just lists, with the caveat that if the list is empty the object's attributes may be returned in the form of a dict instead, as we have no way to detect that an object is a list if no list operations are performed on it.Each commit should be mostly self-contained, except that the first commit (Decode collections.defaultdict as dict) is reverted and superseded by the commit after that, since the second commit contains a different, more general solution to the same problem. I kept it around in case the more tricky solution is rejected. (Edit: this was true before a number of followup bugfix commits. Instead of rewriting history I've left them as-is, but I can fix that if wanted.)