-
Notifications
You must be signed in to change notification settings - Fork 104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
patsy pickles #104
base: master
Are you sure you want to change the base?
patsy pickles #104
Conversation
Oh... and if you care to see the test-suite results... I ran it here... Some small things fail here, but I think it works overall. Namely there is a requirement that 2 generated Sorry, no coverage available. |
patsy/categorical.py
Outdated
|
||
def __setstate__(self, pickle): | ||
version, data, contrast, levels = pickle | ||
check_pickle_version(version, 0, name=self.__class__.__name__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have to check the pickle version before unpacking it – e.g. if we ever added a 5th field to the pickle state here, then this would error out on the unpacking line, before getting to the check_pickle_version
line. Basically we have to commit to never change anything that happens before we look at the version, so the first line should be something like
check_pickle_version(pickle[0], 0, ...) # if we commit to using a (version, ...) tuple forever
or
check_pickle_version(pickle["version"], 0, ...)
if we decide maybe we want to be stuck with something more readable than a tuple ;-)
patsy/desc.py
Outdated
def __eq__(self, other): | ||
return self.__dict__ == other.__dict__ | ||
|
||
# __getstate__ = no_pickling |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this class is using the default __getstate__
...?
patsy/desc.py
Outdated
@@ -209,7 +234,7 @@ def _pretty_repr_(self, p, cycle): # pragma: no cover | |||
[self.intercept, self.intercept_origin, | |||
self.intercept_removed, self.terms]) | |||
|
|||
__getstate__ = no_pickling | |||
# __getstate__ = no_pickling |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does IntermediateExpr
need to support pickling? I think it should only exist as a temporary object in memory?
@@ -120,7 +121,17 @@ def __repr__(self): | |||
kwlist.append(("categories", self.categories)) | |||
repr_pretty_impl(p, self, [], kwlist) | |||
|
|||
__getstate__ = no_pickling |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class is also using the default __getstate__
?
patsy/design_info.py
Outdated
def __eq__(self, other): | ||
return self.__dict__ == other.__dict__ | ||
|
||
# __getstate__ = no_pickling |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise...
patsy/eval.py
Outdated
@@ -10,7 +10,7 @@ | |||
# for __future__ flags! | |||
|
|||
# These are made available in the patsy.* namespace | |||
__all__ = ["EvalEnvironment", "EvalFactor"] | |||
__all__ = ["EvalEnvironment", "EvalFactor", "VarLookupDict"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this add VarLookupDict
to patsy's public API?
patsy/infix_parser.py
Outdated
@@ -44,7 +44,7 @@ def __init__(self, print_as): | |||
def __repr__(self): | |||
return "%s(%r)" % (self.__class__.__name__, self._print_as) | |||
|
|||
__getstate__ = no_pickling | |||
# __getstate__ = no_pickling |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this need to support pickling?
patsy/infix_parser.py
Outdated
@@ -70,7 +70,7 @@ def _repr_pretty_(self, p, cycle): | |||
kwargs = [("extra", self.extra)] | |||
return repr_pretty_impl(p, self, [self.type, self.origin], kwargs) | |||
|
|||
__getstate__ = no_pickling | |||
# __getstate__ = no_pickling |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this need to support pickling?
patsy/infix_parser.py
Outdated
@@ -83,7 +83,7 @@ def __init__(self, type, token, args, origin): | |||
def _repr_pretty_(self, p, cycle): | |||
return repr_pretty_impl(p, self, [self.type, self.token, self.args]) | |||
|
|||
__getstate__ = no_pickling | |||
# __getstate__ = no_pickling |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise...
patsy/infix_parser.py
Outdated
@@ -95,14 +95,14 @@ def __repr__(self): | |||
return "%s(%r, %r, %r)" % (self.__class__.__name__, | |||
self.token_type, self.arity, self.precedence) | |||
|
|||
__getstate__ = no_pickling | |||
# __getstate__ = no_pickling |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise...
patsy/infix_parser.py
Outdated
|
||
class _StackOperator(object): | ||
def __init__(self, op, token): | ||
self.op = op | ||
self.token = token | ||
|
||
__getstate__ = no_pickling | ||
# __getstate__ = no_pickling |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise...
patsy/infix_parser.py
Outdated
@@ -115,7 +115,7 @@ def __init__(self, unary_ops, binary_ops, atomic_types, trace): | |||
self.atomic_types = atomic_types | |||
self.trace = trace | |||
|
|||
__getstate__ = no_pickling | |||
# __getstate__ = no_pickling |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise...
# We reimplement patsy.util.no_pickling, to avoid circular import issues | ||
def __getstate__(self): | ||
raise NotImplementedError | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs an actual __getstate__
patsy/redundancy.py
Outdated
@@ -73,7 +73,7 @@ def __repr__(self): | |||
suffix = "-" | |||
return "%r%s" % (self.factor, suffix) | |||
|
|||
__getstate__ = no_pickling | |||
# __getstate__ = no_pickling |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this need to support pickling?
patsy/state.py
Outdated
|
||
class StatefulTransform(object): | ||
def __getstate__(self): | ||
return (0, self.__dict__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Umm. This is cheating :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This reminds me of a saying from my childhood hero
patsy/test_pickling.py
Outdated
"evalenv_transform_standardize": EvalEnvironment([{ | ||
'standardize': standardize | ||
}]), | ||
"evalenv_transform_catgorical": EvalEnvironment([{'C': C}]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing "e" in "categorical"
patsy/test_pickling.py
Outdated
from patsy.origin import Origin | ||
|
||
|
||
PICKE_TESTCASES_ROOTDIR = os.path.join(os.path.dirname(__file__), '..', 'pickle_testcases') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing "L" in "PICKLE"
patsy/util.py
Outdated
error_msg += "." | ||
|
||
# TODO Use a better exception than ValueError. | ||
raise ValueError(error_msg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we could use PickleError
General comments:
|
But overall: this isn't ready yet, but it's very exciting to see someone working on it :-) |
Thanks. I'll try and take a look at this tomorrow. I think you caught a few of the shortcuts I was trying... I'll clean up as appropriate. As the the Thanks for your excellent comments. Hopefully I'll have an update tomorrow afternoon-ish. |
5588119
to
14e62a7
Compare
Update for today: https://travis-ci.org/thequackdaddy/patsy/builds/219481856 (all green) I've gone through/reverted your code as you recommended. Thanks for your pointers. I haven't figured out why stateful transform function like But this seems do-able. |
This has been very educational. Thanks for the help. I took a stab at fixing the issue where the patsy stateful transforms required Here's a gist showing that I think this works. |
That looks like it will work, but I think it should be possible to solve in a more fundamental way. Looking at And then if you look a few lines after we compute the set of variables to keep, then there's some code that rewrites the code to convert stateful transform calls like So I think the best solution is to swap the order of these things: if we make the list of variables to keep after eliminating references to stateful transforms, then it should just work. I know this is getting into somewhat deep wizardry... I'm going to try to look at this more closely within the next few days, and I can fix this part up if you don't get there first :-) |
58e672f
to
1b5f352
Compare
@njsmith Hello again. Just pinging to see if there's interest in this... would be a little helpful for me to cut down on some model sizes. |
c4a8bb9
to
7129507
Compare
Codecov Report
@@ Coverage Diff @@
## master #104 +/- ##
==========================================
- Coverage 99.08% 98.43% -0.66%
==========================================
Files 30 31 +1
Lines 5557 5928 +371
Branches 773 822 +49
==========================================
+ Hits 5506 5835 +329
- Misses 28 63 +35
- Partials 23 30 +7
Continue to review full report at Codecov.
|
@njsmith - Checking in again to see if this is on the right track/something that could be merged. Let me know either way. Thanks so much! |
@thequackdaddy @njsmith |
@dswah I'm waiting for review/feedback... This is definitely a big project, and part of it--such as the whole If you want to read/through, provide feedback, I think that might be helpful. |
… results afterwards.
…__ as appropriate to make assert tests work.
…d special tests for transform equivalence, new assert statement that makes the pickle a dict (not the most robust... but it works), added PickleError,
…l transforms which have different names/qualnames from expected.
* Now run subset and clean the eval_env after removing bare funcalls * Had some straggling tuples * Altered test so that stateful_transforms won't be saved in eval_env * Removed the clean method on EvalEnvironment
7129507
to
b9f69d1
Compare
b07ba3f
to
48fd2e4
Compare
Hi @thequackdaddy , Thanks for these patches. As one of the new maintainers I'm going through the PRs and figuring out what still needs attention. Even though your patches have been sitting here for along time, the code-base hasn't change much and so it's likely they are still suitable. Is this something you are still interested in hashing out? Given that the name of the game from a maintenance perspective is to keep things working as is, I'm a little cautious to introduce massive changes that might introduce new bugs. It should also be noted that https://github.com/matthewwardrop/formulaic (the spiritual successor of this project) already has support for pickling, as well as sparse representations and is much more performant. But if you'd like to explore finishing up this work, I'm willing to dig into it with you. Best, |
4f8a70c
to
691eb4e
Compare
Hello,
This is a continuation effort of #86 and #67. My apologies if this is viewed as stepping on someone else's toes... but this is an interesting--and useful topic--for me.
I've basically got this working (with some simple tests on real data), but some things break some of the unit tests.
One problem is that as far as I can tell, doing a
pickle.loads(pickle.dumps(obj))
will yield a different object, assigned to different memory than the originalobj
. Thus, the typical round-trip testobj == pickle.loads(pickle.dumps(obj))
fails, unless you just check that theobj.__dict__
matches. However, that breaks some things like... this lineI've also run across some issues with the stateful transforms like
center
,standardize
, etc. These are functions (and from my rudimentary knowledge of computer science, pickling functions is generally a no-no). I need to set the__qualname__
and__name__
of these functions because when you generally do...But when you
pickle.dump
, it realizes thatCenter
refers to the class, andcenter
is a function. So I have to hard-change the__qualname__
to get this to work. See here. This seems undesireable.Anyways, let me know if there's any interest in this work. I'm by no means an expert at this... and the test cases could obviously be improved. But I'm not a programmer by trade, so please grant me some forbearance.
Thanks!