Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DNM/RFC] Track dependents without weakrefs #831

Draft
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

fjetter
Copy link
Member

@fjetter fjetter commented Feb 1, 2024

@phofl this is an implementation that would track the "expression graph" on the instances themselves without needing weakref semantics.

In a nutshell, the expressions all track a dictionary that maps the names of the expresions to the names of their dependencies. When one needs the dependents, this can be reverted.

I haven't tested performance on this thing yet. The instance creation is a little more expensive now but I would be surprised if this would be a significant overhead. The dependents graph inversion should be the same cost as the weakref implementation we have right now.

Tests are currently failing because the DelayedExpr thingy is a little broken but the rest passes. I'll look at performance next and will see if this can be easily combined with #798 This Draft PR is just an FYI

Comment on lines 64 to 65
def __hash__(self):
raise TypeError("Don't!")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Funny" story: The above implementation is maintaining two dicts. One that maps names and dependencies and one that just maps names to the actual objects. Initially I just tried to use the Expr objects themselves in the mappings using a mapping from Expr -> set[Expr}. That triggered funny recursion errors and it took me a while to understand that...

Whenever there is a hash collision, the implementation of set falls back to use __eq__ to compare the new object with the one that is stored under a given hash, i.e. object.__eq__(old, new). However, this doesn't evaluate to a bool but defines another expression... 💥

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, definite __eq__ is pretty hazardous in Python. It would be reasonable, I think, to drop Expr.__eq__ and use explicit Eq(a, b) calls instead, and then rely on Frame.__eq__ for user comfort.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or rather, use whatever class is defined in collections.py to hold the __eq__ method

dask_expr/_core.py Outdated Show resolved Hide resolved
@phofl
Copy link
Collaborator

phofl commented Feb 1, 2024

I have to take a closer look, but I like the solution. When I tried the same I only stored the dependents on the expression which caused me trouble while recreating the mapping. Simply building the mapping continuously and storing it on the expression should circumvent the issues I ran into.

Thanks for taking a stab at this

@fjetter fjetter mentioned this pull request Feb 1, 2024
@@ -640,7 +641,7 @@ def test_set_index_sort_values_one_partition(pdf):

def test_set_index_triggers_calc_when_accessing_divisions(pdf, df):
divisions_lru.data = OrderedDict()
query = df.set_index("x")
query = df.fillna(random.randint(1, 100)).set_index("x")
Copy link
Collaborator

@phofl phofl Feb 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is funny, the previous expression was cached, so we were never calculating divisions with this PR. Using a random number here avoid this

@phofl
Copy link
Collaborator

phofl commented Feb 14, 2024

Fix for failing tests is here: #880

Will merge into this branch

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants