[DNM/RFC] Track dependents without weakrefs #831

fjetter · 2024-02-01T16:28:40Z

@phofl this is an implementation that would track the "expression graph" on the instances themselves without needing weakref semantics.

In a nutshell, every expression carries a dictionary that maps the names of expressions to the names of their dependencies. When one needs the dependents, this mapping can be inverted.

I haven't tested performance on this yet. Instance creation is a little more expensive now, but I would be surprised if this were a significant overhead. Inverting the graph to get the dependents should cost the same as the weakref implementation we have right now.
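For readers following along, here is a minimal sketch of the idea, not the code in this PR. The `Expr` class, its `dependencies` attribute and the `invert` helper are hypothetical names chosen for illustration: each node merges the name-to-dependencies maps of its operands at construction time, and the dependents are derived on demand by inverting that map.

```python
from collections import defaultdict


class Expr:
    """Hypothetical stand-in for an expression node (not the dask_expr class)."""

    def __init__(self, name, *operands):
        self.name = name
        self.operands = operands
        # Merge the dependency maps of all operands, then add our own entry.
        self.dependencies = {}
        for op in operands:
            self.dependencies.update(op.dependencies)
        self.dependencies[name] = {op.name for op in operands}


def invert(dependencies):
    """Turn a name -> dependencies map into a name -> dependents map."""
    dependents = defaultdict(set)
    for name, deps in dependencies.items():
        dependents[name]  # make sure every node appears, even without dependents
        for dep in deps:
            dependents[dep].add(name)
    return dict(dependents)


a = Expr("a")
b = Expr("b", a)
c = Expr("c", a, b)
dependents_of_c_graph = invert(c.dependencies)
```

Only names are stored, so no node holds a reference to the nodes that depend on it, which is what makes weakrefs unnecessary in this sketch.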

Tests are currently failing because the DelayedExpr thingy is a little broken, but the rest passes. I'll look at performance next and will see if this can be easily combined with #798. This draft PR is just an FYI.

fjetter · 2024-02-01T16:32:55Z

dask_expr/_core.py

+    def __hash__(self):
+        raise TypeError("Don't!")


"Funny" story: The above implementation maintains two dicts. One maps names to dependencies and one just maps names to the actual objects. Initially I tried to use the Expr objects themselves in the mappings, using a mapping from Expr -> set[Expr]. That triggered funny recursion errors and it took me a while to understand that...

Whenever there is a hash collision, the implementation of set falls back to __eq__ to compare the new object with the one already stored under that hash, i.e. old == new. However, Expr.__eq__ doesn't evaluate to a bool but defines another expression... 💥
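A small self-contained illustration of that fallback, assuming CPython's set implementation. The `Node` class is hypothetical and forces every hash to collide; its `__eq__` returns a new node instead of a bool, similar in spirit to an expression class.

```python
calls = []  # records every __eq__ invocation


class Node:
    """Hypothetical expression-like class whose == builds a new node."""

    def __init__(self, label):
        self.label = label

    def __hash__(self):
        return 0  # force every instance to collide

    def __eq__(self, other):
        calls.append((self.label, other.label))
        # Returns an object, not a bool; the set then takes its truthiness.
        return Node(f"({self.label} == {other.label})")


s = {Node("x")}
s.add(Node("y"))  # collision: the set compares the stored node with the new one
```

In this toy version the returned node is simply truthy, so the set treats "y" as a duplicate of "x" and silently drops it. That is a different symptom from the recursion error described above, but it comes from the same fallback to `__eq__`.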

mrocklin · 2024-02-01

Yeah, defining __eq__ is pretty hazardous in Python. It would be reasonable, I think, to drop Expr.__eq__ and use explicit Eq(a, b) calls instead, and then rely on Frame.__eq__ for user comfort.

mrocklin · 2024-02-01

Or rather, rely on whatever class is defined in collections.py to hold the __eq__ method.
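A rough sketch of what that split could look like. All three classes here are simplified, hypothetical stand-ins rather than the real dask_expr types: the expression keeps Python's default identity-based `==` and hash, comparisons are built explicitly with `Eq`, and only the user-facing collection overloads `__eq__`.

```python
class Expr:
    """Hypothetical expression node that keeps Python's default == and hash."""

    def __init__(self, *operands):
        self.operands = operands


class Eq(Expr):
    """Explicit comparison node, built by name instead of through ==."""


class Frame:
    """Hypothetical user-facing collection wrapping an expression."""

    def __init__(self, expr):
        self.expr = expr

    def __eq__(self, other):
        # User comfort: frame == frame still builds a comparison expression.
        return Frame(Eq(self.expr, other.expr))

    # Unhashable on purpose, so frames never end up as set or dict keys.
    __hash__ = None


x, y = Expr(), Expr()
registry = {x, y}  # safe: expressions compare by identity
result = Frame(x) == Frame(y)
```

With this layout, sets and dicts of expressions never invoke an overloaded `__eq__`, while users of the collection still get the familiar operator.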

phofl · 2024-02-01T16:58:59Z

I have to take a closer look, but I like the solution. When I tried the same thing, I only stored the dependents on the expression, which caused me trouble when recreating the mapping. Simply building the mapping continuously and storing it on the expression should circumvent the issues I ran into.

Thanks for taking a stab at this

phofl · 2024-02-14T18:06:51Z

dask_expr/tests/test_shuffle.py

@@ -640,7 +641,7 @@ def test_set_index_sort_values_one_partition(pdf):

 def test_set_index_triggers_calc_when_accessing_divisions(pdf, df):
    divisions_lru.data = OrderedDict()
-    query = df.set_index("x")
+    query = df.fillna(random.randint(1, 100)).set_index("x")


This one is funny: the previous expression was cached, so with this PR we never calculated divisions in this test. Using a random number here avoids that.
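To illustrate why varying an argument sidesteps the cache, here is a toy sketch. It assumes a cache keyed on the constructor arguments; `CachedExpr` and its `computed` flag are hypothetical and much simpler than the real caching in dask_expr.

```python
import random


class CachedExpr:
    """Hypothetical expression type that reuses instances with equal arguments."""

    _instances = {}

    def __new__(cls, *args):
        key = (cls.__name__, args)
        if key not in cls._instances:
            inst = super().__new__(cls)
            inst.args = args
            inst.computed = False  # stands in for "divisions already calculated"
            cls._instances[key] = inst
        return cls._instances[key]


first = CachedExpr("set_index", "x")
first.computed = True  # pretend an earlier test already did the work

# Same arguments: the cached instance comes back, work already done.
again = CachedExpr("set_index", "x")

# A different argument list gives a different key, hence a fresh instance.
fresh = CachedExpr("fillna", random.randint(1, 100), "set_index", "x")
```

In this sketch any argument that changes the key would do; a random value additionally makes it unlikely that another run has already created the same instance.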

phofl · 2024-02-14T22:43:12Z

Fix for failing tests is here: #880

Will merge into this branch

fjetter added 2 commits February 1, 2024 17:22

Track dependents without weakrefs

7700b82

delete the old thing

3531583

fjetter mentioned this pull request Feb 1, 2024

Expr as singleton #798

Merged

phofl added 8 commits February 14, 2024 11:53

Update dask_expr/_core.py

af85220

Merge remote-tracking branch 'upstream/main' into dependents_wout_weakrefs

Conflicts: dask_expr/_core.py, dask_expr/_expr.py, dask_expr/io/_delayed.py

caf94ce

Update

a9f9904

Update

3155405

Merge branch 'main' into dependents_wout_weakrefs

cf5d77e

Don't run concurrently

39b3a26

Don't run concurrently

431c302

Update

1faa475

Fix reads from local dir that changes directory

bda2577

phofl added 2 commits February 14, 2024 23:43

Merge branch 'chdir' into dependents_wout_weakrefs

babb1ee

Update to keep track of refs

97029db
