
Refactor, fix, and optimizer filters/rules #1794

Open
wants to merge 4 commits into dsb/depickle

Conversation

@dsblank (Member) commented Nov 22, 2024:

This PR does three things:

  1. Unrolls the recursive call of filter.apply() and splits out single checks into filter.apply_to_one() (see the sketch after this list)
  2. Uses JSON data where possible, passing the data rather than objects. If an object is needed, it will not be recreated more than once.
  3. Adds an optimizer based on rule.map sets
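
A minimal sketch of the apply()/apply_to_one() split in point 1 (illustrative only, not the actual Gramps classes; the "gender" and "handle" keys are just example fields of the raw JSON data):

class ExampleRule:
    def apply_to_one(self, db, data):
        """Check a single raw-data record; return True on a match."""
        return data["gender"] == 1  # illustrative check against the raw data

    def apply(self, db, records):
        """Iterative driver: a plain loop over raw records, no recursion."""
        return [data["handle"] for data in records if self.apply_to_one(db, data)]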

@dsblank self-assigned this Nov 22, 2024
"""
Apply the rule. Return True on a match.
"""
obj = self.get_object(data)
# FIXME: probably can do the rest without object:
if not self.match_substring(0, obj.get_name()):
A contributor commented:

Suggested change:
- if not self.match_substring(0, obj.get_name()):
+ if not self.match_substring(0, data["name"]):

- def apply(self, db, person):
+ def apply_to_one(self, db, data):
+     person = self.get_object(data)
+     # FIXME: can probably do all of this without object:
@stevenyoungs (Contributor) commented Nov 30, 2024:

use data["birth_ref_index"] to avoid constructing the object?

@@ -145,6 +145,8 @@ def apply(self, db, place):

# now we know at least one is given in the filter and is valid

place = self.get_object(data)
A contributor commented:

Can you use data["lat"] and data["long"] to avoid constructing the object?
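
For illustration only, the suggested change amounts to something like this sketch (the surrounding rule code is not shown in the diff above):

# read the coordinates straight from the raw data ...
latitude = data["lat"]
longitude = data["long"]
# ... instead of constructing the Place object first via self.get_object(data)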

@dsblank (Member, Author) commented Nov 30, 2024:

Thanks @stevenyoungs for looking this over and the feedback. I'm hoping that @Nick-Hall is open to these three ideas (fixes, using JSON data, and the optimizer) because all three make the filter system so much faster while keeping the API almost the same. Many places in Gramps use the raw data for speed, including tools, views, displayers, and importers. It would be a shame if filters couldn't do the same.

@stevenyoungs (Contributor) commented:

> Thanks @stevenyoungs for looking this over and the feedback. I'm hoping that @Nick-Hall is open to these three ideas (fixes, using JSON data, and the optimizer) because all three make the filter system so much faster while keeping the API almost the same. Many places in Gramps use the raw data for speed, including tools, views, displayers, and importers. It would be a shame if filters couldn't do the same.

No problem. I'm also keen to see the benefits of more standard storage and see what opportunities it unlocks. The recent discussion of GQL is an interesting option.
I've learnt a lot from following your changes and hopefully my comments have had a minor benefit to the overall quality of the PR.

@Nick-Hall (Member) commented:

> hopefully my comments have had a minor benefit to the overall quality of the PR.

Your feedback is very useful. It always helps for another person to review the code. I very much appreciate your contributions.

@Nick-Hall (Member) commented:

> I'm hoping that @Nick-Hall is open to these three ideas (fixes, using JSON data, and the optimizer) because all three make the filter system so much faster while keeping the API almost the same.

With the new JSON format, I think that we can regard the raw data as providing lightweight objects. The format mirrors the Gramps objects and is therefore much easier to understand and more resilient to updates.

I've been doing some research today to see if we can make the dict format more object-like. For example, data.gender would perhaps be neater than data["gender"]. The dataclasses introduced in Python 3.7 look interesting, but I don't think that they are of much help to us. Using a JSON parser written in C may be an option in the future, but the ones I was looking at deserialized to dict structures.

I am open to using JSON data in more core code, including filters. We should probably discuss where the raw format is acceptable. Third-party code should be careful when using the raw format. It may change between feature releases, not just major releases, but this has always been the case.

An optimizer seems like a good idea. I haven't reviewed your code yet though.

@dsblank (Member, Author) commented Dec 2, 2024:

> I've been doing some research today to see if we can make the dict format more object-like. For example, data.gender would perhaps be neater than data["gender"].

How about just wrapping our raw data with DataDict(data):

class DataDict(dict):
    def __getattr__(self, key):
        value = self[key]
        if isinstance(value, dict):
            return DataDict(value)
        elif isinstance(value, list):
            return DataList(value)
        else:
            return value
        
class DataList(list):
    def __getitem__(self, position):
        value = super().__getitem__(position)
        if isinstance(value, dict):
            return DataDict(value)
        elif isinstance(value, list):
            return DataList(value)
        else:
            return value

It allows exactly what you describe, is low-cost, and doesn't require any more code than that above.
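
For example (a hypothetical snippet; the keys below are illustrative, not the exact Gramps raw-data schema), the wrapper turns key lookups into attribute access, including through nested dicts and lists:

raw = {
    "gender": 1,
    "primary_name": {"first_name": "Ada", "surname_list": [{"surname": "Lovelace"}]},
}
data = DataDict(raw)

data.gender                                # 1, same as data["gender"]
data.primary_name.first_name               # "Ada"; nested dicts come back wrapped
data.primary_name.surname_list[0].surname  # "Lovelace"; list items are wrapped too

One caveat: __getattr__ is only consulted when normal attribute lookup fails, so a key that collides with an existing dict attribute (for example "items" or "values") would still need data["items"]-style access.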

> I am open to using JSON data in more core code, including filters. We should probably discuss where the raw format is acceptable. Third-party code should be careful when using the raw format. It may change between feature releases, not just major releases, but this has always been the case.

Agreed. The rule of thumb has been: use the raw data where it is needed for speed or ease of representation.

@stevenyoungs (Contributor) commented:

The only downside is that we leak the internal representation of the objects into the higher layers of code. But since those layers, in part, already use the raw format, this PR does not make a material difference.

@dsblank marked this pull request as ready for review on December 3, 2024 11:57
@dsblank (Member, Author) commented Dec 3, 2024:

> The only downside is that we leak the internal representation of the objects into the higher layers of code.

For many attributes of an object, it isn't so much "internal" now that we are switching to JSON. For example, this is the Person object API to get the parent family list:

person.parent_family_list # object attribute
person.get_parent_family_handle_list() # function call

With the above DataDict (plus some additional code) and the new JSON format, it could look like this (where person is really JSON data with a wrapper):

person.parent_family_list  # JSON access
person.get_parent_family_handle_list() # function call, instantiates object

The new version would hide the fact that the function call creates the object (and does so only once). And because the JSON data mirrors the actual attribute names, the calling code doesn't have to change at all.
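
As a rough sketch of that idea (building on the DataDict above; object_from_json() is a placeholder name, not an actual Gramps API), the wrapper could serve plain attributes straight from the JSON data and delegate method calls to a lazily built, cached object:

class DataWrapper(DataDict):
    def __getattr__(self, key):
        try:
            # e.g. person.parent_family_list -> straight JSON access
            return super().__getattr__(key)
        except KeyError:
            # e.g. person.get_parent_family_handle_list -> delegate to the
            # full object, which is created on first use and then reused
            return getattr(self._cached_object(), key)

    def _cached_object(self):
        if "_obj" not in self.__dict__:
            self.__dict__["_obj"] = object_from_json(self)  # placeholder factory
        return self.__dict__["_obj"]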
