Filtering doesn't work as expected #40

xhh2a · 2015-09-15T20:54:30Z

I guess I will look into this sometime

>>> import dpath.util
>>> a = { 'actions' : [ { 'type': 'correct' }, { 'type': 'incorrect' } ] }
>>> dpath.util.values(a, 'actions/*')
[{'type': 'correct'}, {'type': 'incorrect'}]
>>> dpath.util.values(a, 'actions/*',afilter=lambda x: x.get('type', None) is 'correct')
[]
>>> def f(x):
>>>     print x
>>> dpath.util.values(a, 'actions/*',afilter=f)
[]

akesterson · 2015-09-16T04:25:46Z

what dpath version are you using?

akesterson · 2015-09-16T05:18:13Z

Nevermind, I can replicate this behavior on latest master.

akesterson · 2015-09-16T05:28:29Z

This has been broken for a long time, ever since values() was introduced, possibly even before. Really sorry about this one. I may not get around to a fix for this one tonight, but it's high on my list.

xhh2a · 2015-09-16T14:44:27Z

From a quick glance at the logic in the code yesterday, I think it is due to the way the code does in place filtering while it does the search so it fails on the top level branch of the dict before it hits a result where the filter would pass it. I was expecting a post dpath filter, not necessarily a bug. Although it didn't print anything so it could be something else.

Also I have a bug in the lambda should be .get() == 'correct', will test once I get into the office

xhh2a · 2015-09-16T15:58:51Z

So the typo doesn't change anything

akesterson · 2015-09-19T05:41:03Z

TL;DR - Filtering doesn't work the way any of us assumes it does!

So, after sleeping on this and pulling my head out of my buttcheeks, this boils down to a lack of good documentation. In fact, the lack of good docs is so bad that even I misunderstood what's actually going on behind the scenes, and how these functions work.

First, you're correct, filters are pre, not post. They are executed as each level of the source dictionary is parsed. Second, dpath treats dictionaries and lists as trees. Filters execute on the leaf nodes, e.g., the individual dictionary values; they do not execute on dictionaries or lists, which are branches on the tree, not leaf nodes. Consider:

>>> import dpath.util                                                                                                                                                                                                          
>>> a = { 'actions' : [ { 'type': 'correct' }, { 'type': 'incorrect' } ] }                                                                                                                                                     
>>> def f(x):                                                                                                                                                                                                                  
...     print x                                                                                                                                                                                                                
...                                                                                                                                                                                                                            
>>> dpath.util.values(a, 'actions/*',afilter=f)                                                                                                                                                                                
[]                                                                                                                                                                                                                             
>>> dpath.util.values(a, '**',afilter=f)                                                                                                                                                                                       
correct                                                                                                                                                                                                                        
incorrect                                                                                                                                                                                                                      
[]

The first call to values failed (and printed nothing) because actions/* matches only one level deep (not multi-level), and the only thing there is 2 dictionary objects, neither of which are leaf nodes, so the filter isn't executed on them, hence nothing gets printed (and not returned).

This behavior is not exclusive to values() either, the filtering happens in search, so search has the same issue:

>>> dpath.util.search(a, 'actions/*', afilter=f)
{}

.. If we expand the search range to "actions/**", we will reach the leaf nodes we want to filter against, and our print works:

>>> dpath.util.values(a, '**',afilter=f)
correct
incorrect
[]

... if we change our filter to actually give us some results now (with a .get() lambda like you had), we see now that this is obviously going to fail, because we're calling .get() on a string:

>>> dpath.util.values(a, 'actions/**', afilter=(lambda x: x.get('type', None) == 'correct'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "dpath/util.py", line 117, in values
    return [x[1] for x in dpath.util.search(obj, glob, yielded=True, separator=separator, afilter=afilter, dirs=dirs)]
  File "dpath/util.py", line 144, in _search_yielded
    val = dpath.path.get(obj, path, view=False, afilter=afilter)
  File "dpath/path.py", line 276, in get
    if (afilter and (not afilter(target))):
  File "<stdin>", line 1, in <lambda>
AttributeError: 'str' object has no attribute 'get'

... but it works great if we check against a simple string comparison, because we're getting the leaf nodes:

>>> dpath.util.values(a, 'actions/**', afilter=(lambda x: x == 'correct'))
['correct']
>>> dpath.util.search(a, 'actions/**', afilter=(lambda x: x == 'correct'))
{'actions': [{'type': 'correct'}]}

So, while filtering does work exactly as it was designed, this implementation clearly defies the rule of least astonishment. Even though the docs clearly state how it works:

... Filtering functions receive every terminus node in a search - e.g., anything that is not a dict or a list, at the very end of the path. For each value, they return True to include that value in the result set, or False to exclude it.

... I think it's worth considering changing how this works, or supporting some kind of alternate filtering mechanisms.

RFC, @calebcase, OP?

xhh2a · 2015-09-19T06:52:26Z

In terms of alternative post-search, I am able to handle it using other methods. Thus it's not that big of a deal if a pre filter is kept. Is it less efficient than doing it while stepping through the tree? Yes, but it's not something that should result in significant issues for me in my use case. Honestly I can't think of too many use cases where I would want a pre search, but I can see where others may want it.

I suppose the documentation could definitely use some clarification on this. Changing the filter type might cause backwards incompatibility for users though so a post-search might be better to be added as a separate arg param.

Incidentally I need to remember to test unicode/ascii with filter.

akesterson · 2015-10-11T15:12:04Z

What if we could say that all filter arguments must be subclasses of some base filter object, say dpath.Filter, and each Filter can implement one or more filtering steps, which would be called at various steps of the process? Then the user could just implement/override the steps they care about, and you could also make multi-step filters this way:

class MyFilter(dpath.Filter):
    def pre_leaf(self, leaf):
        """pre-filter a leaf node"""
        pass
    def post_leaf(self, leaf):
        """post-filter a leaf node"""
        pass
    def pre_branch(self, branch)
        """pre-filter a branch node (list or dictionary)"""
        pass
    def post_branch(self, branch):
        """post-filter a branch node (list or dictionary)"""
        pass

This would raise the question re: what to do with the existing filter functionality (being able to pass a lambda is awfully convenient), but I think this implementation would be significantly more clear.

akesterson · 2016-03-27T22:17:31Z

This seems like an excellent thing to fix in 2.0, re-labeling.

akesterson · 2020-07-27T15:15:49Z

Blocked by #136

akesterson added the bug label Sep 16, 2015

akesterson added question and removed bug labels Sep 19, 2015

akesterson mentioned this issue Oct 11, 2015

provide example of list of dictionaries and filtering by intermediate key #42

Closed

akesterson added the 1.0 label Mar 27, 2016

akesterson added 2.0 and removed 1.0 labels Mar 27, 2016

calebcase added 3.x enhancement and removed 2.0 question labels Dec 30, 2019

calebcase added enhancement and removed 3.x enhancement labels Jan 26, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Filtering doesn't work as expected #40

Filtering doesn't work as expected #40

xhh2a commented Sep 15, 2015

akesterson commented Sep 16, 2015

akesterson commented Sep 16, 2015

akesterson commented Sep 16, 2015

xhh2a commented Sep 16, 2015

xhh2a commented Sep 16, 2015

akesterson commented Sep 19, 2015

xhh2a commented Sep 19, 2015

akesterson commented Oct 11, 2015

akesterson commented Mar 27, 2016

akesterson commented Jul 27, 2020

Filtering doesn't work as expected #40

Filtering doesn't work as expected #40

Comments

xhh2a commented Sep 15, 2015

akesterson commented Sep 16, 2015

akesterson commented Sep 16, 2015

akesterson commented Sep 16, 2015

xhh2a commented Sep 16, 2015

xhh2a commented Sep 16, 2015

akesterson commented Sep 19, 2015

xhh2a commented Sep 19, 2015

akesterson commented Oct 11, 2015

akesterson commented Mar 27, 2016

akesterson commented Jul 27, 2020