Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filtering doesn't work as expected #40

Open
xhh2a opened this issue Sep 15, 2015 · 10 comments
Open

Filtering doesn't work as expected #40

xhh2a opened this issue Sep 15, 2015 · 10 comments

Comments

@xhh2a
Copy link
Contributor

xhh2a commented Sep 15, 2015

I guess I will look into this sometime

>>> import dpath.util
>>> a = { 'actions' : [ { 'type': 'correct' }, { 'type': 'incorrect' } ] }
>>> dpath.util.values(a, 'actions/*')
[{'type': 'correct'}, {'type': 'incorrect'}]
>>> dpath.util.values(a, 'actions/*',afilter=lambda x: x.get('type', None) is 'correct')
[]
>>> def f(x):
>>>     print x
>>> dpath.util.values(a, 'actions/*',afilter=f)
[]
@akesterson
Copy link
Collaborator

what dpath version are you using?

@akesterson
Copy link
Collaborator

Nevermind, I can replicate this behavior on latest master.

@akesterson akesterson added the bug label Sep 16, 2015
@akesterson
Copy link
Collaborator

This has been broken for a long time, ever since values() was introduced, possibly even before. Really sorry about this one. I may not get around to a fix for this one tonight, but it's high on my list.

@xhh2a
Copy link
Contributor Author

xhh2a commented Sep 16, 2015

From a quick glance at the logic in the code yesterday, I think it is due to the way the code does in place filtering while it does the search so it fails on the top level branch of the dict before it hits a result where the filter would pass it. I was expecting a post dpath filter, not necessarily a bug. Although it didn't print anything so it could be something else.

Also I have a bug in the lambda should be .get() == 'correct', will test once I get into the office

@xhh2a
Copy link
Contributor Author

xhh2a commented Sep 16, 2015

So the typo doesn't change anything

@akesterson
Copy link
Collaborator

TL;DR - Filtering doesn't work the way any of us assumes it does!

So, after sleeping on this and pulling my head out of my buttcheeks, this boils down to a lack of good documentation. In fact, the lack of good docs is so bad that even I misunderstood what's actually going on behind the scenes, and how these functions work.

First, you're correct, filters are pre, not post. They are executed as each level of the source dictionary is parsed. Second, dpath treats dictionaries and lists as trees. Filters execute on the leaf nodes, e.g., the individual dictionary values; they do not execute on dictionaries or lists, which are branches on the tree, not leaf nodes. Consider:

>>> import dpath.util                                                                                                                                                                                                          
>>> a = { 'actions' : [ { 'type': 'correct' }, { 'type': 'incorrect' } ] }                                                                                                                                                     
>>> def f(x):                                                                                                                                                                                                                  
...     print x                                                                                                                                                                                                                
...                                                                                                                                                                                                                            
>>> dpath.util.values(a, 'actions/*',afilter=f)                                                                                                                                                                                
[]                                                                                                                                                                                                                             
>>> dpath.util.values(a, '**',afilter=f)                                                                                                                                                                                       
correct                                                                                                                                                                                                                        
incorrect                                                                                                                                                                                                                      
[]                                                                                                                                                                                                                             

The first call to values failed (and printed nothing) because actions/* matches only one level deep (not multi-level), and the only thing there is 2 dictionary objects, neither of which are leaf nodes, so the filter isn't executed on them, hence nothing gets printed (and not returned).

This behavior is not exclusive to values() either, the filtering happens in search, so search has the same issue:

>>> dpath.util.search(a, 'actions/*', afilter=f)
{}

.. If we expand the search range to "actions/**", we will reach the leaf nodes we want to filter against, and our print works:

>>> dpath.util.values(a, '**',afilter=f)
correct
incorrect
[]

... if we change our filter to actually give us some results now (with a .get() lambda like you had), we see now that this is obviously going to fail, because we're calling .get() on a string:

>>> dpath.util.values(a, 'actions/**', afilter=(lambda x: x.get('type', None) == 'correct'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "dpath/util.py", line 117, in values
    return [x[1] for x in dpath.util.search(obj, glob, yielded=True, separator=separator, afilter=afilter, dirs=dirs)]
  File "dpath/util.py", line 144, in _search_yielded
    val = dpath.path.get(obj, path, view=False, afilter=afilter)
  File "dpath/path.py", line 276, in get
    if (afilter and (not afilter(target))):
  File "<stdin>", line 1, in <lambda>
AttributeError: 'str' object has no attribute 'get'

... but it works great if we check against a simple string comparison, because we're getting the leaf nodes:

>>> dpath.util.values(a, 'actions/**', afilter=(lambda x: x == 'correct'))
['correct']
>>> dpath.util.search(a, 'actions/**', afilter=(lambda x: x == 'correct'))
{'actions': [{'type': 'correct'}]}

So, while filtering does work exactly as it was designed, this implementation clearly defies the rule of least astonishment. Even though the docs clearly state how it works:

... Filtering functions receive every terminus node in a search - e.g., anything that is not a dict or a list, at the very end of the path. For each value, they return True to include that value in the result set, or False to exclude it.

... I think it's worth considering changing how this works, or supporting some kind of alternate filtering mechanisms.

RFC, @calebcase, OP?

@akesterson akesterson added question and removed bug labels Sep 19, 2015
@xhh2a
Copy link
Contributor Author

xhh2a commented Sep 19, 2015

In terms of alternative post-search, I am able to handle it using other methods. Thus it's not that big of a deal if a pre filter is kept. Is it less efficient than doing it while stepping through the tree? Yes, but it's not something that should result in significant issues for me in my use case. Honestly I can't think of too many use cases where I would want a pre search, but I can see where others may want it.

I suppose the documentation could definitely use some clarification on this. Changing the filter type might cause backwards incompatibility for users though so a post-search might be better to be added as a separate arg param.

Incidentally I need to remember to test unicode/ascii with filter.

@akesterson
Copy link
Collaborator

What if we could say that all filter arguments must be subclasses of some base filter object, say dpath.Filter, and each Filter can implement one or more filtering steps, which would be called at various steps of the process? Then the user could just implement/override the steps they care about, and you could also make multi-step filters this way:

class MyFilter(dpath.Filter):
    def pre_leaf(self, leaf):
        """pre-filter a leaf node"""
        pass
    def post_leaf(self, leaf):
        """post-filter a leaf node"""
        pass
    def pre_branch(self, branch)
        """pre-filter a branch node (list or dictionary)"""
        pass
    def post_branch(self, branch):
        """post-filter a branch node (list or dictionary)"""
        pass

This would raise the question re: what to do with the existing filter functionality (being able to pass a lambda is awfully convenient), but I think this implementation would be significantly more clear.

@akesterson
Copy link
Collaborator

This seems like an excellent thing to fix in 2.0, re-labeling.

@akesterson
Copy link
Collaborator

Blocked by #136

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants