SQL-style null comparison is not parsed as a binary expression in a search #5545

philrz · 2024-12-21T21:34:25Z

tl;dr

After the changes in #5490, comparisons formerly expressed in the Zed language like san.ip==null are now expressed SQL style like san is null. However, I noticed that if it appears in a search rather than a where, the comparison using the SQL approach isn't treated as a binary expression, and this had some unexpected side effects for me.

Details

Repro is with super commit 95e746d, which is associated with the changes in #5490.

This repro is a simplification of an Autoperf query. As the original query was more complex and the data queried consisted of millions of large records across tens-of-GB of input data, as a user it was much more difficult to figure out what was going on than shown in the repro below.

Take this sample data.json:

{
    "_path": "x509",
    "id": "FnUlL92kZIhBUyfHZb",
    "f": "This field",
    "san": {
        "ip": null
    }
}
{
    "_path": "x509",
    "id": "FnUlL92kZIhBUyfHZb",
    "f": "This is also field",
    "san": {
        "ip": "foo"
    }
}

With the last GA Zed v1.18.0, isolating the record with the null value for san.ip looked like the following:

$ zq -version
Version: v1.18.0

$ zq '_path=="x509" san.ip==null' data.json 
{_path:"x509",id:"FnUlL92kZIhBUyfHZb",f:"This field",san:{ip:null}}

Though of course I could have also been explicit and done either of these to get the same result:

$ zq 'where _path=="x509" and san.ip==null' data.json 
{_path:"x509",id:"FnUlL92kZIhBUyfHZb",f:"This field",san:{ip:null}}

$ zq 'search _path=="x509" and san.ip==null' data.json 
{_path:"x509",id:"FnUlL92kZIhBUyfHZb",f:"This field",san:{ip:null}}

That is, like I expect is true for many of our users, I've traditionally been lazy about noticing when implied operators turn my "searches" into where or search, since I've always been able to start with a binary expression like the one shown above (which turns into a where), tack on something like a regexp (which will turn it into a search), and the parser makes the necessary adjustments for me so it works as expected.

$ zq '/also/ or (_path=="x509" and san.ip==null)' data.json 
{_path:"x509",id:"FnUlL92kZIhBUyfHZb",f:"This field",san:{ip:null}}
{_path:"x509",id:"FnUlL92kZIhBUyfHZb",f:"This is also field",san:{ip:"foo"}}

In the SuperDB era at commit 95e746d (which is associated with the changes in #5490), If we start from the same query, it no longer parses.

$ super -version
Version: v1.18.0-161-g95e746d3

$ super -c '_path=="x509" san.ip==null' data.json
parse error at line 1, column 15:
_path=="x509" san.ip==null
          === ^ ===

There's two things in particular that may need to be adjusted:

The changes in Bring back the "|" symbol and eliminate search rakes #5436 would require the use of ? to express this as a search
The changes in sam: Use SQL semantics for null comparison #5490 mean the null comparison is now expressed via is rather than ==

If we do just the first change, we get no output at all. FWIW, SQL still allows for the ==null comparison but it just doesn't behave the same as it did in Zed (blah!).

$ super -c '? _path=="x509" san.ip==null' data.json
[no output]

And if we do just the second change, it still doesn't parse.

$ super -c '_path=="x509" san.ip is null' data.json
parse error at line 1, column 15:
_path=="x509" san.ip is null
          === ^ ===

Therefore, acting as a fully educated user (i.e., let's assume I read some release notes for the Zed->SuperDB transition), I do both changes together. But the query result is not the same as in the Zed era.

$ super -c '? _path=="x509" san.ip is null' data.json
{_path:"x509",id:"FnUlL92kZIhBUyfHZb",f:"This field",san:{ip:null}}
{_path:"x509",id:"FnUlL92kZIhBUyfHZb",f:"This is also field",san:{ip:"foo"}}

What seems to be happening is that these are now being treated as separate search terms, e.g.,

The san.ip matches against the presence of the field key san.ip.
The is matches against strings like This and is in the contents of field f.
There's a case-insensitive string match on the nUlL in the id field.

$ super compile -Z '? _path=="x509" san.ip is null'
[
    {
        kind: "Search",
        expr: {
            kind: "BinaryExpr",
            op: "and",
            lhs: {
                kind: "BinaryExpr",
                op: "and",
                lhs: {
                    kind: "BinaryExpr",
                    op: "and",
                    lhs: {
                        kind: "BinaryExpr",
                        op: "==",
                        lhs: {
                            kind: "ID",
                            name: "_path",
                            loc: {
                                first: 2,
                                last: 6
                            }
                        },
                        rhs: {
                            kind: "Primitive",
                            type: "string",
                            text: "x509",
                            text_pos: 0,
                            loc: {
                                first: 9,
                                last: 14
                            }
                        },
                        loc: {
                            first: 2,
                            last: 14
                        }
                    },
                    rhs: {
                        kind: "Term",
                        text: "san.ip",
                        value: {
                            kind: "Primitive",
                            type: "string",
                            text: "san.ip",
                            text_pos: 0,
                            loc: {
                                first: 16,
                                last: 21
                            }
                        },
                        loc: {
                            first: 16,
                            last: 21
                        }
                    },
                    loc: {
                        first: 2,
                        last: 29
                    }
                },
                rhs: {
                    kind: "Term",
                    text: "is",
                    value: {
                        kind: "Primitive",
                        type: "string",
                        text: "is",
                        text_pos: 0,
                        loc: {
                            first: 23,
                            last: 24
                        }
                    },
                    loc: {
                        first: 23,
                        last: 24
                    }
                },
                loc: {
                    first: 2,
                    last: 29
                }
            },
            rhs: {
                kind: "Term",
                text: "null",
                value: {
                    kind: "Primitive",
                    type: "null",
                    text: "",
                    text_pos: 0,
                    loc: {
                        first: 26,
                        last: 29
                    }
                },
                loc: {
                    first: 26,
                    last: 29
                }
            },
            loc: {
                first: 2,
                last: 29
            }
        },
        loc: {
            first: 0,
            last: 29
        }
    }
]

If I was less lazy and hip to the fact that this all could be expressed as a where, I'd get my previous result (though I also had to now add the explicit and, per docs).

$ super -c 'where _path=="x509" and san.ip is null' data.json 
{_path:"x509",id:"FnUlL92kZIhBUyfHZb",f:"This field",san:{ip:null}}

Overall, however, my guess is that we probably intended the is null (or is not null) to still be treated as a binary expression even in these searches and hence avoid the surprise here. I think this would allow us to give guidance to users to just add ? to the start of implied-searches-that-worked-in-Zed and still get the same expected results.

The text was updated successfully, but these errors were encountered:

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SQL-style null comparison is not parsed as a binary expression in a search #5545

SQL-style null comparison is not parsed as a binary expression in a search #5545

philrz commented Dec 21, 2024

SQL-style null comparison is not parsed as a binary expression in a search #5545

SQL-style null comparison is not parsed as a binary expression in a search #5545

Comments

philrz commented Dec 21, 2024

tl;dr

Details