"zed query" not returning results in key order for "non-ts" Pool #2749

Closed
philrz opened this issue May 20, 2021 · 1 comment · Fixed by #2752

Labels: bug (Something isn't working)

philrz commented May 20, 2021

Repro is with Zed commit b13ac39 and the Zeek zed-sample-data.

While trying to verify the changes linked to #2482, my experience regarding the "ordered" pool key did not match my expectations as a user. Here I create a Pool keyed off the ip-typed id.resp_h field rather than the time-typed ts field that we're more accustomed to.

$ zed -version
Version: v0.29.0-346-gb13ac397

$ rm -rf $ZED_LAKE_ROOT

$ zed lake init
using environment variable ZED_LAKE_ROOT
lake created: file:///Users/phil/logs

$ zed lake create -S 5MB -p logs -orderby id.resp_h:asc
pool created: logs

$ zed lake load -p logs ~/work/zed-sample-data/zeek-default/conn.log.gz
1soa9C55d0CYXuU9XY7ENjJ9CSt committed 17 segments

Since zed lake create was told to order the Pool by id.resp_h ascending, I expected a plain zed lake query with no explicit sort to return the data in that ascending order. The initial output does give that impression, but it eventually goes "back" to lower values again.

$ zed lake query -z 'from logs | cut id.resp_h'
{id:{resp_h:8.26.197.27}}
{id:{resp_h:8.254.92.155}}
{id:{resp_h:10.0.0.1}}
...
{id:{resp_h:216.239.36.10}}
{id:{resp_h:216.239.38.10}}
{id:{resp_h:216.239.38.10}}
{id:{resp_h:10.0.0.1}}
{id:{resp_h:10.0.0.1}}
...
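
To pin down exactly where the order breaks in a large result, a small helper along these lines can be piped after the query. This is only a sketch: the function name first_out_of_order is made up for illustration, and it assumes every record carries an IPv4 address (IPv6 values are not handled).

```shell
#!/bin/sh
# first_out_of_order (hypothetical helper, not part of Zed):
# reads lines such as {id:{resp_h:10.0.0.1}} on stdin and reports the
# first line whose IPv4 address is numerically lower than the previous one.
# Lines without a dotted-quad address (for example "...") are skipped.
# Returns 0 if the input is in ascending order, 1 otherwise.
first_out_of_order() {
  awk '
    match($0, /[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/) {
      ip = substr($0, RSTART, RLENGTH)
      split(ip, o, ".")
      # Fold the four octets into one number so comparison is numeric.
      cur = ((o[1] * 256 + o[2]) * 256 + o[3]) * 256 + o[4]
      if (seen && cur < prev) {
        printf "line %d: %s follows %s\n", NR, ip, prev_ip
        exit 1
      }
      prev = cur; prev_ip = ip; seen = 1
    }
  '
}
```

With the function defined in the current shell, it could be used as: zed lake query -z 'from logs | cut id.resp_h' | first_out_of_order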

Indeed, if I add an explicit sort to the query, a whole bunch of "lower" addresses are returned first.

$ zed lake query -z 'from logs | cut id.resp_h | sort'
{id:{resp_h:2.22.230.64}}
{id:{resp_h:2.22.230.64}}
{id:{resp_h:2.22.230.64}}
{id:{resp_h:4.53.160.75}}
{id:{resp_h:4.53.160.75}}
{id:{resp_h:4.53.160.75}}
{id:{resp_h:4.53.160.75}}
{id:{resp_h:5.9.78.71}}
{id:{resp_h:5.9.102.20}}
...

However, this effect does not appear when the Pool is keyed off ts as we're accustomed to.

$ rm -rf $ZED_LAKE_ROOT

$ zed lake init
using environment variable ZED_LAKE_ROOT
lake created: file:///Users/phil/logs

$ zed lake create -S 5MB -p logs -orderby ts:desc
pool created: logs

$ zed lake load -p logs ~/work/zed-sample-data/zeek-default/conn.log.gz
1sob9yKewJ7kpCM60hHZi4FzEgt committed 17 segments

$ zed lake query -z 'from logs | cut ts' > /tmp/ts-query
$ zed lake query -z 'from logs | cut ts | sort -r' > /tmp/ts-query-sorted

$ ls -l /tmp/ts*
-rw-r--r--  1 phil  wheel  33610593 May 20 13:15 /tmp/ts-query
-rw-r--r--  1 phil  wheel  33610593 May 20 13:15 /tmp/ts-query-sorted

$ diff /tmp/ts-query /tmp/ts-query-sorted 
$ echo $?
0
@philrz philrz added the bug Something isn't working label May 20, 2021
@mccanne mccanne self-assigned this May 20, 2021
brim-bot pushed a commit to brimdata/brimcap that referenced this issue May 21, 2021
…key" by mccanne

This is an auto-generated commit with a Zed dependency update. The Zed PR
brimdata/super#2752, authored by @mccanne,
has been merged.

fix bug in lake scan when merging by non-ts pool key

closes brimdata/super#2749
brim-bot pushed a commit to brimdata/zui that referenced this issue May 21, 2021
…key" by mccanne

This is an auto-generated commit with a Zed dependency update. The Zed PR
brimdata/super#2752, authored by @mccanne,
has been merged.

fix bug in lake scan when merging by non-ts pool key

closes brimdata/super#2749
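
The load step reports 17 segments, and the fix's commit message mentions merging by pool key. The sketch below is not Zed's implementation and the issue does not confirm the root cause. It only illustrates, with two made-up files standing in for segments, how reading individually sorted segments one after another produces output that rises and then drops back, while merging them by key produces a globally ordered result.

```shell
#!/bin/sh
# Two hypothetical "segments", each already sorted by a numeric key.
tmp=$(mktemp -d)
printf '%s\n' 2 8 216 > "$tmp/seg1"
printf '%s\n' 4 10 192 > "$tmp/seg2"

# Reading the segments back to back: sorted within each segment only.
concatenated=$(cat "$tmp/seg1" "$tmp/seg2" | paste -sd ' ' -)

# Merging the already-sorted segments by key: sorted overall.
merged=$(sort -m -n "$tmp/seg1" "$tmp/seg2" | paste -sd ' ' -)

echo "concatenated: $concatenated"
echo "merged:       $merged"
rm -rf "$tmp"
```

The concatenated line shows the same "rise, then fall back" shape as the unsorted id.resp_h output in the original report.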

philrz commented May 21, 2021

Verified in Zed commit 0440b15.

Now when I create the Pool with the ascending-order ip-type key, I get the same "lower" addresses up front that I saw when I explicitly sorted them.

$ zed -version
Version: v0.29.0-356-g0440b15a

$ rm -rf $ZED_LAKE_ROOT

$ zed lake init
using environment variable ZED_LAKE_ROOT
lake created: file:///Users/phil/logs

$ zed lake create -S 5MB -p logs -orderby id.resp_h:asc
pool created: logs

$ zed lake load -p logs ~/work/zed-sample-data/zeek-default/conn.log.gz
1sr8QJiUdXdH76XppFhXSkSpZXL committed 17 segments

$ zed lake query -z 'from logs | cut id.resp_h'
{id:{resp_h:2.22.230.64}}
{id:{resp_h:2.22.230.64}}
{id:{resp_h:2.22.230.64}}
{id:{resp_h:4.53.160.75}}
{id:{resp_h:4.53.160.75}}
{id:{resp_h:4.53.160.75}}
{id:{resp_h:4.53.160.75}}
{id:{resp_h:5.9.78.71}}
{id:{resp_h:5.9.102.20}}
...

Indeed, a diff of this output against the explicitly-sorted one shows them as being the same, which is what we'd expect.

$ zed lake query -z 'from logs | cut id.resp_h' > /tmp/query-ip
$ zed lake query -z 'from logs | cut id.resp_h | sort' > /tmp/query-ip-sorted

$ ls -l /tmp/query-ip*
-rw-r--r--  1 phil  wheel  26633960 May 21 10:51 /tmp/query-ip
-rw-r--r--  1 phil  wheel  26633960 May 21 10:51 /tmp/query-ip-sorted

$ diff /tmp/query-ip /tmp/query-ip-sorted 
$ echo $?
0

Thanks @mccanne!
