Zed lakes should handle partition keys other than "ts" #2482
This is an auto-generated commit with a Zed dependency update. The Zed PR brimdata/super#2729, authored by @mccanne, has been merged.

add support for arbitrary pool keys

The backend previously presumed the pool key was of type time. This commit generalizes the scanning logic to allow pool keys of any type. The seek index has been simplified to use a simple, flat index that is no longer a "micro index". Instead it is a simple, single-level list of keys that is used at open to get the scan range for a row object when the scan is smaller than the whole object. This design presumes the index will be cached when running multiple queries over a sub-range within a row object.

While updating tests, we noticed the segment size and row_size fields were virtually the same so we deleted the size field.

As part of this commit, the index package was updated to use zson format for key parsing instead of the deprecated tzng format, so we removed zio/tzngio/builder.go.

Closes brimdata/super#2482
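The flat seek index described in the commit can be pictured with a small sketch. This is a hypothetical Python model, not the Zed (Go) implementation: it assumes each index entry records the first pool key of a chunk within a row object plus that chunk's byte offset, and that the object is sorted ascending by key. `SeekEntry` and `scan_range` are illustrative names. Because only comparisons are used, the same lookup works for time, integer, or string keys.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass(frozen=True)
class SeekEntry:
    """Hypothetical index entry: the first pool key in a chunk and where it starts."""
    key: Any     # any totally ordered type: time, int, string, ...
    offset: int  # byte offset of the chunk within the row object


def scan_range(index: List[SeekEntry], object_size: int,
               lo: Any, hi: Any) -> Optional[Tuple[int, int]]:
    """Return the (start, end) byte range that may hold keys in [lo, hi].

    Assumes the row object is sorted ascending by pool key and `index`
    lists one entry per chunk, in key order. Returns None if no chunk
    can contain a match.
    """
    if not index or lo > hi:
        return None
    keys = [e.key for e in index]
    # Chunks before the last one whose first key is < lo hold only keys < lo.
    # bisect_left (not bisect_right) keeps a run of keys equal to lo that
    # began in the previous chunk from being skipped.
    i = max(bisect_left(keys, lo) - 1, 0)
    # Chunks whose first key is > hi hold only keys > hi.
    j = bisect_right(keys, hi)
    if j == 0:
        return None  # every key in the object is > hi
    start = index[i].offset
    end = index[j].offset if j < len(keys) else object_size
    return start, end
```

For example, with entries at keys 0, 100 and 200 (offsets 0, 4096 and 8192) in a 12288-byte object, a query for keys 150 through 180 only needs bytes 4096 to 8192. A single-level list keeps each lookup to two binary searches, which fits the commit's assumption that the index is cached when many queries hit sub-ranges of the same object.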
Verified using Brim commit

For the security-centric use cases for which Brim/Zed have largely been used to date, one of the big benefits of pools with non-timestamp keys is the ability to "join" primary log data (such as Zeek/Suricata) with additional data sources (such as threat intel or domain info) that are not time-based. Joining in Zed requires both sources to be sorted by the join key, so storing these additional data sources pre-sorted by the join-ready key brings significant efficiency gains.

This verification example uses JA3 data to glean information about encrypted SSL/TLS sessions. In addition to generating the JA3 hashes with Zeek (which the Zeek embedded in the Brim app does out of the box), here we also bring in the additional "List of all user-agents" data source that can be downloaded from https://ja3er.com/getAllUasJson. This gives information on the user agents associated with each JA3 MD5 hash. So since the Zeek

Before focusing on the user agent data, I've launched a Dev run of Brim commit

While Brim is already using the Zed Lake for storage, it's not yet natively and fully able to handle these pools with non-
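Why pre-sorting by the join key pays off can be seen in a generic sort-merge join: when both inputs are already sorted by their keys, each side is walked once with no hashing or re-sorting. This is a textbook sketch in Python, not Zed's `join` implementation. The field names `ja3` (connection side) and `md5`/`ua` (user agent side) are hypothetical stand-ins for the Zeek and user-agent records, and the sample values are made up.

```python
from typing import Any, Dict, Iterator, List, Tuple

Record = Dict[str, Any]


def merge_join(left: List[Record], right: List[Record],
               lkey: str, rkey: str) -> Iterator[Tuple[Record, Record]]:
    """Inner-join two lists already sorted ascending by their join keys.

    Each input is scanned once; duplicate keys on either side produce
    every matching pair.
    """
    i, j = 0, 0
    while i < len(left) and j < len(right):
        a, b = left[i][lkey], right[j][rkey]
        if a < b:
            i += 1          # left key too small: advance left
        elif a > b:
            j += 1          # right key too small: advance right
        else:
            # Find the run of right records sharing this key...
            j_end = j
            while j_end < len(right) and right[j_end][rkey] == a:
                j_end += 1
            # ...and pair each left record with this key against that run.
            while i < len(left) and left[i][lkey] == a:
                for k in range(j, j_end):
                    yield left[i], right[k]
                i += 1
            j = j_end


# Hypothetical, already-sorted sample data.
conns = [{"ja3": "a1", "id": 1}, {"ja3": "a1", "id": 2}, {"ja3": "c3", "id": 3}]
uas = [{"md5": "a1", "ua": "curl"}, {"md5": "b2", "ua": "wget"},
       {"md5": "c3", "ua": "Firefox"}]
pairs = [(l["id"], r["ua"]) for l, r in merge_join(conns, uas, "ja3", "md5")]
```

Here `pairs` is `[(1, "curl"), (2, "curl"), (3, "Firefox")]`. If either input arrived unsorted, it would first need an O(n log n) sort, which is the cost that storing the lookup data in a pool keyed on the join field avoids.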
It turns out the user agent data needs a little prep first. This is unrelated to the specific topic of non-timestamp partition keys, but it's an opportunity to show off some other Zed features or to remind us of some items on the Zed to-do list. Specifically:
Having said all that, here's the creation of the Pool followed by the preprocessing of the user agent data source and its
Note that we never did any explicit

Now we'll create a new Pool to hold our decorated
(Note that #2765 tracks our intent to introduce a

Because Brim still needs some additional enhancements to see Pools that were loaded from outside the app, I had to select Window > Reset State to get it to see the new Pool I just created. Once it was visible, I could "scroll right" and see the user agent data at the tail end of each of my

Thanks @mccanne!
In the Zeek-centric era, Zed lakes were assumed to always be keyed by a field of type `time` called `ts`. For a future in which all generic data is welcome, lakes need the ability to handle keys of other data types and field names.

(@mccanne recently clarified for me that while the Zed commands currently give the impression that this is already possible [e.g., `zed lake create -p logs -k otherkey -order asc` responds as if it succeeded], the lake itself is still wired to expect the key to be `ts` and to sort by it. So the alternate-keyed data in this example would be treated as if every record had `ts=0`, which makes it effectively unsorted when you try to query the lake for it. We definitely have work left to do to scale this, provide the same zoom-in/zoom-out experience we've traditionally had with time, etc.)
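The failure mode described in the issue can be reproduced in a toy model. This is a hypothetical Python sketch, not how the Zed lake is implemented: `legacy_sort` stands in for a lake that always reads the key from `ts` and defaults a missing value to 0, while `pool_sort` honors a configured key and order. The `"asc"`/`"desc"` strings simply mirror the `-order asc` argument in the command quoted above.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def legacy_sort(records: List[Record]) -> List[Record]:
    """Toy model of the old behavior: the key is always `ts`, and a record
    without `ts` is treated as ts=0. Every record then compares equal, and
    because Python's sort is stable, the input order is returned unchanged."""
    return sorted(records, key=lambda r: r.get("ts", 0))


def pool_sort(records: List[Record], key: str, order: str = "asc") -> List[Record]:
    """Toy model of the desired behavior: sort by the pool's configured key
    (any comparable type) in the configured direction."""
    if order not in ("asc", "desc"):
        raise ValueError(f"unknown order: {order}")
    return sorted(records, key=lambda r: r[key], reverse=(order == "desc"))


recs = [{"otherkey": 3}, {"otherkey": 1}, {"otherkey": 2}]
```

With this data, `legacy_sort(recs)` leaves the records in arrival order (3, 1, 2), so any range scan that relies on sortedness gives wrong answers, whereas `pool_sort(recs, "otherkey")` yields 1, 2, 3.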