Created new window pipe for sliding window aggregations #3

Open: 16 commits to be merged into base: master

Conversation

itsnotapt
Contributor

The concept of pipes doesn't work very well in the context of streaming events. To let pipes function as expected in the streaming scenario, I've added a new "window" pipe that buffers events into a sliding window, so that all downstream pipes behave as expected even when streaming.

A typical example of this feature would be the following:

    process where process_name in ("whoami.exe", "netstat.exe", "hostname.exe", "net.exe", "sc.exe", "systeminfo.exe")
    | window 5m
    | unique hostname, process_name
    | unique_count process_name
    | filter count >= 3

In this example, the query is effective both when streaming and when run over one-off batches.
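To make the buffering idea concrete, here is a minimal sketch of a sliding window in Python. It is not the code from this PR: the `SlidingWindow` class, its `push` method, and the use of plain integer timestamps in seconds are all assumptions made for illustration, and it assumes events arrive in timestamp order.

```python
from collections import deque


class SlidingWindow:
    """Hypothetical buffer that keeps events from the last `span` seconds."""

    def __init__(self, span):
        self.span = span
        self.buffer = deque()  # (timestamp, event) pairs, oldest first

    def push(self, timestamp, event):
        """Add an event, evict expired ones, and return the current window."""
        self.buffer.append((timestamp, event))
        cutoff = timestamp - self.span
        # Drop everything that fell out of the window ending at `timestamp`.
        while self.buffer and self.buffer[0][0] <= cutoff:
            self.buffer.popleft()
        return [e for _, e in self.buffer]


window = SlidingWindow(span=300)  # roughly "window 5m"
window.push(0, "whoami.exe")
window.push(120, "netstat.exe")
current = window.push(310, "hostname.exe")  # the event at t=0 has expired
```

Each call returns the events still inside the window, which is what the pipes after `window` would then see.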

itsnotapt added 3 commits July 6, 2019 13:19
Pipes now reset their state after PIPE_EOF.
Added new window pipe for time windowing streamed events.
Fixed bug in walk__time_range reporting str instead of node.
… results.

Added documentation for window pipe.
@itsnotapt
Contributor Author

@rw-access I removed the tight coupling between pipes and host_key, as it seems very environment-specific and breaks some common scenarios.
The same functionality is still available through pipes anyway, e.g. unique hostname, process_name | count hostname.

I had to "reset" each pipe's state after it receives PIPE_EOF; otherwise the pipe retains counts, unique sets, etc. from the previously emitted window buffer.
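The effect of the reset can be sketched with a toy counting pipe. This is an illustration under assumptions, not the engine code: `PIPE_EOF` here is a stand-in sentinel, and `make_counter`, `run`, and the `reset_on_eof` flag are hypothetical names.

```python
PIPE_EOF = object()  # stand-in sentinel; the real PIPE_EOF lives in the EQL engine


def make_counter(emit, reset_on_eof):
    """Return a callback that counts events and emits the counts at PIPE_EOF."""
    counts = {}

    def callback(event):
        if event is PIPE_EOF:
            emit(dict(counts))
            if reset_on_eof:
                counts.clear()  # forget the window that was just emitted
            return
        counts[event] = counts.get(event, 0) + 1

    return callback


def run(reset_on_eof):
    """Feed two consecutive window buffers through the same pipe."""
    emitted = []
    pipe = make_counter(emitted.append, reset_on_eof)
    for window in (["net.exe", "sc.exe"], ["net.exe"]):
        for event in window:
            pipe(event)
        pipe(PIPE_EOF)
    return emitted


stale = run(reset_on_eof=False)  # second result still includes the first window
fresh = run(reset_on_eof=True)   # second result covers only the second window
```

Without the reset, the second window reports two `net.exe` events and one `sc.exe` event even though it only contained a single `net.exe`.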

I don't have your test data, so I wasn't able to do a full test. My biggest concern is cases where I missed an object that should be immutable e.g. 37ee552.

@itsnotapt
Contributor Author

One other thing to consider: the analytic will technically hold onto the buffer longer than necessary. For example, imagine we get a burst of events within a short timespan, followed by no events. The buffer will retain the burst until a later event triggers the buffer cleanup.

I thought about adding a garbage cleanup routine, but I don't think this is a significant issue, since it's unlikely to cause a meaningful memory problem.
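A small sketch of the retention behaviour described above, assuming eviction only runs when an event is pushed. The `Buffer` class and its `purge` hook are hypothetical; `purge` shows what a cleanup routine might look like and is not part of this PR.

```python
from collections import deque


class Buffer:
    """Hypothetical window buffer of event timestamps (in seconds)."""

    def __init__(self, span):
        self.span = span
        self.events = deque()

    def _evict(self, now):
        # Remove timestamps that are at least `span` seconds older than `now`.
        while self.events and self.events[0] <= now - self.span:
            self.events.popleft()

    def push(self, timestamp):
        self.events.append(timestamp)
        self._evict(timestamp)  # cleanup only happens when an event arrives

    def purge(self, now):
        """Hypothetical cleanup hook that could be driven by a timer."""
        self._evict(now)


buf = Buffer(span=300)
for ts in range(100):  # burst: 100 events within 100 seconds, then silence
    buf.push(ts)
held_after_burst = len(buf.events)  # nothing expires until something calls _evict

buf.purge(now=1000)  # a timer-driven cleanup would release the burst
held_after_purge = len(buf.events)
```

The memory held is bounded by the number of events in one window span, which is consistent with the view that the missing cleanup is unlikely to matter in practice.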

Rolled back host_key removal.
#	docs/query-guide/pipes.rst
#	eql/ast.py
#	eql/engine.py
#	eql/parser.py
#	setup.cfg
#	tests/test_data.json
#	tests/test_eql.py
#	tests/test_python_engine.py
@itsnotapt
Contributor Author

@rw-access I've updated this code for EQL 0.7.0 and rolled back the host_key changes to make it easier for you to review.

I changed the head pipe so that the test_pipes_reset_state test case works correctly. I couldn't see any code suggesting that head exits early, so I believe performance is unchanged.
