pcap index: Compress offsets that exceed threshold #1096

Merged: 1 commit, Aug 14, 2020


mattnibs
Collaborator

Introduce ranger.Envelope.Merge that merges two Envelopes into a
single Envelope.

This fixes a bug where indexing a large pcap causes the system to
panic with an out-of-memory (OOM) error.

When constructing the time index for a pcap, compress the array of
offset points to an Envelope when the size of the array reaches a
certain threshold. Subsequent compressions are merged into the
section's Envelope, keeping the memory footprint low.
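
A minimal sketch of the compress-then-merge idea described above, for illustration only. The `Point`, `Envelope`, `compress`, `merge`, and `indexer` names, the stride-based downsampling, and the threshold and bin counts are all assumptions made for this example; they are not the actual `ranger` package API or the values used in this change. `X` is assumed to be a file offset and `Y` a packet timestamp.

```go
package main

import "fmt"

// Point pairs a file offset X with a packet timestamp Y.
// Hypothetical stand-in, not the actual ranger type.
type Point struct{ X, Y uint64 }

// Envelope is a downsampled list of points (hypothetical stand-in).
type Envelope []Point

// compress returns at most nbin evenly strided points from pts,
// always keeping the first point. nbin must be positive.
func compress(pts []Point, nbin int) Envelope {
	if len(pts) <= nbin {
		return append(Envelope(nil), pts...)
	}
	stride := (len(pts) + nbin - 1) / nbin // ceiling division
	env := make(Envelope, 0, nbin)
	for i := 0; i < len(pts); i += stride {
		env = append(env, pts[i])
	}
	return env
}

// merge concatenates two envelopes (a precedes b in the file) and
// compresses the result back down to at most nbin points.
func merge(a, b Envelope, nbin int) Envelope {
	all := make([]Point, 0, len(a)+len(b))
	all = append(all, a...)
	all = append(all, b...)
	return compress(all, nbin)
}

// indexer buffers raw points and folds them into env whenever the
// buffer reaches threshold, so memory stays bounded by
// roughly threshold + nbin points.
type indexer struct {
	threshold int
	nbin      int
	pending   []Point
	env       Envelope
}

// add records one point and compresses the buffer once it is full.
func (ix *indexer) add(p Point) {
	ix.pending = append(ix.pending, p)
	if len(ix.pending) >= ix.threshold {
		ix.flush()
	}
}

// flush compresses the pending points and merges them into env.
func (ix *indexer) flush() {
	ix.env = merge(ix.env, compress(ix.pending, ix.nbin), ix.nbin)
	ix.pending = ix.pending[:0]
}

func main() {
	ix := &indexer{threshold: 1000, nbin: 100}
	for i := 0; i < 1_000_000; i++ {
		ix.add(Point{X: uint64(i) * 64, Y: uint64(i)})
	}
	ix.flush() // fold in any points still buffered
	fmt.Println("points retained:", len(ix.env))
}
```

In this sketch the retained envelope never grows past `nbin` points no matter how many points are added, which is the property that keeps the footprint low.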

The downside to this approach is that, for the indexes of large pcap
files, the difference between adjacent X values starts out very wide
and then narrows as one iterates through the Bins. This results in
larger pcap scans (i.e., slower searches) for hits at the beginning of
the file and smaller scans (i.e., faster searches) towards the end. The
consensus was that the difference in search times probably won't be
noticeable enough to warrant introducing a fancier algorithm. Filed
brimdata/brimcap#269 to revisit.
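
To make the trade-off concrete, here is a rough sketch of how the gap between adjacent X values translates into scan size at search time. Everything in it is an assumption for illustration: the `Point` type, the `scanRange` helper, the simplification that timestamps never decrease with offset, and the made-up envelope values. It is not the lookup code in this repository.

```go
package main

import (
	"fmt"
	"sort"
)

// Point pairs a file offset X with a packet timestamp Y.
// Hypothetical stand-in, not the actual ranger type.
type Point struct{ X, Y uint64 }

// scanRange returns the byte range [x0, x1) that has to be scanned to
// find packets with timestamp y. env must be sorted by X, and Y is
// assumed to be non-decreasing (a simplification for this sketch).
func scanRange(env []Point, y, fileSize uint64) (x0, x1 uint64) {
	// Index of the first point whose timestamp is greater than y.
	i := sort.Search(len(env), func(i int) bool { return env[i].Y > y })
	if i > 0 {
		x0 = env[i-1].X // start at the last point at or before y
	}
	x1 = fileSize // default: scan to the end of the file
	if i < len(env) {
		x1 = env[i].X // stop at the first point after y
	}
	return x0, x1
}

func main() {
	// Illustrative envelope: wide X gaps early, narrow gaps late.
	env := []Point{{0, 0}, {8000, 80}, {12000, 120}, {14000, 140}, {15000, 150}}
	const fileSize = 16000

	x0, x1 := scanRange(env, 10, fileSize)
	fmt.Println("early hit scans bytes:", x1-x0)

	x0, x1 = scanRange(env, 145, fileSize)
	fmt.Println("late hit scans bytes:", x1-x0)
}
```

With these illustrative values, a hit near the start of the file scans 8000 bytes while a hit near the end scans 1000, which is the asymmetry described above.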

Closes #1039

mattnibs added a commit to brimdata/zui that referenced this pull request Aug 13, 2020
@mattnibs
Collaborator Author

Some stats for this:

branch: master (33GB pcap file)
$ gtime -p pcap index -r ~/brimdata/wrccdc18/pcaps/all.pcap > /dev/null
real 30.46
user 17.17
sys 16.54
[Chart: master_mem]

branch: pcap-index-oom (33GB pcap file)
$ gtime -p pcap index -r ~/brimdata/wrccdc18/pcaps/all.pcap > /dev/null
real 20.13
user 9.73
sys 9.30
[Chart: oompcap_mem]

mattnibs added a commit to brimdata/zui that referenced this pull request Aug 13, 2020
@mattnibs mattnibs requested a review from a team August 13, 2020 19:21
@philrz
Contributor

philrz commented Aug 14, 2020

I tested this out using Brim commit 13cac48 pointing at zqd commit 3623895 (this branch) and found I was able to successfully import the 35 GB pcap on the same 4 GB VM where I'd witnessed the OOM in the original repro in #1039. I watched the RES memory used by zqd and saw a similar low ceiling as @mattnibs showed in his chart above. 👍

Collaborator

@mccanne mccanne left a comment

This looks good. As we discussed, we can make the algorithm better down the road if/when warranted.

@mattnibs mattnibs merged commit c46c0b5 into master Aug 14, 2020
@mattnibs mattnibs deleted the pcap-index-oom branch August 14, 2020 20:16
brim-bot pushed a commit to brimdata/zui that referenced this pull request Aug 14, 2020
…" by mattnibs

This is an auto-generated commit with a zq dependency update. The zq PR
brimdata/super#1096, authored by @mattnibs,
has been merged.

alfred-landrum pushed a commit that referenced this pull request Aug 17, 2020
Linked issue: OOM during attempted import of 35 GB pcap