pcap index: Compress offsets that exceed threshold #1096
Integration test for brimdata/super#1096
Introduce ranger.Envelope.Merge, which merges two Envelopes into a single Envelope. This fixes a bug where indexing a large pcap causes the system to OOM panic.

When constructing the time index for a pcap, compress the array of offset points to an Envelope when the size of the array reaches a certain threshold. Subsequent compressions are merged into the section's Envelope, keeping the memory footprint low.

The downside to this approach is that, for the indexes of large pcap files, the difference between adjacent X values starts out very wide and then narrows as one iterates through the Bins. This results in larger pcap scans (i.e. slower searches) for hits at the beginning of the file and smaller scans (i.e. faster searches) towards the end. Consensus was that the difference in search times probably won't be noticeable enough to warrant introducing a fancier algorithm. Filed #1095 to revisit.

Closes #1039
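For readers who want a concrete picture of the compress-then-merge idea, here is a minimal, self-contained sketch. It is not the code in this PR and not the real `ranger` API: the types `Point`, `Bin`, `Envelope`, the helpers `shrink`, `compress`, `merge`, and the `indexer` struct are all hypothetical, and it assumes X is a file offset, Y is a timestamp, and a merge simply concatenates two envelopes and coalesces adjacent bins back down to a fixed bin budget.

```go
package main

import "fmt"

// Point pairs a pcap file offset (X) with a packet timestamp (Y).
// Hypothetical type; the real ranger package may differ.
type Point struct{ X, Y uint64 }

// Bin starts at offset X and records the timestamp range [Y0, Y1]
// observed from X up to the next Bin's X.
type Bin struct{ X, Y0, Y1 uint64 }

// Envelope is an ordered summary of many Points.
type Envelope []Bin

// shrink coalesces adjacent bins so that at most nbin remain.
func shrink(env Envelope, nbin int) Envelope {
	if nbin <= 0 || len(env) <= nbin {
		return env
	}
	per := (len(env) + nbin - 1) / nbin // bins folded into one, rounded up
	out := make(Envelope, 0, nbin)
	for i := 0; i < len(env); i += per {
		end := i + per
		if end > len(env) {
			end = len(env)
		}
		b := env[i]
		for _, o := range env[i+1 : end] {
			if o.Y0 < b.Y0 {
				b.Y0 = o.Y0
			}
			if o.Y1 > b.Y1 {
				b.Y1 = o.Y1
			}
		}
		out = append(out, b)
	}
	return out
}

// compress turns raw points into an Envelope of at most nbin bins.
func compress(points []Point, nbin int) Envelope {
	env := make(Envelope, len(points))
	for i, p := range points {
		env[i] = Bin{X: p.X, Y0: p.Y, Y1: p.Y}
	}
	return shrink(env, nbin)
}

// merge joins two envelopes (a must precede b in X) and shrinks the
// result back to nbin bins. Assumed strategy, for illustration only.
func merge(a, b Envelope, nbin int) Envelope {
	joined := make(Envelope, 0, len(a)+len(b))
	joined = append(joined, a...)
	joined = append(joined, b...)
	return shrink(joined, nbin)
}

// indexer buffers points and compresses them once threshold is reached,
// so memory stays bounded by threshold points plus nbin bins.
type indexer struct {
	threshold, nbin int
	pending         []Point
	env             Envelope
}

func (ix *indexer) add(p Point) {
	ix.pending = append(ix.pending, p)
	if len(ix.pending) >= ix.threshold {
		ix.flush()
	}
}

func (ix *indexer) flush() {
	if len(ix.pending) == 0 {
		return
	}
	ix.env = merge(ix.env, compress(ix.pending, ix.nbin), ix.nbin)
	ix.pending = ix.pending[:0] // reuse the buffer
}

func main() {
	ix := &indexer{threshold: 8, nbin: 4}
	for i := uint64(0); i < 32; i++ {
		ix.add(Point{X: i * 100, Y: 1000 + i})
	}
	ix.flush()
	for _, b := range ix.env {
		fmt.Printf("offset=%d ts=[%d,%d]\n", b.X, b.Y0, b.Y1)
	}
}
```

With these toy inputs the final bins start at offsets 0, 1600, 2400, and 2800, so the gaps between adjacent X values are 1600, 800, and 400. Under this assumed merge strategy the oldest data is re-coalesced on every merge, which reproduces the wide-then-narrow pattern described above.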
I tested this out using Brim commit |
This looks good. As we discussed, we can make the algorithm better down the road if/when warranted.
…" by mattnibs This is an auto-generated commit with a zq dependency update. The zq PR brimdata/super#1096, authored by @mattnibs, has been merged. pcap index: Compress offsets that exceed threshold Introduce ranger.Envelope.Merge, which merges two Envelopes into a single Envelope. This fixes a bug where indexing a large pcap causes the system to OOM panic. When constructing the time index for a pcap, compress the array of offset points to an Envelope when the size of the array reaches a certain threshold. Subsequent compressions are merged into the section's Envelope, keeping the memory footprint low. The downside to this approach is that, for the indexes of large pcap files, the difference between adjacent X values starts out very wide and then narrows as one iterates through the Bins. This results in larger pcap scans (i.e. slower searches) for hits at the beginning of the file and smaller scans (i.e. faster searches) towards the end. Consensus was that the difference in search times probably won't be noticeable enough to warrant introducing a fancier algorithm. Filed brimsec/zq#1095 to revisit. Closes brimdata/super#1039
Introduce ranger.Envelope.Merge, which merges two Envelopes into a
single Envelope.
This fixes a bug where indexing a large pcap causes the system to
OOM panic.
When constructing the time index for a pcap, compress the array of
offset points to an Envelope when the size of the array reaches a
certain threshold. Subsequent compressions are merged into the
section's Envelope, keeping the memory footprint low.
The downside to this approach is that, for the indexes of large pcap
files, the difference between adjacent X values starts out very wide
and then narrows as one iterates through the Bins. This results in
larger pcap scans (i.e. slower searches) for hits at the beginning of
the file and smaller scans (i.e. faster searches) towards the end.
Consensus was that the difference in search times probably won't be
noticeable enough to warrant introducing a fancier algorithm. Filed
brimdata/brimcap#269 to revisit.
Closes #1039
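
To make the search-time trade-off in the commit message concrete, the sketch below estimates how many bytes a time-range lookup would have to scan. Everything in it is an assumption for illustration: the `Bin` type, the `scanRange` helper, and the sample bin values are hypothetical and are not taken from the `ranger` package. It assumes each bin starts at file offset X and covers timestamps in [Y0, Y1].

```go
package main

import "fmt"

// Bin starts at file offset X and covers timestamps [Y0, Y1].
// Hypothetical type for illustration only.
type Bin struct{ X, Y0, Y1 uint64 }

// scanRange returns the byte range [start, end) that must be read to
// find packets with timestamps in [t0, t1]. fileSize bounds the last
// bin. ok is false when no bin overlaps the query.
func scanRange(env []Bin, fileSize, t0, t1 uint64) (start, end uint64, ok bool) {
	for i, b := range env {
		if b.Y1 < t0 || b.Y0 > t1 {
			continue // bin does not overlap the query window
		}
		binEnd := fileSize
		if i+1 < len(env) {
			binEnd = env[i+1].X
		}
		if !ok {
			start, ok = b.X, true
		}
		end = binEnd
	}
	return start, end, ok
}

func main() {
	// Wide bins at the start of the file, narrow bins at the end.
	env := []Bin{
		{X: 0, Y0: 1000, Y1: 1015},
		{X: 1600, Y0: 1016, Y1: 1023},
		{X: 2400, Y0: 1024, Y1: 1027},
		{X: 2800, Y0: 1028, Y1: 1031},
	}
	const fileSize = 3200

	s, e, _ := scanRange(env, fileSize, 1001, 1001)
	fmt.Println("early hit scans", e-s, "bytes") // early hit scans 1600 bytes

	s, e, _ = scanRange(env, fileSize, 1030, 1030)
	fmt.Println("late hit scans", e-s, "bytes") // late hit scans 400 bytes
}
```

In this toy index a hit near the start of the file costs a scan four times larger than a hit near the end, which is the asymmetry the commit message accepts in exchange for bounded memory.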