ranger.Envelope.Merge: ensure uniform offset distribution #269

Open
mattnibs opened this issue Aug 13, 2020 · 0 comments

Comments

@mattnibs
Collaborator

The solution to brimdata/super#1039 introduces a curious behavior in generated pcap indexes: for large pcap files, the difference between adjacent X values starts out very wide and then narrows as one iterates through the Bins. This results in larger pcap scans (i.e., slower searches) for hits at the beginning of the file and smaller scans (i.e., faster searches) toward the end. The consensus was that the difference in search times probably won't be noticeable enough to warrant introducing a fancier algorithm.

This ticket is to revisit the change to ranger.Envelope and find a solution that generates merged Envelopes with more uniform distance between adjacent offsets.
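
As a starting point for discussion, here is a minimal sketch of one possible direction: merge two envelopes by bucketing their bins into equal-width X ranges. Everything in it is an assumption made for illustration. `Bin`, its `Lo`/`Hi` fields, and `mergeUniform` are hypothetical stand-ins and do not reflect the actual ranger types or the current `Envelope.Merge` implementation.

```go
package main

import (
	"fmt"
	"sort"
)

// Bin is a hypothetical stand-in for an envelope bin (not the ranger type):
// X is a file offset and Lo/Hi bound the timestamps covered by the bin.
type Bin struct {
	X      uint64
	Lo, Hi uint64
}

// mergeUniform combines two envelopes and reduces the result to at most n
// bins by grouping input bins into n equal-width X ranges. Each output bin
// keeps the X of the first input bin in its range and the union of Lo/Hi.
func mergeUniform(a, b []Bin, n int) []Bin {
	all := make([]Bin, 0, len(a)+len(b))
	all = append(all, a...)
	all = append(all, b...)
	sort.Slice(all, func(i, j int) bool { return all[i].X < all[j].X })
	if n <= 0 || len(all) <= n {
		return all
	}
	minX := all[0].X
	span := all[len(all)-1].X - minX
	// width*n > span, so every bucket index falls in [0, n-1].
	width := span/uint64(n) + 1
	var out []Bin
	var last uint64
	for i, bin := range all {
		k := (bin.X - minX) / width
		if i == 0 || k != last {
			// First bin of a new X range starts a new output bin.
			out = append(out, bin)
			last = k
			continue
		}
		// Same X range: widen the timestamp bounds of the current bin.
		cur := &out[len(out)-1]
		if bin.Lo < cur.Lo {
			cur.Lo = bin.Lo
		}
		if bin.Hi > cur.Hi {
			cur.Hi = bin.Hi
		}
	}
	return out
}

func main() {
	// A sparse envelope followed by a dense one, as described above.
	a := []Bin{{X: 0, Lo: 1, Hi: 2}, {X: 1000, Lo: 2, Hi: 3}, {X: 2000, Lo: 3, Hi: 4}}
	var b []Bin
	for i := 0; i < 10; i++ {
		b = append(b, Bin{X: uint64(3000 + 100*i), Lo: uint64(4 + i), Hi: uint64(5 + i)})
	}
	for _, bin := range mergeUniform(a, b, 4) {
		fmt.Println(bin.X, bin.Lo, bin.Hi)
	}
}
```

Note that bucketing by X can only even out the bins that still exist. It cannot restore detail that an earlier compression already discarded, so a real fix would likely also need to keep the per-compression spacing consistent.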

@mattnibs mattnibs transferred this issue from brimdata/zui Aug 13, 2020
mattnibs referenced this issue in brimdata/super Aug 13, 2020
Introduce ranger.Envelope.Merge that merges two Envelopes into a
single Envelope.

This fixes a bug where indexing a large pcap causes the system to
panic with an out-of-memory (OOM) error.

When constructing the time index for a pcap, compress the array of
offset points to an Envelope when the size of the array reaches a
certain threshold. Subsequent compressions will be merged into the
section's Envelope, keeping the memory footprint low.

The downside to this approach is that, for the indexes of large pcap
files, the difference between adjacent X values starts out very wide
and then narrows as one iterates through the Bins. This will result in
larger pcap scans (i.e., slower searches) for hits at the beginning of
the file and smaller scans (i.e., faster searches) toward the end.
Consensus was that the difference in search times probably won't be
noticeable enough to warrant introducing a fancier algorithm. Filed
#1095 to revisit.

Closes #1039
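
To make the effect described in the commit message concrete, the following is a rough sketch of threshold-based compression with repeated merging. It is not the code from the commit: `Point`, `compress`, and `index` are hypothetical names, and the stride-based downsampling is an assumed simplification chosen only because it reproduces the wide-then-narrow spacing.

```go
package main

import "fmt"

// Point is a hypothetical offset point: X is a file offset, Y a timestamp.
type Point struct{ X, Y uint64 }

// compress keeps every stride-th point so that at most n points remain.
func compress(pts []Point, n int) []Point {
	if n <= 0 || len(pts) <= n {
		return pts
	}
	stride := (len(pts) + n - 1) / n
	out := make([]Point, 0, n)
	for i := 0; i < len(pts); i += stride {
		out = append(out, pts[i])
	}
	return out
}

// index buffers points and folds them into a bounded envelope once the
// buffer reaches threshold, so memory use stays roughly constant.
type index struct {
	threshold, nbins int
	pending          []Point
	env              []Point
}

func (ix *index) add(p Point) {
	ix.pending = append(ix.pending, p)
	if len(ix.pending) >= ix.threshold {
		ix.flush()
	}
}

// flush compresses the pending points, appends them to the envelope, and
// compresses the combined result again. Older points are thinned on every
// flush, which is why early offsets end up far apart.
func (ix *index) flush() {
	merged := append(ix.env, compress(ix.pending, ix.nbins)...)
	ix.env = compress(merged, ix.nbins)
	ix.pending = ix.pending[:0]
}

func main() {
	ix := &index{threshold: 8, nbins: 4}
	for i := 0; i < 32; i++ {
		ix.add(Point{X: uint64(i), Y: uint64(i)})
	}
	// Print the gap between adjacent X values in the final envelope.
	for i := 1; i < len(ix.env); i++ {
		fmt.Println(ix.env[i].X - ix.env[i-1].X)
	}
}
```

In this toy setup each flush halves the number of points retained from earlier data, so the gaps between adjacent X values shrink toward the end of the file, matching the behavior the commit message describes.
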
mattnibs referenced this issue in brimdata/super Aug 14, 2020
brim-bot referenced this issue in brimdata/zui Aug 14, 2020
…" by mattnibs

This is an auto-generated commit with a zq dependency update. The zq PR
brimdata/super#1096, authored by @mattnibs,
has been merged.

pcap index: Compress offsets that exceed threshold

Introduce ranger.Envelope.Merge that merges two Envelopes into a
single Envelope.

This fixes a bug where indexing a large pcap causes the system to
panic with an out-of-memory (OOM) error.

When constructing the time index for a pcap, compress the array of
offset points to an Envelope when the size of the array reaches a
certain threshold. Subsequent compressions will be merged into the
section's Envelope, keeping the memory footprint low.

The downside to this approach is that, for the indexes of large pcap
files, the difference between adjacent X values starts out very wide
and then narrows as one iterates through the Bins. This will result in
larger pcap scans (i.e., slower searches) for hits at the beginning of
the file and smaller scans (i.e., faster searches) toward the end.
Consensus was that the difference in search times probably won't be
noticeable enough to warrant introducing a fancier algorithm. Filed
brimsec/zq#1095 to revisit.

Closes brimdata/super#1039
alfred-landrum referenced this issue in brimdata/super Aug 17, 2020
@philrz philrz transferred this issue from brimdata/super Aug 16, 2022