-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Early return FileMappedDict initialisation when in readonly mode. #5
base: gc_production_branch_do_not_push
Are you sure you want to change the base?
Changes from all commits
dd5cfd2
b0af27d
3de283f
50aadff
64637d9
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -198,11 +198,7 @@ def initialize(filename, readonly = false) | |
|
||
if @used > 0 | ||
# File already has data. Read the existing values | ||
with_file_lock do | ||
read_all_values.each do |key, _, pos| | ||
@positions[key] = pos | ||
end | ||
end | ||
with_file_lock { populate_positions } | ||
else | ||
# File is empty. Init the `used` counter, if we're in write mode | ||
if !readonly | ||
|
@@ -213,10 +209,14 @@ def initialize(filename, readonly = false) | |
end | ||
end | ||
|
||
# Yield (key, value, pos). No locking is performed. | ||
# Return a list of key-value pairs | ||
def all_values | ||
with_file_lock do | ||
read_all_values.map { |k, v, p| [k, v] } | ||
@positions.map do |key, pos| | ||
@f.seek(pos) | ||
value = @f.read(8).unpack('d')[0] | ||
[key, value] | ||
end | ||
end | ||
end | ||
|
||
|
@@ -292,22 +292,18 @@ def init_value(key) | |
@positions[key] = @used - 8 | ||
end | ||
|
||
# Yield (key, value, pos). No locking is performed. | ||
def read_all_values | ||
# Read position of all keys. No locking is performed. | ||
def populate_positions | ||
@f.seek(8) | ||
values = [] | ||
while @f.pos < @used | ||
padded_len = @f.read(4).unpack('l')[0] | ||
encoded = @f.read(padded_len).unpack("A#{padded_len}")[0] | ||
value = @f.read(8).unpack('d')[0] | ||
values << [encoded.strip, value, @f.pos - 8] | ||
key = @f.read(padded_len).unpack("A#{padded_len}")[0].strip | ||
@positions[key] = @f.pos | ||
@f.seek(8, :CUR) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ooooh, this is so much better Other than that, 👍 from me Btw, i'm assuming you've updated this repo to the latest master on the main repo, now that prometheus#95 is merged? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Nope, this PR is based on |
||
end | ||
values | ||
end | ||
end | ||
end | ||
end | ||
end | ||
end | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this idea. However, it make usage of this object dangerous.
Basically, this means that if you init with readonly, the only method you can call is
all_values
If you call anything else, you'll corrupt the file.
This happens to be fine right now, because in the only case where we use readonly, that's the only method we calll. But i'm worried of not being aware of this if we (or anyone else) ever modify code down the line, this makes it quite brittle / dangerous...
Can we have some sort of
@file_map_loaded?
instance var that gets set byread_all_values
? Then, all the other methods would have to check it and callread_all_values
if that var is false.It's quite dirty, but it'd help with the memory allocation.
Alternatively, we can change
all_values
so that it doesn't callread_all_values
, and uses@positions
instead? It'll make its code more complex but accomplish what we're trying to do here?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. Was focused on the code path followed by exporting/formatting and didn't realise that other methods would break if
@positions
is not indexed at creation time.On that point, usage the
readonly
flag might be a bit confusing here in that it still allows write operations to go through, but that's outside the topic of this PR.So, I pushed a second commit that reverts my previous change (
@positions
is again indexed at creation time) and uses@positions
inall_values
to seek across all values. Things look good, but the peak memory reduction I got in the benchmark attached to this PR is not as much exciting as it was before:This is because, while not reading file content twice, we're still creating more short-living objects.
However, this second change leaves
read_all_values
used only at creation time, so it opens the doors to another optimisation: I pushed a 3rd commit that rewritesread_all_values
in order to yield key-position pairs and avoid reading metric values at creation time.This seems to bring numbers back to the original figures:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this new version!
I have one further proposal.
read_all_values
is now only called from one place... Instead of accepting a block and yielding to it for each value... You could just populate@positions
from insideread_all_values
and remove that indirection.(sorry for my super late review)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dmagliola pushed another commit with the changes you suggested. Would you mind having another pass?
CC @Sinjo