-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Heap addresses are not reliable to produce diffs #31
Comments
A related problem with this is that diffs do not survive heap compaction: require 'objspace'
BEFORE = 'before.json'
AFTER = 'after.json'
GC.start
ObjectSpace.dump_all(output: open(BEFORE, 'w'))
# GC.compact
ObjectSpace.dump_all(output: open(AFTER, 'w')) Semantically, there should be no heap difference between the two calls to $ruby ./heap.rb && heapy diff before.json after.json
Allocated STRING 5 objects of size 256/824 (in bytes) at: :
Allocated HASH 2 objects of size 336/824 (in bytes) at: :
Allocated FILE 1 objects of size 232/824 (in bytes) at: : When I uncomment the call to $ruby ./heap.rb && heapy diff before.json after.json
Allocated STRING 2153 objects of size 110441/657516 (in bytes) at: :
Allocated IMEMO 683 objects of size 461852/657516 (in bytes) at: :
Allocated ARRAY 232 objects of size 28784/657516 (in bytes) at: :
Allocated OBJECT 66 objects of size 7272/657516 (in bytes) at: :
Allocated CLASS 25 objects of size 16608/657516 (in bytes) at: :
Allocated HASH 24 objects of size 16736/657516 (in bytes) at: :
Allocated DATA 20 objects of size 1678/657516 (in bytes) at: :
Allocated REGEXP 19 objects of size 12769/657516 (in bytes) at: :
Allocated STRUCT 5 objects of size 200/657516 (in bytes) at: :
Allocated ICLASS 4 objects of size 160/657516 (in bytes) at: :
Allocated MODULE 2 objects of size 1016/657516 (in bytes) at: : This makes sense because many of these objects moved around on the heap and had their address changed. I wonder if there is a more stable way to identify an object on the heap? I'm not sure whether this is fixable, but it could be worth pointing out in the README? |
I constructed a simple program that should demonstrate what I meant in the original question, i.e. without even considering heap compaction: # frozen_string_literal: true
require 'objspace'
BEFORE = 'before.json'
AFTER = 'after.json'
class Before; end
class After; end
GC.start
ObjectSpace.trace_object_allocations_start
# Allocate nested array.
a = []
1000.times do
a << Before.new
end
ObjectSpace.dump_all(output: open(BEFORE, 'w'))
# Free all array elements and make sure they are GC'ed;
# This will merely tag these object slots as T_NONE.
a.clear
GC.start
# Refill the original array with different objects. These will use the same heap slots
# we just freed, i.e. they will have the same memory address.
1000.times do
a << After.new
end
ObjectSpace.dump_all(output: open(AFTER, 'w')) Here, we first allocate an array of The expected diff should be 1000 instances of the $ruby ./heap.rb && heapy diff before.json after.json
Allocated HASH 1 objects of size 40/40 (in bytes) at: : This is because when Ruby GCs an object, it merely uses type tagging to indicate that this heap slot can be reused by another allocation. But the memory address does not change. So heapy thinks these objects aren't new in the second dump. You can double-check as follows: find the memory address of the $ grep 0x15b4ae0 before.json | head
{"address":"0x1414028", "type":"OBJECT", "class":"0x15b4ae0", "ivars":0, "file":"./heap.rb", "line":18, "method":"new", "generation":9, "memsize":40, "flags":{"wb_protected":true}}
{"address":"0x1414050", "type":"OBJECT", "class":"0x15b4ae0", "ivars":0, "file":"./heap.rb", "line":18, "method":"new", "generation":9, "memsize":40, "flags":{"wb_protected":true}}
{"address":"0x1414118", "type":"OBJECT", "class":"0x15b4ae0", "ivars":0, "file":"./heap.rb", "line":18, "method":"new", "generation":9, "memsize":40, "flags":{"wb_protected":true}}
{"address":"0x14141e0", "type":"OBJECT", "class":"0x15b4ae0", "ivars":0, "file":"./heap.rb", "line":18, "method":"new", "generation":9, "memsize":40, "flags":{"wb_protected":true}}
{"address":"0x1414208", "type":"OBJECT", "class":"0x15b4ae0", "ivars":0, "file":"./heap.rb", "line":18, "method":"new", "generation":9, "memsize":40, "flags":{"wb_protected":true}}
{"address":"0x1414230", "type":"OBJECT", "class":"0x15b4ae0", "ivars":0, "file":"./heap.rb", "line":18, "method":"new", "generation":9, "memsize":40, "flags":{"wb_protected":true}}
{"address":"0x1414258", "type":"OBJECT", "class":"0x15b4ae0", "ivars":0, "file":"./heap.rb", "line":18, "method":"new", "generation":9, "memsize":40, "flags":{"wb_protected":true}}
{"address":"0x1414280", "type":"OBJECT", "class":"0x15b4ae0", "ivars":0, "file":"./heap.rb", "line":18, "method":"new", "generation":9, "memsize":40, "flags":{"wb_protected":true}}
{"address":"0x14142f8", "type":"OBJECT", "class":"0x15b4ae0", "ivars":0, "file":"./heap.rb", "line":18, "method":"new", "generation":9, "memsize":40, "flags":{"wb_protected":true}}
{"address":"0x1414410", "type":"OBJECT", "class":"0x15b4ae0", "ivars":0, "file":"./heap.rb", "line":18, "method":"new", "generation":9, "memsize":40, "flags":{"wb_protected":true}} grep 0x15b49a0 after.json | head
{"address":"0x1414028", "type":"OBJECT", "class":"0x15b49a0", "ivars":0, "file":"./heap.rb", "line":31, "method":"new", "generation":10, "memsize":40, "flags":{"wb_protected":true}}
{"address":"0x1414050", "type":"OBJECT", "class":"0x15b49a0", "ivars":0, "file":"./heap.rb", "line":31, "method":"new", "generation":10, "memsize":40, "flags":{"wb_protected":true}}
{"address":"0x1414118", "type":"OBJECT", "class":"0x15b49a0", "ivars":0, "file":"./heap.rb", "line":31, "method":"new", "generation":10, "memsize":40, "flags":{"wb_protected":true}}
{"address":"0x14141e0", "type":"OBJECT", "class":"0x15b49a0", "ivars":0, "file":"./heap.rb", "line":31, "method":"new", "generation":10, "memsize":40, "flags":{"wb_protected":true}}
{"address":"0x1414208", "type":"OBJECT", "class":"0x15b49a0", "ivars":0, "file":"./heap.rb", "line":31, "method":"new", "generation":10, "memsize":40, "flags":{"wb_protected":true}}
{"address":"0x1414230", "type":"OBJECT", "class":"0x15b49a0", "ivars":0, "file":"./heap.rb", "line":31, "method":"new", "generation":10, "memsize":40, "flags":{"wb_protected":true}}
{"address":"0x1414258", "type":"OBJECT", "class":"0x15b49a0", "ivars":0, "file":"./heap.rb", "line":31, "method":"new", "generation":10, "memsize":40, "flags":{"wb_protected":true}}
{"address":"0x1414280", "type":"OBJECT", "class":"0x15b49a0", "ivars":0, "file":"./heap.rb", "line":31, "method":"new", "generation":10, "memsize":40, "flags":{"wb_protected":true}}
{"address":"0x14142f8", "type":"OBJECT", "class":"0x15b49a0", "ivars":0, "file":"./heap.rb", "line":31, "method":"new", "generation":10, "memsize":40, "flags":{"wb_protected":true}}
{"address":"0x1414410", "type":"OBJECT", "class":"0x15b49a0", "ivars":0, "file":"./heap.rb", "line":31, "method":"new", "generation":10, "memsize":40, "flags":{"wb_protected":true}} |
Based on this, maybe a better (though not perfect) solution would be to key on both We could also incorporate |
We only work with what we are given. This won’t work with compaction as
you’ve mentioned. It’s somewhat expected and I don’t have a fix or plans
for working on a fix. If you or someone else wants to work on it, awesome.
I generally won’t be able to answer arbitrary questions. I suggest trying
to use heapy and if it’s behaving in a strange manner then you can ask
questions on other Q&A sites or sometimes Reddit to understand the
mechanics at play.
…On Saturday, August 27, 2022, Matthias Käppler ***@***.***> wrote:
Based on this, maybe a better (though not perfect) solution would be to
key on both address and type? That is still not accurate of course since
ivar values may change. It would also not solve the problem with compaction
since unfortunately heap dumps don't appear to serialize the moved flag
of an object slot (though that could be fixed by patching MRI)?
—
Reply to this email directly, view it on GitHub
<#31 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAAOSYHQXXWQFJ7HTXNLB7LV3HDOZANCNFSM57W4W5XA>
.
You are receiving this because you are subscribed to this thread.Message
ID: ***@***.***>
--
Richard Schneeman
https://www.schneems.com
he/him
|
Fair enough. I wonder if the typical With that in mind, would it make sense to just document this as a known limitation? I think that's always a good first step. I don't think the issue itself is dramatic, considering the most likely use case is to detect differences in tenured objects i.e. heap growth that isn't just temporary. In order to make this more reliable, the following ideas come to mind:
I believe the reproducible test case I provided is sufficient evidence that we have moved from "arbitrary question" territory into issue territory. Do you disagree? |
I updated the issue title and description to restate it as an issue report rather than a question. Let me know if I am missing anything. |
Nope. Seems good. That comment had less to do with this being a question originally and more with my availability.
Totally. Send me a PR? While Calling GC.compact in the middle of a heap generation is probably unlikely (right now) its good to document edge cases . No matter how unlikely, even one in a million bugs still happen at enough scale. |
Heapy diff can yield both false positives and false negatives due to keying on heap pointers. This is difficult to fix, so we're just documenting this for now. Refs zombocom#31
Happy to help out with this! I took a stab at it here: #32 |
I've been looking at
heapy diff
, and one thing I noticed is that it keys on an object'saddress
to decide whether it was present in the heap dump we compare against.Is this reliable? I looked at MRI, and
address
is just an object'sVALUE
pointer: https://github.com/ruby/ruby/blob/e315f3a1341f123051b75e589b746132c3510079/ext/objspace/objspace_dump.c#L238The way I understood the Ruby GC to work is that each 40B slot (once populated) always contains an
RVALUE
, which is a union type so it can "morph" into a different type. If this object gets GC'ed, it is not removed from the heap page, but rather a flag is cleared that tags this slot (or object) as "empty": https://github.com/ruby/ruby/blob/6ef46f71c743507a0e2ae0eef14dce0539b0ff52/gc.c#L569. This makes a slot reusable by changing its union type, but its memory address does not change.Wouldn't this mean that if an object is GC'ed between two snapshots and the same slot is reused for a completely different object, it would then be omitted from the
heapy diff
, because the slot address already appeared in the first snapshot?I'm sure I'm missing something but I wanted to make sure I understand how this works. Thanks!
The text was updated successfully, but these errors were encountered: