-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Atree Register Inlining and Data Deduplication #342
Atree Register Inlining and Data Deduplication #342
Conversation
Currently, every array or map are stored in its own slab and parent slab refers to child array or map by SlabID. The current approach can lead to many small slabs, especially for Cadence data structures with multiple nested levels. This commit inlines child array/map in parent slab when: - child array/map fits in one slab (root slab is data slab) - encoded size of inlined child array/map is less than the max inline size limit enforced by parent This commit optimizes encoding size by: - reusing inlined array types - reusing seed, digests, and field names of inlined composite types Also update debugging code to handle inlined array/map element.
Currently, in parentUpdater callback, parent array/map resets same child value. Child value ID should match overwritten SlabIDStorable or Slab.SlabID(). This commit adds check to make sure same child value is being reset.
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #342 +/- ##
==========================================
- Coverage 64.93% 62.62% -2.31%
==========================================
Files 14 15 +1
Lines 8811 10559 +1748
==========================================
+ Hits 5721 6613 +892
- Misses 2356 3004 +648
- Partials 734 942 +208
☔ View full report in Codecov by Sentry. |
Fantastic work Faye! 👏 Some notes from the first review session:
I'll do a proper code review after the next review session. So far the work looks great, well done! 👏 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚢 the logic and key parts of the changes look good to me, I did not check all the lines as we have a very good coverage by the tests.
Besides the suggestions I put in the code, I suggest also adding this logic to the stress test or tests to check for register allocation leaks (register falls behind), especially when we continuously move from large mutable to inlineable ones.
p.s. (cc: @janezpodhostnik ) I think we need higher-level testing for applications like the FVM environment but that's out of the scope of this repo. we might re-use the NFT tracker reporter to check the health before and after migration. and simulate some transactions to check the changes (especially re-sizing logic).
type EquatableStorable interface { | ||
Storable | ||
// Equal returns true if the given storable is equal to this storable. | ||
Equal(Storable) bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
storage.go
Outdated
copy(id[:], sid.address[:]) | ||
copy(id[8:], sid.index[:]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we just add a warning that if we change the size of the address or index, we could have a problem here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better yet, consider using unsafe.Sizeof
for the definition of ValueID
above, in the range expression here. See for example: https://goplay.tools/snippet/EtdTvPNnIvF
typeinfo.go
Outdated
"github.com/fxamacker/cbor/v2" | ||
) | ||
|
||
type TypeInfo interface { | ||
Encode(*cbor.StreamEncoder) error | ||
IsComposite() bool | ||
ID() string | ||
// TODO: maybe add a copy function because decoded TypeInfo can be shared by multiple slabs if not copied. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this still a relevant TODO?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this still a relevant TODO?
Yes, good catch. The TODO was a precaution that was postponed to get PR reviews started on the other parts.
It's done in 72d3614. TypeInfo
is copied in the commit as a precaution because TypeInfo
can be shared among inlined slabs referring to the same type and any changes to one may affect all others if not copied.
map.go
Outdated
firstKey: elements.firstKey(), | ||
} | ||
|
||
// TODO: does extra data needs to be copied? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this still a valid TODO?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this still a valid TODO?
Yes 👍 See comment above and this is also done in the same commit 72d3614.
This commit adds callback notification in the set or inserted child element in array. So if child element is modified after Array.Set() or Array.Insert(), changes to child element is properly reflected in the parent array. This commit doesn't appear to be needed by current version of Cadence but it helps reduce Atree's dependence on an implementation detail of Cadence.
This commit adds callback notification in the set child element in map. So if child element is modified after OrderedMap.Set() , changes to child element is properly reflected in the parent map. This commit doesn't appear to be needed by current version of Cadence but it helps reduce Atree's dependence on an implementation detail of Cadence.
Currently, deduplication feature in PR 342 (not merged yet) does not support Cadence attachments. Add support for Cadence attachments and other future use cases by using type ID and sorted field names instead of field count. While at it, also refactor encoding composite type to reduce traversal and type assertion.
Decoded ExtraData (including TypeInfo) can be shared by all inlined slabs. This commit creates a new ExtraData with copied TypeInfo for inlined slabs to prevent accidental mutation.
Decoded ExtraData (including keys) can be shared by all inlined composite referring to the same type. This commit copies key for inlined composite to prevent accidental mutation.
I had a look at adding the container/child address matching checks to Cadence: Just before performing a mutating operation on a container (array/map), we already transfer the child, so checking the invariant does not really help much: if we insert the assertion we can also check the transfer. On the atree side, it's not easy to check this invariant: children are values at first, and getting the storable (which is needed to get the potential storage ID) is delayed until the very "end", e.g. deep down when the container/parent is not available anymore. Maybe there isn't much we can do here |
…omparator-and-hashinputprovider-to-mapiterator
@turbolent Thanks for detailed review and meeting notes! I think everything discussed so far have been addressed in this PR. 🎉
BTW, can you please take a look at PR #345? That is the 2nd of 2 atree inlining PRs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fantastic work Faye! 👏
My review focused mainly on the changes to array and ordered map, I didn't review the tests, but they seem very thorough (18 kLOC wow!) 👏
As mentioned above, we can address remaining work in follow-up PRs
…ashinputprovider-to-mapiterator Add readonly iterators and support value mutations only from non-readonly iterators
Updates epics #292 onflow/flow-go#1744
Updates #296 #297 #347 #348 onflow/cadence#1854
Required by Cadence Integration with Atree onflow/cadence#2809
Required by Atree Storage Migration:
NOTE: some of the above updated issues will be closed by this PR:
Problem
Memory use on execution nodes, register count, and growth rate can be improved.
Currently, every array or map are stored in its own slab and parent slab refers to child array or map by SlabID.
The current approach can lead to many small slabs, especially for Cadence data structures with multiple nested levels.
Another aspect is duplicate data, especially when encoding nested Cadence composites. The design of atree inlining already requries changing the encoding format, so use this opportunity to also deduplicate data when encoding nested composites.
Changes
Reduce register count by inlining and optimize encoding size by deduplicating Cadence composite types within slabs.
Also update debugging code to handle inlined array/map element.
TODO
In this PR
Atree Inlining Part 2
Open new PR to add support for inlinability of child values returned by iterator (requires PR #345)
Cadence Integration
Open new PR in onflow/cadence to integrate with Cadence.
main
branchFiles changed
in the Github PR explorer