unpack: use lazy alloc to avoid large array/map OOM #41
Open

Etsukata wants to merge 1 commit into msgpack:master from Etsukata:unpack-array-lazy-alloc
Conversation
Currently, unpacking a large array-32 can easily lead to OOM. For example, the following one-liner tries to allocate more than 30 GB but fails with ENOMEM:

```
$ strace perl -MData::MessagePack -e 'Data::MessagePack->unpack("\xdd\xffAAA");' 2>&1 | grep ENOMEM
mmap(NULL, 34259734528, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
mmap(NULL, 34259865600, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
```

This means that if a user tries to unpack untrusted msgpack-encoded input, it can lead to a DoS due to OOM (of course, we shouldn't unpack untrusted data without any validation).

This patch re-introduces lazy allocation when unpacking large arrays/maps to avoid OOM. Compared with pre-allocation, lazy allocation can slow down unpacking. To reduce the slowdown, this patch uses `MIN(n, 4096)` tuning based on the following benchmarks.

=== Benchmarks ===

The following script simply compares the speed of unpacking array/map data at each size.

- bench.pl

```
use strict;
use warnings;
use Benchmark qw(:all);
use Data::MessagePack;

my %data = ();
my %bench_a = ();
my %bench_h = ();
for (my $i = 1; $i < 15; $i++) {
    my $n = 2 ** $i;
    $data{"array_$n"} = Data::MessagePack->pack([1..$n]);
    my %h = map { $_ => $_ } (1..$n);
    $data{"hash_$n"} = Data::MessagePack->pack(\%h);
    $bench_a{"array_$n"} = sub { Data::MessagePack->unpack($data{"array_$n"}) };
    $bench_h{"hash_$n"} = sub { Data::MessagePack->unpack($data{"hash_$n"}) };
}
cmpthese(-1, \%bench_a);
cmpthese(-1, \%bench_h);
```

As for unpacking arrays, we compare these three patterns to understand why we need the `MIN(n, 4096)` tuning:

- 1. Current implementation
- 2. No av_extend()
- 3. MIN(n, 4096)

Results:

- 1. Current implementation

```
            Rate array_16384 array_8192 array_4096 array_2048 array_1024 array_512 array_256 array_128 array_64 array_32 array_16 array_8 array_4 array_2
array_16384    2017/s -- -50% -75% -88% -94% -97% -99% -99% -100% -100% -100% -100% -100% -100%
array_8192     4035/s 100% -- -50% -76% -88% -94% -97% -99% -99% -100% -100% -100% -100% -100%
array_4096     8145/s 304% 102% -- -51% -76% -88% -94% -97% -99% -99% -99% -100% -100% -100%
array_2048    16748/s 730% 315% 106% -- -51% -75% -88% -94% -97% -98% -99% -99% -99% -99%
array_1024    34132/s 1592% 746% 319% 104% -- -49% -76% -88% -94% -96% -98% -98% -99% -99%
array_512     66369/s 3190% 1545% 715% 296% 94% -- -53% -76% -88% -93% -95% -97% -97% -98%
array_256    142175/s 6948% 3423% 1646% 749% 317% 114% -- -49% -74% -85% -90% -93% -94% -95%
array_128    281097/s 13836% 6866% 3351% 1578% 724% 324% 98% -- -49% -70% -80% -86% -89% -90%
array_64     546133/s 26975% 13434% 6606% 3161% 1500% 723% 284% 94% -- -41% -61% -72% -78% -81%
array_32     927395/s 45876% 22883% 11287% 5437% 2617% 1297% 552% 230% 70% -- -34% -52% -63% -67%
array_16    1406495/s 69628% 34756% 17169% 8298% 4021% 2019% 889% 400% 158% 52% -- -28% -44% -50%
array_8     1946613/s 96405% 48142% 23801% 11523% 5603% 2833% 1269% 593% 256% 110% 38% -- -22% -31%
array_4     2496610/s 123671% 61772% 30554% 14807% 7214% 3662% 1656% 788% 357% 169% 78% 28% -- -12%
array_2     2840570/s 140723% 70296% 34777% 16861% 8222% 4180% 1898% 911% 420% 206% 102% 46% 14% --
```

- 2. No av_extend()

```
            Rate array_16384 array_8192 array_4096 array_2048 array_1024 array_512 array_256 array_128 array_64 array_32 array_16 array_8 array_4 array_2
array_16384    1969/s -- -50% -74% -87% -93% -96% -98% -99% -99% -100% -100% -100% -100% -100%
array_8192     3976/s 102% -- -48% -73% -86% -93% -96% -98% -99% -99% -100% -100% -100% -100%
array_4096     7657/s 289% 93% -- -48% -72% -86% -92% -96% -98% -99% -99% -100% -100% -100%
array_2048    14769/s 650% 271% 93% -- -47% -74% -85% -92% -96% -97% -98% -99% -99% -99%
array_1024    27675/s 1306% 596% 261% 87% -- -51% -72% -84% -92% -95% -97% -98% -99% -99%
array_512     56220/s 2756% 1314% 634% 281% 103% -- -43% -68% -83% -90% -94% -96% -98% -98%
array_256     98869/s 4922% 2387% 1191% 569% 257% 76% -- -44% -71% -82% -89% -94% -96% -96%
array_128    177535/s 8918% 4365% 2218% 1102% 542% 216% 80% -- -47% -68% -81% -88% -93% -94%
array_64     335344/s 16935% 8334% 4279% 2171% 1112% 496% 239% 89% -- -40% -63% -78% -86% -88%
array_32     563577/s 28529% 14075% 7260% 3716% 1936% 902% 470% 217% 68% -- -39% -63% -77% -80%
array_16     918728/s 46570% 23007% 11898% 6121% 3220% 1534% 829% 417% 174% 63% -- -40% -63% -67%
array_8     1542023/s 78232% 38684% 20038% 10341% 5472% 2643% 1460% 769% 360% 174% 68% -- -38% -45%
array_4     2479740/s 125866% 62268% 32284% 16690% 8860% 4311% 2408% 1297% 639% 340% 170% 61% -- -12%
array_2     2812991/s 142795% 70650% 36636% 18947% 10064% 4904% 2745% 1484% 739% 399% 206% 82% 13% --
```

- 3. MIN(n, 4096)

```
            Rate array_16384 array_8192 array_4096 array_2048 array_1024 array_512 array_256 array_128 array_64 array_32 array_16 array_8 array_4 array_2
array_16384    2017/s -- -50% -76% -88% -94% -97% -99% -99% -100% -100% -100% -100% -100% -100%
array_8192     4073/s 102% -- -51% -76% -88% -94% -97% -99% -99% -100% -100% -100% -100% -100%
array_4096     8296/s 311% 104% -- -51% -76% -88% -94% -97% -98% -99% -99% -100% -100% -100%
array_2048    16906/s 738% 315% 104% -- -51% -75% -88% -94% -97% -98% -99% -99% -99% -99%
array_1024    34461/s 1608% 746% 315% 104% -- -50% -76% -88% -94% -96% -98% -98% -99% -99%
array_512     68267/s 3284% 1576% 723% 304% 98% -- -52% -77% -87% -93% -95% -97% -97% -98%
array_256    143359/s 7007% 3420% 1628% 748% 316% 110% -- -51% -74% -85% -90% -93% -94% -95%
array_128    295080/s 14529% 7145% 3457% 1645% 756% 332% 106% -- -46% -68% -79% -85% -88% -90%
array_64     546132/s 26975% 13310% 6483% 3130% 1485% 700% 281% 85% -- -41% -61% -73% -78% -81%
array_32     927395/s 45876% 22671% 11078% 5386% 2591% 1258% 547% 214% 70% -- -34% -53% -62% -67%
array_16    1406495/s 69628% 34434% 16853% 8220% 3981% 1960% 881% 377% 158% 52% -- -29% -43% -50%
array_8     1989485/s 98530% 48749% 23880% 11668% 5673% 2814% 1288% 574% 264% 115% 41% -- -19% -30%
array_4     2457599/s 121737% 60243% 29523% 14437% 7032% 3500% 1614% 733% 350% 165% 75% 24% -- -13%
array_2     2840570/s 140723% 69646% 34139% 16702% 8143% 4061% 1881% 863% 420% 206% 102% 43% 16% --
```

We can see that just removing av_extend() (pattern 2) hurts array unpacking speed, especially when the number of elements is 128 (a 40% slowdown). But MIN(n, 4096) (pattern 3) shows no remarkable side effects.

As for unpacking maps, the following benchmark results show that, as the number of elements in the hash grows, the unpacking speed goes down. To mitigate the overhead of lazy allocation, this patch adds MIN(n, 65536) tuning, which will not affect anything in usual usage, assuming we do not usually unpack such a large map.

- 1. Current implementation

```
           Rate hash_16384 hash_8192 hash_4096 hash_2048 hash_1024 hash_512 hash_256 hash_128 hash_64 hash_32 hash_16 hash_8 hash_4 hash_2
hash_16384     221/s -- -50% -75% -88% -94% -97% -99% -99% -100% -100% -100% -100% -100% -100%
hash_8192      445/s 102% -- -50% -77% -89% -95% -97% -99% -99% -100% -100% -100% -100% -100%
hash_4096      898/s 307% 102% -- -53% -77% -89% -95% -97% -99% -99% -100% -100% -100% -100%
hash_2048     1914/s 768% 330% 113% -- -52% -77% -89% -95% -97% -99% -99% -100% -100% -100%
hash_1024     3965/s 1698% 791% 341% 107% -- -52% -77% -89% -94% -98% -99% -99% -100% -100%
hash_512      8270/s 3650% 1759% 821% 332% 109% -- -51% -77% -88% -95% -97% -99% -99% -99%
hash_256     16906/s 7567% 3700% 1782% 783% 326% 104% -- -52% -76% -90% -94% -97% -98% -99%
hash_128     35544/s 16019% 7890% 3857% 1757% 797% 330% 110% -- -50% -78% -88% -94% -96% -98%
hash_64      71739/s 32433% 16027% 7886% 3648% 1709% 767% 324% 102% -- -56% -76% -87% -92% -95%
hash_32     162293/s 73498% 36385% 17968% 8380% 3994% 1862% 860% 357% 126% -- -46% -71% -83% -89%
hash_16     297890/s 134990% 66868% 33063% 15465% 7414% 3502% 1662% 738% 315% 84% -- -46% -68% -80%
hash_8      551384/s 249946% 123854% 61283% 28710% 13808% 6567% 3162% 1451% 669% 240% 85% -- -42% -64%
hash_4      945230/s 428551% 212394% 105129% 49288% 23742% 11330% 5491% 2559% 1218% 482% 217% 71% -- -38%
hash_2     1527475/s 692592% 343286% 169948% 79710% 38428% 18370% 8935% 4197% 2029% 841% 413% 177% 62% --
```

- 2. No hv_ksplit()

```
           Rate hash_16384 hash_8192 hash_4096 hash_2048 hash_1024 hash_512 hash_256 hash_128 hash_64 hash_32 hash_16 hash_8 hash_4 hash_2
hash_16384     195/s -- -51% -76% -88% -94% -97% -99% -99% -100% -100% -100% -100% -100% -100%
hash_8192      403/s 106% -- -51% -75% -88% -94% -97% -99% -99% -100% -100% -100% -100% -100%
hash_4096      814/s 317% 102% -- -50% -76% -88% -94% -97% -99% -99% -100% -100% -100% -100%
hash_2048     1643/s 741% 308% 102% -- -52% -76% -89% -95% -98% -99% -99% -100% -100% -100%
hash_1024     3412/s 1647% 747% 319% 108% -- -50% -76% -89% -95% -98% -99% -99% -100% -100%
hash_512      6796/s 3379% 1587% 735% 314% 99% -- -53% -78% -90% -95% -98% -99% -99% -100%
hash_256     14490/s 7318% 3498% 1681% 782% 325% 113% -- -54% -79% -90% -95% -97% -98% -99%
hash_128     31508/s 16031% 7724% 3772% 1818% 823% 364% 117% -- -55% -78% -88% -94% -97% -98%
hash_64      69818/s 35644% 17237% 8481% 4149% 1946% 927% 382% 122% -- -50% -74% -87% -93% -95%
hash_32     140894/s 72032% 34887% 17217% 8475% 4029% 1973% 872% 347% 102% -- -48% -73% -85% -90%
hash_16     273066/s 139699% 67708% 33461% 16519% 7902% 3918% 1785% 767% 291% 94% -- -48% -71% -81%
hash_8      524088/s 268213% 130043% 64313% 31796% 15258% 7612% 3517% 1563% 651% 272% 92% -- -45% -64%
hash_4      945230/s 483821% 234621% 116073% 57427% 27600% 13809% 6423% 2900% 1254% 571% 246% 80% -- -34%
hash_2     1442616/s 738464% 358134% 177205% 87697% 42176% 21129% 9856% 4479% 1966% 924% 428% 175% 53% --
```

Ref:
- similar problem in msgpack-python: msgpack/msgpack-python#97
- fix: msgpack/msgpack-python#105