Add more info when GenerateDumpIfDbgRequested fails (take 4) #6344

Open
wants to merge 1 commit into
base: master
from


kevingosse
Collaborator

Summary of changes

Add more diagnostic output when GenerateDumpIfDbgRequested fails, and remove diagnostics that are no longer useful.

Reason for change

At this point we know that:

  • all the threads in the crashing process are suspended
  • createdump has exited and become a zombie process

We still don't know why createdump exited without resuming the threads in the crashing process.
It's unlikely we'll be able to get any further without enabling strace (createdump does not log an error when resuming the threads fails). However, I realized that we don't display the content of stderr, even though createdump writes its error messages there, so it's worth a try.
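As an illustration of the idea only (this is not the code in this PR, which lives in the .NET test infrastructure), a minimal Python sketch of running a child process and keeping its stderr next to the exit code could look like this. The helper name `run_and_capture` and the stand-in child command are hypothetical.

```python
import subprocess
import sys


def run_and_capture(cmd, timeout=30):
    """Run cmd and return (exit_code, stdout, stderr) as text.

    Hypothetical helper: it only shows that stderr has to be captured
    explicitly, otherwise error messages written there are lost.
    """
    completed = subprocess.run(
        cmd,
        capture_output=True,  # capture both stdout and stderr
        text=True,            # decode bytes to str
        timeout=timeout,
        check=False,          # a non-zero exit code is data here, not an exception
    )
    return completed.returncode, completed.stdout, completed.stderr


if __name__ == "__main__":
    # Stand-in for a tool that reports its failure on stderr only.
    child = [
        sys.executable,
        "-c",
        "import sys; sys.stderr.write('resume failed'); sys.exit(3)",
    ]
    code, out, err = run_and_capture(child)
    print(f"exit code: {code}")
    print(f"stderr: {err!r}")
```

Logging the exit code and stderr together makes a silent failure much easier to tell apart from a successful run.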

Implementation details

To reduce noise, I removed some previous diagnostic actions that are not useful anymore:

  • we don't need the callstacks or dump of the crashing process anymore because we know the issue is that the threads aren't resumed
  • capturing a dump of createdump will always fail because the process is a zombie at that point
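For context, on Linux a zombie process can be recognized from the state field of `/proc/<pid>/stat`, which is `Z` once the process has exited but has not been reaped. The sketch below is an assumption-labeled illustration, not part of this PR; the function names are hypothetical.

```python
def parse_proc_state(stat_line):
    """Return the one-letter state field from a /proc/<pid>/stat line."""
    # The second field (comm) is wrapped in parentheses and may itself
    # contain spaces or ')', so split after the LAST closing parenthesis.
    after_comm = stat_line[stat_line.rindex(")") + 1:]
    return after_comm.split()[0]


def is_zombie(pid):
    """Return True if the process exists and is a zombie. Linux only."""
    try:
        with open(f"/proc/{pid}/stat", encoding="utf-8", errors="replace") as f:
            return parse_proc_state(f.read()) == "Z"
    except (FileNotFoundError, ProcessLookupError):
        # The process has already been reaped, or never existed.
        return False


if __name__ == "__main__":
    sample = "1234 (createdump) Z 1 1234 1234 0 -1 4228100 0 0"
    print(parse_proc_state(sample))
```

A zombie has already released its address space, which is why attempting to capture a dump of it cannot succeed.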

@kevingosse kevingosse requested a review from a team as a code owner November 25, 2024 11:40
@github-actions github-actions bot added the area:profiler Issues related to the continous-profiler label Nov 25, 2024
@datadog-ddstaging

datadog-ddstaging bot commented Nov 25, 2024

Datadog Report

Branch report: kevin/createdump_stderr
Commit report: e10bd7b
Test service: dd-trace-dotnet

✅ 0 Failed, 454899 Passed, 3205 Skipped, 20h 46m 58.6s Total Time

@andrewlock
Member

Execution-Time Benchmarks Report ⏱️

Execution-time results for samples comparing the following branches/commits:

Execution-time benchmarks measure the total time it takes to execute a program, and are intended to capture one-off costs. Cases where the execution-time results for the PR are worse than the latest master results are shown in red. The following thresholds were used for comparing the execution times:

  • Welch's t-test with a significance level of 5%
  • Only results indicating a difference greater than 5% and 5 ms are considered.
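To make the second threshold concrete, here is a minimal sketch of the magnitude check alone. It assumes the difference is measured relative to master and does not perform the Welch significance test; the function name and defaults are hypothetical and not taken from the benchmark tooling.

```python
def exceeds_thresholds(master_ms, pr_ms, min_ratio=0.05, min_abs_ms=5.0):
    """Return True if the PR is slower than master by more than BOTH thresholds.

    Assumption: the relative difference is computed against the master mean.
    The statistical significance test is deliberately left out of this sketch.
    """
    diff = pr_ms - master_ms
    return diff > min_abs_ms and diff / master_ms > min_ratio


if __name__ == "__main__":
    # Mean values (ms) as reported in the charts of this report.
    print(exceeds_thresholds(638, 683))    # FakeDbCommand (.NET 6), CallTarget+Inlining+NGEN
    print(exceeds_thresholds(71, 74))      # FakeDbCommand (.NET Framework 4.6.2), Baseline
    print(exceeds_thresholds(1094, 1104))  # HttpMessageHandler (.NET Framework 4.6.2), CallTarget+Inlining+NGEN
```

Requiring both an absolute and a relative margin keeps very short runs from being flagged on a few milliseconds of noise, and very long runs from being flagged on a tiny percentage.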

Note that these results are based on a single point-in-time result for each branch. For full results, see the dashboard.

Graphs show the p99 interval based on the mean and StdDev of the test run, as well as the mean value of the run (shown as a diamond below the graph).

gantt
    title Execution time (ms) FakeDbCommand (.NET Framework 4.6.2) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (6344) - mean (74ms)  : 61, 86
     .   : milestone, 74,
    master - mean (71ms)  : 65, 78
     .   : milestone, 71,

    section CallTarget+Inlining+NGEN
    This PR (6344) - mean (987ms)  : 954, 1020
     .   : milestone, 987,
    master - mean (981ms)  : 960, 1003
     .   : milestone, 981,

gantt
    title Execution time (ms) FakeDbCommand (.NET Core 3.1) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (6344) - mean (108ms)  : 105, 111
     .   : milestone, 108,
    master - mean (108ms)  : 106, 110
     .   : milestone, 108,

    section CallTarget+Inlining+NGEN
    This PR (6344) - mean (683ms)  : 666, 699
     .   : milestone, 683,
    master - mean (683ms)  : 665, 702
     .   : milestone, 683,

gantt
    title Execution time (ms) FakeDbCommand (.NET 6) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (6344) - mean (92ms)  : 90, 94
     .   : milestone, 92,
    master - mean (92ms)  : 89, 95
     .   : milestone, 92,

    section CallTarget+Inlining+NGEN
    This PR (6344) - mean (683ms)  : 437, 928
     .   : milestone, 683,
    master - mean (638ms)  : 623, 653
     .   : milestone, 638,

gantt
    title Execution time (ms) HttpMessageHandler (.NET Framework 4.6.2) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (6344) - mean (191ms)  : 187, 196
     .   : milestone, 191,
    master - mean (190ms)  : 186, 194
     .   : milestone, 190,

    section CallTarget+Inlining+NGEN
    This PR (6344) - mean (1,104ms)  : 1065, 1142
     .   : milestone, 1104,
    master - mean (1,094ms)  : 1072, 1115
     .   : milestone, 1094,

gantt
    title Execution time (ms) HttpMessageHandler (.NET Core 3.1) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (6344) - mean (277ms)  : 273, 281
     .   : milestone, 277,
    master - mean (276ms)  : 272, 279
     .   : milestone, 276,

    section CallTarget+Inlining+NGEN
    This PR (6344) - mean (879ms)  : 847, 912
     .   : milestone, 879,
    master - mean (875ms)  : 850, 900
     .   : milestone, 875,

gantt
    title Execution time (ms) HttpMessageHandler (.NET 6) 
    dateFormat  X
    axisFormat %s
    todayMarker off
    section Baseline
    This PR (6344) - mean (265ms)  : 261, 270
     .   : milestone, 265,
    master - mean (264ms)  : 260, 268
     .   : milestone, 264,

    section CallTarget+Inlining+NGEN
    This PR (6344) - mean (853ms)  : 815, 890
     .   : milestone, 853,
    master - mean (854ms)  : 825, 884
     .   : milestone, 854,


@andrewlock
Member

Throughput/Crank Report ⚡

Throughput results for AspNetCoreSimpleController comparing the following branches/commits:

Cases where the throughput results for the PR are worse than the latest master by 5% or more are shown in red.
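As a rough sketch of that rule (the function names are hypothetical and the formula is an assumption about how the drop is computed), the percentage drop relative to master can be calculated as follows:

```python
def throughput_drop_percent(master_requests, pr_requests):
    """Percentage by which the PR's total requests fall below master's.

    A negative result means the PR handled more requests than master.
    """
    return (master_requests - pr_requests) / master_requests * 100.0


def is_flagged(master_requests, pr_requests, threshold_percent=5.0):
    """Return True when the drop is at or above the threshold."""
    return throughput_drop_percent(master_requests, pr_requests) >= threshold_percent


if __name__ == "__main__":
    # Linux x64, "Automatic" section of this report.
    drop = throughput_drop_percent(7406927, 7228373)
    print(f"{drop:.2f}% drop, flagged: {is_flagged(7406927, 7228373)}")
```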

Note that these results are based on a single point-in-time result for each branch. For full results, see one of the many, many dashboards!

gantt
    title Throughput Linux x64 (Total requests) 
    dateFormat  X
    axisFormat %s
    section Baseline
    This PR (6344) (11.186M)   : 0, 11185974
    master (11.247M)   : 0, 11246955
    benchmarks/2.9.0 (11.033M)   : 0, 11032866

    section Automatic
    This PR (6344) (7.228M)   : 0, 7228373
    master (7.407M)   : 0, 7406927
    benchmarks/2.9.0 (7.786M)   : 0, 7785853

    section Trace stats
    master (7.695M)   : 0, 7695476

    section Manual
    master (11.240M)   : 0, 11240426

    section Manual + Automatic
    This PR (6344) (6.742M)   : 0, 6742082
    master (6.871M)   : 0, 6870622

    section DD_TRACE_ENABLED=0
    master (10.206M)   : 0, 10206195

gantt
    title Throughput Linux arm64 (Total requests) 
    dateFormat  X
    axisFormat %s
    section Baseline
    This PR (6344) (9.549M)   : 0, 9548920
    master (9.563M)   : 0, 9563014
    benchmarks/2.9.0 (9.495M)   : 0, 9494821

    section Automatic
    This PR (6344) (6.460M)   : 0, 6460130
    master (6.412M)   : 0, 6411572

    section Trace stats
    master (6.703M)   : 0, 6703390

    section Manual
    master (9.369M)   : 0, 9369266

    section Manual + Automatic
    This PR (6344) (5.938M)   : 0, 5938086
    master (5.978M)   : 0, 5977872

    section DD_TRACE_ENABLED=0
    master (8.996M)   : 0, 8996087

gantt
    title Throughput Windows x64 (Total requests) 
    dateFormat  X
    axisFormat %s
    section Baseline
    This PR (6344) (9.090M)   : 0, 9089865
    benchmarks/2.9.0 (10.020M)   : 0, 10019592

    section Automatic
    This PR (6344) (5.837M)   : 0, 5837279
    benchmarks/2.9.0 (7.255M)   : 0, 7255257

    section Manual + Automatic
    This PR (6344) (5.501M)   : 0, 5501465


@andrewlock
Member

Benchmarks Report for tracer 🐌

Benchmarks for #6344 compared to master:

  • 2 benchmarks are faster, with geometric mean 1.120
  • 1 benchmark is slower, with geometric mean 1.198
  • 1 benchmark has fewer allocations

The following thresholds were used for comparing the benchmark speeds:

  • Mann–Whitney U test with a significance level of 5%
  • Only results indicating a difference greater than 10% and 0.3 ns are considered.

Allocation changes below 0.5% are ignored.

Benchmark details

Benchmarks.Trace.ActivityBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master StartStopWithChild net6.0 7.97μs 45.4ns 346ns 0.0156 0.0078 0 5.62 KB
master StartStopWithChild netcoreapp3.1 9.98μs 53ns 347ns 0.0142 0.00475 0 5.8 KB
master StartStopWithChild net472 16.2μs 38.1ns 148ns 1.04 0.309 0.103 6.22 KB
#6344 StartStopWithChild net6.0 7.86μs 45.2ns 341ns 0.0119 0.00397 0 5.61 KB
#6344 StartStopWithChild netcoreapp3.1 10.1μs 56.2ns 364ns 0.0242 0.00967 0 5.8 KB
#6344 StartStopWithChild net472 16.2μs 64.1ns 248ns 1.06 0.325 0.103 6.22 KB
Benchmarks.Trace.AgentWriterBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master WriteAndFlushEnrichedTraces net6.0 491μs 235ns 880ns 0 0 0 2.7 KB
master WriteAndFlushEnrichedTraces netcoreapp3.1 667μs 298ns 1.15μs 0 0 0 2.7 KB
master WriteAndFlushEnrichedTraces net472 855μs 589ns 2.2μs 0.425 0 0 3.3 KB
#6344 WriteAndFlushEnrichedTraces net6.0 497μs 523ns 2.03μs 0 0 0 2.7 KB
#6344 WriteAndFlushEnrichedTraces netcoreapp3.1 656μs 420ns 1.63μs 0 0 0 2.7 KB
#6344 WriteAndFlushEnrichedTraces net472 879μs 455ns 1.64μs 0.434 0 0 3.3 KB
Benchmarks.Trace.AspNetCoreBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master SendRequest net6.0 154μs 878ns 7.13μs 0.147 0 0 14.47 KB
master SendRequest netcoreapp3.1 166μs 975ns 9.35μs 0.168 0 0 17.27 KB
master SendRequest net472 0ns 0ns 0ns 0 0 0 0 b
#6344 SendRequest net6.0 150μs 879ns 8.52μs 0.142 0 0 14.47 KB
#6344 SendRequest netcoreapp3.1 171μs 996ns 9.4μs 0.158 0 0 17.27 KB
#6344 SendRequest net472 0.00141ns 0.000551ns 0.00199ns 0 0 0 0 b
Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark - Same speed ✔️ Fewer allocations 🎉

Fewer allocations 🎉 in #6344

| Benchmark | Base Allocated | Diff Allocated | Change | Change % |
| --- | --- | --- | --- | --- |
| Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark.WriteAndFlushEnrichedTraces‑netcoreapp3.1 | 41.9 KB | 41.69 KB | -212 B | -0.51% |

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master WriteAndFlushEnrichedTraces net6.0 588μs 3.28μs 21.3μs 0.566 0 0 41.66 KB
master WriteAndFlushEnrichedTraces netcoreapp3.1 692μs 3.8μs 21.8μs 0.338 0 0 41.9 KB
master WriteAndFlushEnrichedTraces net472 863μs 4.48μs 21.5μs 8.33 2.5 0.417 53.3 KB
#6344 WriteAndFlushEnrichedTraces net6.0 568μs 2.73μs 13.6μs 0.563 0 0 41.76 KB
#6344 WriteAndFlushEnrichedTraces netcoreapp3.1 667μs 2.89μs 10.8μs 0.343 0 0 41.69 KB
#6344 WriteAndFlushEnrichedTraces net472 846μs 3.47μs 13μs 8.13 2.57 0.428 53.28 KB
Benchmarks.Trace.DbCommandBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master ExecuteNonQuery net6.0 1.34μs 0.956ns 3.7ns 0.0141 0 0 1.02 KB
master ExecuteNonQuery netcoreapp3.1 1.79μs 1.07ns 4.14ns 0.0134 0 0 1.02 KB
master ExecuteNonQuery net472 2.14μs 1.9ns 7.37ns 0.156 0.00108 0 987 B
#6344 ExecuteNonQuery net6.0 1.25μs 0.856ns 3.2ns 0.0139 0 0 1.02 KB
#6344 ExecuteNonQuery netcoreapp3.1 1.8μs 1.26ns 4.87ns 0.0136 0 0 1.02 KB
#6344 ExecuteNonQuery net472 2.17μs 2.92ns 11.3ns 0.157 0.00109 0 987 B
Benchmarks.Trace.ElasticsearchBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master CallElasticsearch net6.0 1.16μs 0.422ns 1.52ns 0.0134 0 0 976 B
master CallElasticsearch netcoreapp3.1 1.57μs 1.38ns 5.18ns 0.0133 0 0 976 B
master CallElasticsearch net472 2.6μs 1.99ns 7.7ns 0.158 0 0 995 B
master CallElasticsearchAsync net6.0 1.28μs 0.477ns 1.85ns 0.0133 0 0 952 B
master CallElasticsearchAsync netcoreapp3.1 1.66μs 0.798ns 2.98ns 0.014 0 0 1.02 KB
master CallElasticsearchAsync net472 2.68μs 2.46ns 9.53ns 0.166 0 0 1.05 KB
#6344 CallElasticsearch net6.0 1.1μs 0.593ns 2.22ns 0.0137 0 0 976 B
#6344 CallElasticsearch netcoreapp3.1 1.57μs 0.593ns 2.22ns 0.0127 0 0 976 B
#6344 CallElasticsearch net472 2.48μs 2.26ns 8.77ns 0.157 0 0 995 B
#6344 CallElasticsearchAsync net6.0 1.28μs 3.43ns 13.3ns 0.0135 0 0 952 B
#6344 CallElasticsearchAsync netcoreapp3.1 1.64μs 0.952ns 3.43ns 0.0132 0 0 1.02 KB
#6344 CallElasticsearchAsync net472 2.57μs 2.31ns 8.93ns 0.166 0 0 1.05 KB
Benchmarks.Trace.GraphQLBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master ExecuteAsync net6.0 1.28μs 0.401ns 1.45ns 0.0135 0 0 952 B
master ExecuteAsync netcoreapp3.1 1.64μs 0.548ns 2.05ns 0.013 0 0 952 B
master ExecuteAsync net472 1.86μs 1.01ns 3.91ns 0.145 0 0 915 B
#6344 ExecuteAsync net6.0 1.18μs 0.669ns 2.5ns 0.013 0 0 952 B
#6344 ExecuteAsync netcoreapp3.1 1.5μs 0.456ns 1.77ns 0.0128 0 0 952 B
#6344 ExecuteAsync net472 1.83μs 0.819ns 3.17ns 0.145 0 0 915 B
Benchmarks.Trace.HttpClientBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master SendAsync net6.0 4.28μs 2.91ns 10.9ns 0.032 0 0 2.31 KB
master SendAsync netcoreapp3.1 5.31μs 2.28ns 8.85ns 0.0372 0 0 2.85 KB
master SendAsync net472 7.24μs 1.79ns 6.93ns 0.493 0 0 3.12 KB
#6344 SendAsync net6.0 4.34μs 1.62ns 6.04ns 0.0325 0 0 2.31 KB
#6344 SendAsync netcoreapp3.1 5.37μs 2.53ns 9.47ns 0.0376 0 0 2.85 KB
#6344 SendAsync net472 7.36μs 1.94ns 7.53ns 0.496 0 0 3.12 KB
Benchmarks.Trace.ILoggerBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master EnrichedLog net6.0 1.45μs 0.734ns 2.75ns 0.0226 0 0 1.64 KB
master EnrichedLog netcoreapp3.1 2.2μs 1.66ns 6.19ns 0.0218 0 0 1.64 KB
master EnrichedLog net472 2.66μs 0.828ns 3.1ns 0.249 0 0 1.57 KB
#6344 EnrichedLog net6.0 1.56μs 2.67ns 9.99ns 0.0232 0 0 1.64 KB
#6344 EnrichedLog netcoreapp3.1 2.17μs 1.04ns 3.91ns 0.0217 0 0 1.64 KB
#6344 EnrichedLog net472 2.68μs 0.779ns 2.81ns 0.249 0 0 1.57 KB
Benchmarks.Trace.Log4netBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master EnrichedLog net6.0 117μs 54.4ns 211ns 0 0 0 4.28 KB
master EnrichedLog netcoreapp3.1 120μs 70.4ns 263ns 0 0 0 4.28 KB
master EnrichedLog net472 152μs 118ns 455ns 0.688 0.229 0 4.46 KB
#6344 EnrichedLog net6.0 118μs 123ns 461ns 0.0595 0 0 4.28 KB
#6344 EnrichedLog netcoreapp3.1 124μs 218ns 843ns 0 0 0 4.28 KB
#6344 EnrichedLog net472 152μs 108ns 406ns 0.671 0.224 0 4.46 KB
Benchmarks.Trace.NLogBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master EnrichedLog net6.0 2.91μs 1.19ns 4.61ns 0.0307 0 0 2.2 KB
master EnrichedLog netcoreapp3.1 4.14μs 1.66ns 5.97ns 0.0289 0 0 2.2 KB
master EnrichedLog net472 4.84μs 1.3ns 4.88ns 0.319 0 0 2.02 KB
#6344 EnrichedLog net6.0 3.16μs 0.836ns 3.24ns 0.0301 0 0 2.2 KB
#6344 EnrichedLog netcoreapp3.1 4.3μs 1.63ns 6.08ns 0.03 0 0 2.2 KB
#6344 EnrichedLog net472 4.86μs 1.5ns 5.83ns 0.32 0 0 2.02 KB
Benchmarks.Trace.RedisBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master SendReceive net6.0 1.42μs 0.764ns 2.96ns 0.0164 0 0 1.14 KB
master SendReceive netcoreapp3.1 1.7μs 0.913ns 3.54ns 0.0155 0 0 1.14 KB
master SendReceive net472 2.03μs 0.746ns 2.79ns 0.183 0 0 1.16 KB
#6344 SendReceive net6.0 1.35μs 0.635ns 2.46ns 0.0162 0 0 1.14 KB
#6344 SendReceive netcoreapp3.1 1.78μs 1.19ns 4.63ns 0.0151 0 0 1.14 KB
#6344 SendReceive net472 2.07μs 1.88ns 7.27ns 0.183 0 0 1.16 KB
Benchmarks.Trace.SerilogBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master EnrichedLog net6.0 2.74μs 1.07ns 4.15ns 0.0219 0 0 1.6 KB
master EnrichedLog netcoreapp3.1 4.01μs 0.981ns 3.67ns 0.022 0 0 1.65 KB
master EnrichedLog net472 4.38μs 4.01ns 15.5ns 0.323 0 0 2.04 KB
#6344 EnrichedLog net6.0 2.74μs 0.641ns 2.48ns 0.0219 0 0 1.6 KB
#6344 EnrichedLog netcoreapp3.1 3.94μs 1.62ns 6.06ns 0.0217 0 0 1.65 KB
#6344 EnrichedLog net472 4.48μs 4.62ns 17.9ns 0.324 0 0 2.04 KB
Benchmarks.Trace.SpanBenchmark - Slower ⚠️ Same allocations ✔️

Slower ⚠️ in #6344

| Benchmark | diff/base | Base Median (ns) | Diff Median (ns) | Modality |
| --- | --- | --- | --- | --- |
| Benchmarks.Trace.SpanBenchmark.StartFinishSpan‑netcoreapp3.1 | 1.198 | 550.99 | 659.96 | |

Faster 🎉 in #6344

| Benchmark | base/diff | Base Median (ns) | Diff Median (ns) | Modality |
| --- | --- | --- | --- | --- |
| Benchmarks.Trace.SpanBenchmark.StartFinishScope‑net6.0 | 1.124 | 559.38 | 497.83 | |
| Benchmarks.Trace.SpanBenchmark.StartFinishScope‑netcoreapp3.1 | 1.117 | 735.60 | 658.47 | |
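The ratio columns and the geometric means quoted in the summary can be reproduced from the medians above. The sketch below is only a reading aid under the assumption that each ratio is the slower median divided by the faster one; the helper name is hypothetical.

```python
from statistics import geometric_mean


def speed_ratio(slower_ns, faster_ns):
    """Ratio of the slower median to the faster median (always >= 1 here)."""
    return slower_ns / faster_ns


# "Faster in #6344": base/diff, because the base (master) median is the slower one.
faster = [
    speed_ratio(559.38, 497.83),  # StartFinishScope, net6.0
    speed_ratio(735.60, 658.47),  # StartFinishScope, netcoreapp3.1
]

# "Slower in #6344": diff/base, because the diff (PR) median is the slower one.
slower = [
    speed_ratio(659.96, 550.99),  # StartFinishSpan, netcoreapp3.1
]

if __name__ == "__main__":
    print([round(r, 3) for r in faster], round(geometric_mean(faster), 3))
    print([round(r, 3) for r in slower], round(geometric_mean(slower), 3))
```

A geometric mean is the usual way to average ratios, since it treats a 2x speed-up and a 2x slow-down as cancelling out.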

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master StartFinishSpan net6.0 403ns 0.652ns 2.53ns 0.00807 0 0 576 B
master StartFinishSpan netcoreapp3.1 551ns 0.695ns 2.69ns 0.00779 0 0 576 B
master StartFinishSpan net472 690ns 1.81ns 7.02ns 0.0915 0 0 578 B
master StartFinishScope net6.0 559ns 0.821ns 3.18ns 0.00971 0 0 696 B
master StartFinishScope netcoreapp3.1 734ns 1.69ns 6.55ns 0.00946 0 0 696 B
master StartFinishScope net472 938ns 2.23ns 8.63ns 0.104 0 0 658 B
#6344 StartFinishSpan net6.0 447ns 0.715ns 2.77ns 0.00794 0 0 576 B
#6344 StartFinishSpan netcoreapp3.1 660ns 0.519ns 1.94ns 0.00782 0 0 576 B
#6344 StartFinishSpan net472 664ns 1.48ns 5.54ns 0.0916 0 0 578 B
#6344 StartFinishScope net6.0 498ns 1.74ns 6.29ns 0.00986 0 0 696 B
#6344 StartFinishScope netcoreapp3.1 658ns 0.918ns 3.56ns 0.00938 0 0 696 B
#6344 StartFinishScope net472 842ns 1.79ns 6.46ns 0.104 0 0 658 B
Benchmarks.Trace.TraceAnnotationsBenchmark - Same speed ✔️ Same allocations ✔️

Raw results

Branch Method Toolchain Mean StdError StdDev Gen 0 Gen 1 Gen 2 Allocated
master RunOnMethodBegin net6.0 678ns 0.842ns 3.26ns 0.00974 0 0 696 B
master RunOnMethodBegin netcoreapp3.1 880ns 1.28ns 4.97ns 0.00948 0 0 696 B
master RunOnMethodBegin net472 1.13μs 2.05ns 7.92ns 0.104 0 0 658 B
#6344 RunOnMethodBegin net6.0 670ns 1.15ns 4.45ns 0.00969 0 0 696 B
#6344 RunOnMethodBegin netcoreapp3.1 943ns 2.11ns 8.17ns 0.00947 0 0 696 B
#6344 RunOnMethodBegin net472 1.06μs 3.3ns 12.8ns 0.104 0 0 658 B
