Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inject trace context into AWS Step Functions input #7585

Open
wants to merge 16 commits into
base: master
Choose a base branch
from

Conversation

DylanLovesCoffee
Copy link
Contributor

@DylanLovesCoffee DylanLovesCoffee commented Sep 9, 2024

What Does This Do

Adds an instrumentation for AWS SDK Step Functions. This enables tracing for when a Lambda function invokes a Step Function. Trace context is injected into the Step Function's StartExecutionRequest/StartSyncExecutionRequest.Input object.

Example of traced step function in the Serverless org: app

The Logs to Traces Reducer will read the trace context from the Step Function logs and create a span for the Step Function.

Motivation

Continues the work done in Python and NodeJS.

Additional Notes

Screenshots of feature:
sfn-exec
sfn-sync-exec

Contributor Checklist

Jira ticket: [SVLS-5249](https://datadoghq.atlassian.net/browse/SVLS-5249)

Copy link
Collaborator

@amarziali amarziali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using reflection is not efficient and should be avoided. Instead, MethodHandles can be used.
The pull request lacks also of a minimum test coverage. Test must be added

@@ -398,6 +403,37 @@ public AgentSpan onSdkResponse(
return span;
}

public static void injectTraceToStepFunctionInput(SdkRequest request, AgentSpan span) {
Class<?> clazz = request.getClass();
if ((clazz.getSimpleName().equals("StartExecutionRequest"))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May instanceof be used here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Main issue is that not all AWS types will be available on the classpath - it depends which parts of the SDK the user deploys. So you can't use types directly like with instanceof.

However, I'd prefer to see this use the fully qualified class name, because there is a cost to calling getSimpleName and we know the package(s) ahead of time.

ddInputNode.put("_datadog", traceContext);
}

Field inputField = clazz.getDeclaredField("input");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty inefficient and is discouraged. Instead you might obtain (once a MethodHandle) and unreflect it in order to avoid paying that extra cost

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed - I would prefer this to be moved to a step-function specific instrumentation module, like the SNS specific instrumentation module(s) added in #6908 - you can still intercept the request and modify it as appropriate, but you'll have access to the right types and can directly target the appropriate part of the AWS SDK

Trying to do this in the generic AWS SDK instrumentation will make this code convoluted and hard to maintain.

I'd also recommend avoiding adding a dependency to com.fasterxml.jackson.databind in the instrumentation, because that might not be accessible in certain deployments - ideally take the same approach as SnsInterceptor and just build up the simple string (the JSON isn't that complex)

In fact it's probably a good idea to refactor that _datadog JSON building logic out so it can be shared across all the different AWS SDK interceptors that want to add _datadog to their requests...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah thanks! That helps a ton. I'll give that a shot to try to clean things up + add test coverage

@pr-commenter
Copy link

pr-commenter bot commented Sep 16, 2024

Benchmarks

Startup

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
git_branch master dylan/sfn-trace-ctx
git_commit_date 1729717305 1729719446
git_commit_sha 0cf47ed 3039780
release_version 1.42.0-SNAPSHOT~0cf47ed20e 1.42.0-SNAPSHOT~30397802a8
See matching parameters
Baseline Candidate
application insecure-bank insecure-bank
ci_job_date 1729724777 1729724777
ci_job_id 682326231 682326231
ci_pipeline_id 47290650 47290650
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
module Agent Agent
parent None None
variant iast iast

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 55 metrics, 8 unstable metrics.

Startup time reports for petclinic
gantt
    title petclinic - global startup overhead: candidate=1.42.0-SNAPSHOT~30397802a8, baseline=1.42.0-SNAPSHOT~0cf47ed20e

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.085 s) : 0, 1085441
Total [baseline] (10.494 s) : 0, 10493826
Agent [candidate] (1.085 s) : 0, 1085170
Total [candidate] (10.444 s) : 0, 10444175
section appsec
Agent [baseline] (1.219 s) : 0, 1218927
Total [baseline] (10.71 s) : 0, 10709842
Agent [candidate] (1.218 s) : 0, 1217653
Total [candidate] (10.664 s) : 0, 10664012
section iast
Agent [baseline] (1.209 s) : 0, 1208962
Total [baseline] (10.907 s) : 0, 10906523
Agent [candidate] (1.21 s) : 0, 1209769
Total [candidate] (10.888 s) : 0, 10887649
section profiling
Agent [baseline] (1.281 s) : 0, 1280920
Total [baseline] (10.705 s) : 0, 10704917
Agent [candidate] (1.279 s) : 0, 1278615
Total [candidate] (10.743 s) : 0, 10743475
Loading
  • baseline results
Module Variant Duration Δ tracing
Agent tracing 1.085 s -
Agent appsec 1.219 s 133.486 ms (12.3%)
Agent iast 1.209 s 123.521 ms (11.4%)
Agent profiling 1.281 s 195.479 ms (18.0%)
Total tracing 10.494 s -
Total appsec 10.71 s 216.016 ms (2.1%)
Total iast 10.907 s 412.697 ms (3.9%)
Total profiling 10.705 s 211.091 ms (2.0%)
  • candidate results
Module Variant Duration Δ tracing
Agent tracing 1.085 s -
Agent appsec 1.218 s 132.484 ms (12.2%)
Agent iast 1.21 s 124.599 ms (11.5%)
Agent profiling 1.279 s 193.445 ms (17.8%)
Total tracing 10.444 s -
Total appsec 10.664 s 219.837 ms (2.1%)
Total iast 10.888 s 443.474 ms (4.2%)
Total profiling 10.743 s 299.3 ms (2.9%)
gantt
    title petclinic - break down per module: candidate=1.42.0-SNAPSHOT~30397802a8, baseline=1.42.0-SNAPSHOT~0cf47ed20e

    dateFormat X
    axisFormat %s
section tracing
BytebuddyAgent [baseline] (690.482 ms) : 0, 690482
BytebuddyAgent [candidate] (690.764 ms) : 0, 690764
GlobalTracer [baseline] (315.731 ms) : 0, 315731
GlobalTracer [candidate] (315.511 ms) : 0, 315511
AppSec [baseline] (54.354 ms) : 0, 54354
AppSec [candidate] (54.082 ms) : 0, 54082
Remote Config [baseline] (669.417 µs) : 0, 669
Remote Config [candidate] (659.642 µs) : 0, 660
Telemetry [baseline] (10.371 ms) : 0, 10371
Telemetry [candidate] (10.331 ms) : 0, 10331
section appsec
BytebuddyAgent [baseline] (707.487 ms) : 0, 707487
BytebuddyAgent [candidate] (706.179 ms) : 0, 706179
GlobalTracer [baseline] (312.766 ms) : 0, 312766
GlobalTracer [candidate] (312.316 ms) : 0, 312316
AppSec [baseline] (166.529 ms) : 0, 166529
AppSec [candidate] (166.271 ms) : 0, 166271
Remote Config [baseline] (630.732 µs) : 0, 631
Remote Config [candidate] (638.111 µs) : 0, 638
Telemetry [baseline] (7.369 ms) : 0, 7369
Telemetry [candidate] (8.088 ms) : 0, 8088
IAST [baseline] (20.663 ms) : 0, 20663
IAST [candidate] (20.726 ms) : 0, 20726
section iast
BytebuddyAgent [baseline] (805.814 ms) : 0, 805814
BytebuddyAgent [candidate] (806.248 ms) : 0, 806248
GlobalTracer [baseline] (303.735 ms) : 0, 303735
GlobalTracer [candidate] (303.975 ms) : 0, 303975
AppSec [baseline] (57.419 ms) : 0, 57419
AppSec [candidate] (55.448 ms) : 0, 55448
Remote Config [baseline] (600.239 µs) : 0, 600
Remote Config [candidate] (588.339 µs) : 0, 588
Telemetry [baseline] (7.157 ms) : 0, 7157
Telemetry [candidate] (6.998 ms) : 0, 6998
IAST [baseline] (20.426 ms) : 0, 20426
IAST [candidate] (22.655 ms) : 0, 22655
section profiling
ProfilingAgent [baseline] (91.748 ms) : 0, 91748
ProfilingAgent [candidate] (92.279 ms) : 0, 92279
BytebuddyAgent [baseline] (684.997 ms) : 0, 684997
BytebuddyAgent [candidate] (682.112 ms) : 0, 682112
GlobalTracer [baseline] (398.109 ms) : 0, 398109
GlobalTracer [candidate] (398.075 ms) : 0, 398075
AppSec [baseline] (54.61 ms) : 0, 54610
AppSec [candidate] (54.84 ms) : 0, 54840
Remote Config [baseline] (646.44 µs) : 0, 646
Remote Config [candidate] (654.178 µs) : 0, 654
Telemetry [baseline] (11.611 ms) : 0, 11611
Telemetry [candidate] (11.736 ms) : 0, 11736
Profiling [baseline] (91.772 ms) : 0, 91772
Profiling [candidate] (92.302 ms) : 0, 92302
Loading
Startup time reports for insecure-bank
gantt
    title insecure-bank - global startup overhead: candidate=1.42.0-SNAPSHOT~30397802a8, baseline=1.42.0-SNAPSHOT~0cf47ed20e

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.082 s) : 0, 1081765
Total [baseline] (8.615 s) : 0, 8615252
Agent [candidate] (1.098 s) : 0, 1098301
Total [candidate] (8.696 s) : 0, 8696397
section iast
Agent [baseline] (1.209 s) : 0, 1208615
Total [baseline] (9.158 s) : 0, 9158480
Agent [candidate] (1.207 s) : 0, 1207464
Total [candidate] (9.177 s) : 0, 9177432
section iast_HARDCODED_SECRET_DISABLED
Agent [baseline] (1.209 s) : 0, 1208603
Total [baseline] (9.126 s) : 0, 9126263
Agent [candidate] (1.209 s) : 0, 1208967
Total [candidate] (9.148 s) : 0, 9147649
section iast_TELEMETRY_OFF
Agent [baseline] (1.205 s) : 0, 1205062
Total [baseline] (9.172 s) : 0, 9171690
Agent [candidate] (1.213 s) : 0, 1212908
Total [candidate] (9.126 s) : 0, 9125775
Loading
  • baseline results
Module Variant Duration Δ tracing
Agent tracing 1.082 s -
Agent iast 1.209 s 126.85 ms (11.7%)
Agent iast_HARDCODED_SECRET_DISABLED 1.209 s 126.838 ms (11.7%)
Agent iast_TELEMETRY_OFF 1.205 s 123.297 ms (11.4%)
Total tracing 8.615 s -
Total iast 9.158 s 543.228 ms (6.3%)
Total iast_HARDCODED_SECRET_DISABLED 9.126 s 511.011 ms (5.9%)
Total iast_TELEMETRY_OFF 9.172 s 556.438 ms (6.5%)
  • candidate results
Module Variant Duration Δ tracing
Agent tracing 1.098 s -
Agent iast 1.207 s 109.162 ms (9.9%)
Agent iast_HARDCODED_SECRET_DISABLED 1.209 s 110.666 ms (10.1%)
Agent iast_TELEMETRY_OFF 1.213 s 114.607 ms (10.4%)
Total tracing 8.696 s -
Total iast 9.177 s 481.036 ms (5.5%)
Total iast_HARDCODED_SECRET_DISABLED 9.148 s 451.252 ms (5.2%)
Total iast_TELEMETRY_OFF 9.126 s 429.378 ms (4.9%)
gantt
    title insecure-bank - break down per module: candidate=1.42.0-SNAPSHOT~30397802a8, baseline=1.42.0-SNAPSHOT~0cf47ed20e

    dateFormat X
    axisFormat %s
section tracing
BytebuddyAgent [baseline] (689.889 ms) : 0, 689889
BytebuddyAgent [candidate] (697.531 ms) : 0, 697531
GlobalTracer [baseline] (314.938 ms) : 0, 314938
GlobalTracer [candidate] (320.429 ms) : 0, 320429
AppSec [baseline] (54.296 ms) : 0, 54296
AppSec [candidate] (55.151 ms) : 0, 55151
Remote Config [baseline] (661.435 µs) : 0, 661
Remote Config [candidate] (681.116 µs) : 0, 681
Telemetry [baseline] (8.151 ms) : 0, 8151
Telemetry [candidate] (10.515 ms) : 0, 10515
section iast
BytebuddyAgent [baseline] (805.32 ms) : 0, 805320
BytebuddyAgent [candidate] (804.574 ms) : 0, 804574
GlobalTracer [baseline] (303.5 ms) : 0, 303500
GlobalTracer [candidate] (303.506 ms) : 0, 303506
AppSec [baseline] (57.814 ms) : 0, 57814
AppSec [candidate] (56.765 ms) : 0, 56765
IAST [baseline] (20.466 ms) : 0, 20466
IAST [candidate] (21.155 ms) : 0, 21155
Remote Config [baseline] (612.536 µs) : 0, 613
Remote Config [candidate] (604.832 µs) : 0, 605
Telemetry [baseline] (7.078 ms) : 0, 7078
Telemetry [candidate] (7.016 ms) : 0, 7016
section iast_HARDCODED_SECRET_DISABLED
BytebuddyAgent [baseline] (804.826 ms) : 0, 804826
BytebuddyAgent [candidate] (805.217 ms) : 0, 805217
GlobalTracer [baseline] (303.641 ms) : 0, 303641
GlobalTracer [candidate] (303.353 ms) : 0, 303353
AppSec [baseline] (58.101 ms) : 0, 58101
AppSec [candidate] (55.719 ms) : 0, 55719
IAST [baseline] (20.458 ms) : 0, 20458
IAST [candidate] (22.998 ms) : 0, 22998
Remote Config [baseline] (599.373 µs) : 0, 599
Remote Config [candidate] (611.909 µs) : 0, 612
Telemetry [baseline] (7.183 ms) : 0, 7183
Telemetry [candidate] (7.195 ms) : 0, 7195
section iast_TELEMETRY_OFF
BytebuddyAgent [baseline] (801.581 ms) : 0, 801581
BytebuddyAgent [candidate] (806.708 ms) : 0, 806708
GlobalTracer [baseline] (304.357 ms) : 0, 304357
GlobalTracer [candidate] (305.745 ms) : 0, 305745
AppSec [baseline] (57.695 ms) : 0, 57695
AppSec [candidate] (58.575 ms) : 0, 58575
IAST [baseline] (19.998 ms) : 0, 19998
IAST [candidate] (20.235 ms) : 0, 20235
Remote Config [baseline] (595.976 µs) : 0, 596
Remote Config [candidate] (604.533 µs) : 0, 605
Telemetry [baseline] (6.979 ms) : 0, 6979
Telemetry [candidate] (7.109 ms) : 0, 7109
Loading

Load

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
end_time 2024-10-23T22:36:38 2024-10-23T22:43:29
git_branch master dylan/sfn-trace-ctx
git_commit_date 1729717305 1729719446
git_commit_sha 0cf47ed 3039780
release_version 1.42.0-SNAPSHOT~0cf47ed20e 1.42.0-SNAPSHOT~30397802a8
start_time 2024-10-23T22:36:25 2024-10-23T22:43:16
See matching parameters
Baseline Candidate
application insecure-bank insecure-bank
ci_job_date 1729723755 1729723755
ci_job_id 682326180 682326180
ci_pipeline_id 47290650 47290650
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
variant iast iast

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 11 metrics, 17 unstable metrics.

Request duration reports for insecure-bank
gantt
    title insecure-bank - request duration [CI 0.99] : candidate=1.42.0-SNAPSHOT~30397802a8, baseline=1.42.0-SNAPSHOT~0cf47ed20e
    dateFormat X
    axisFormat %s
section baseline
no_agent (376.487 µs) : 357, 396
.   : milestone, 376,
iast (483.745 µs) : 462, 505
.   : milestone, 484,
iast_FULL (558.336 µs) : 537, 580
.   : milestone, 558,
iast_GLOBAL (507.661 µs) : 487, 529
.   : milestone, 508,
iast_HARDCODED_SECRET_DISABLED (480.943 µs) : 460, 502
.   : milestone, 481,
iast_INACTIVE (441.893 µs) : 421, 462
.   : milestone, 442,
iast_TELEMETRY_OFF (475.558 µs) : 454, 497
.   : milestone, 476,
tracing (445.09 µs) : 424, 466
.   : milestone, 445,
section candidate
no_agent (366.239 µs) : 347, 386
.   : milestone, 366,
iast (486.627 µs) : 465, 508
.   : milestone, 487,
iast_FULL (561.464 µs) : 540, 583
.   : milestone, 561,
iast_GLOBAL (518.5 µs) : 496, 541
.   : milestone, 519,
iast_HARDCODED_SECRET_DISABLED (482.718 µs) : 461, 504
.   : milestone, 483,
iast_INACTIVE (449.674 µs) : 428, 471
.   : milestone, 450,
iast_TELEMETRY_OFF (466.066 µs) : 445, 487
.   : milestone, 466,
tracing (441.142 µs) : 420, 462
.   : milestone, 441,
Loading
  • baseline results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 376.487 µs [356.735 µs, 396.239 µs] -
iast 483.745 µs [462.004 µs, 505.486 µs] 107.258 µs (28.5%)
iast_FULL 558.336 µs [537.17 µs, 579.501 µs] 181.849 µs (48.3%)
iast_GLOBAL 507.661 µs [486.735 µs, 528.588 µs] 131.175 µs (34.8%)
iast_HARDCODED_SECRET_DISABLED 480.943 µs [459.76 µs, 502.126 µs] 104.456 µs (27.7%)
iast_INACTIVE 441.893 µs [421.321 µs, 462.464 µs] 65.406 µs (17.4%)
iast_TELEMETRY_OFF 475.558 µs [454.173 µs, 496.944 µs] 99.072 µs (26.3%)
tracing 445.09 µs [424.48 µs, 465.7 µs] 68.603 µs (18.2%)
  • candidate results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 366.239 µs [346.611 µs, 385.866 µs] -
iast 486.627 µs [465.401 µs, 507.852 µs] 120.388 µs (32.9%)
iast_FULL 561.464 µs [539.813 µs, 583.116 µs] 195.225 µs (53.3%)
iast_GLOBAL 518.5 µs [496.386 µs, 540.615 µs] 152.262 µs (41.6%)
iast_HARDCODED_SECRET_DISABLED 482.718 µs [461.342 µs, 504.093 µs] 116.479 µs (31.8%)
iast_INACTIVE 449.674 µs [428.308 µs, 471.04 µs] 83.436 µs (22.8%)
iast_TELEMETRY_OFF 466.066 µs [445.342 µs, 486.791 µs] 99.827 µs (27.3%)
tracing 441.142 µs [419.97 µs, 462.313 µs] 74.903 µs (20.5%)
Request duration reports for petclinic
gantt
    title petclinic - request duration [CI 0.99] : candidate=1.42.0-SNAPSHOT~30397802a8, baseline=1.42.0-SNAPSHOT~0cf47ed20e
    dateFormat X
    axisFormat %s
section baseline
no_agent (1.34 ms) : 1321, 1359
.   : milestone, 1340,
appsec (1.728 ms) : 1706, 1750
.   : milestone, 1728,
appsec_no_iast (1.726 ms) : 1702, 1749
.   : milestone, 1726,
iast (1.472 ms) : 1450, 1495
.   : milestone, 1472,
profiling (1.486 ms) : 1462, 1511
.   : milestone, 1486,
tracing (1.472 ms) : 1447, 1496
.   : milestone, 1472,
section candidate
no_agent (1.332 ms) : 1313, 1351
.   : milestone, 1332,
appsec (1.72 ms) : 1697, 1743
.   : milestone, 1720,
appsec_no_iast (1.71 ms) : 1686, 1733
.   : milestone, 1710,
iast (1.482 ms) : 1460, 1504
.   : milestone, 1482,
profiling (1.474 ms) : 1450, 1498
.   : milestone, 1474,
tracing (1.475 ms) : 1450, 1500
.   : milestone, 1475,
Loading
  • baseline results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 1.34 ms [1.321 ms, 1.359 ms] -
appsec 1.728 ms [1.706 ms, 1.75 ms] 387.715 µs (28.9%)
appsec_no_iast 1.726 ms [1.702 ms, 1.749 ms] 385.942 µs (28.8%)
iast 1.472 ms [1.45 ms, 1.495 ms] 132.414 µs (9.9%)
profiling 1.486 ms [1.462 ms, 1.511 ms] 146.3 µs (10.9%)
tracing 1.472 ms [1.447 ms, 1.496 ms] 131.497 µs (9.8%)
  • candidate results
Variant Request duration [CI 0.99] Δ no_agent
no_agent 1.332 ms [1.313 ms, 1.351 ms] -
appsec 1.72 ms [1.697 ms, 1.743 ms] 387.769 µs (29.1%)
appsec_no_iast 1.71 ms [1.686 ms, 1.733 ms] 377.543 µs (28.3%)
iast 1.482 ms [1.46 ms, 1.504 ms] 150.056 µs (11.3%)
profiling 1.474 ms [1.45 ms, 1.498 ms] 141.628 µs (10.6%)
tracing 1.475 ms [1.45 ms, 1.5 ms] 143.123 µs (10.7%)

Dacapo

Parameters

Baseline Candidate
baseline_or_candidate baseline candidate
git_branch master dylan/sfn-trace-ctx
git_commit_date 1729717305 1729719446
git_commit_sha 0cf47ed 3039780
release_version 1.42.0-SNAPSHOT~0cf47ed20e 1.42.0-SNAPSHOT~30397802a8
See matching parameters
Baseline Candidate
application biojava biojava
ci_job_date 1729724199 1729724199
ci_job_id 682326178 682326178
ci_pipeline_id 47290650 47290650
cpu_model Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
variant appsec appsec

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 12 metrics, 0 unstable metrics.

Execution time for tomcat
gantt
    title tomcat - execution time [CI 0.99] : candidate=1.42.0-SNAPSHOT~30397802a8, baseline=1.42.0-SNAPSHOT~0cf47ed20e
    dateFormat X
    axisFormat %s
section baseline
no_agent (1.469 ms) : 1457, 1480
.   : milestone, 1469,
appsec (2.341 ms) : 2300, 2383
.   : milestone, 2341,
iast (2.079 ms) : 2028, 2130
.   : milestone, 2079,
iast_GLOBAL (2.127 ms) : 2076, 2179
.   : milestone, 2127,
profiling (1.946 ms) : 1903, 1988
.   : milestone, 1946,
tracing (1.931 ms) : 1891, 1971
.   : milestone, 1931,
section candidate
no_agent (1.466 ms) : 1455, 1477
.   : milestone, 1466,
appsec (2.335 ms) : 2294, 2376
.   : milestone, 2335,
iast (2.074 ms) : 2022, 2126
.   : milestone, 2074,
iast_GLOBAL (2.121 ms) : 2069, 2174
.   : milestone, 2121,
profiling (1.943 ms) : 1900, 1985
.   : milestone, 1943,
tracing (1.918 ms) : 1878, 1958
.   : milestone, 1918,
Loading
  • baseline results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 1.469 ms [1.457 ms, 1.48 ms] -
appsec 2.341 ms [2.3 ms, 2.383 ms] 872.536 µs (59.4%)
iast 2.079 ms [2.028 ms, 2.13 ms] 610.298 µs (41.6%)
iast_GLOBAL 2.127 ms [2.076 ms, 2.179 ms] 658.574 µs (44.8%)
profiling 1.946 ms [1.903 ms, 1.988 ms] 476.787 µs (32.5%)
tracing 1.931 ms [1.891 ms, 1.971 ms] 462.22 µs (31.5%)
  • candidate results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 1.466 ms [1.455 ms, 1.477 ms] -
appsec 2.335 ms [2.294 ms, 2.376 ms] 868.879 µs (59.3%)
iast 2.074 ms [2.022 ms, 2.126 ms] 607.739 µs (41.5%)
iast_GLOBAL 2.121 ms [2.069 ms, 2.174 ms] 655.302 µs (44.7%)
profiling 1.943 ms [1.9 ms, 1.985 ms] 476.652 µs (32.5%)
tracing 1.918 ms [1.878 ms, 1.958 ms] 452.016 µs (30.8%)
Execution time for biojava
gantt
    title biojava - execution time [CI 0.99] : candidate=1.42.0-SNAPSHOT~30397802a8, baseline=1.42.0-SNAPSHOT~0cf47ed20e
    dateFormat X
    axisFormat %s
section baseline
no_agent (14.919 s) : 14919000, 14919000
.   : milestone, 14919000,
appsec (15.43 s) : 15430000, 15430000
.   : milestone, 15430000,
iast (18.839 s) : 18839000, 18839000
.   : milestone, 18839000,
iast_GLOBAL (17.952 s) : 17952000, 17952000
.   : milestone, 17952000,
profiling (15.174 s) : 15174000, 15174000
.   : milestone, 15174000,
tracing (15.373 s) : 15373000, 15373000
.   : milestone, 15373000,
section candidate
no_agent (15.099 s) : 15099000, 15099000
.   : milestone, 15099000,
appsec (15.29 s) : 15290000, 15290000
.   : milestone, 15290000,
iast (18.926 s) : 18926000, 18926000
.   : milestone, 18926000,
iast_GLOBAL (18.169 s) : 18169000, 18169000
.   : milestone, 18169000,
profiling (15.184 s) : 15184000, 15184000
.   : milestone, 15184000,
tracing (15.237 s) : 15237000, 15237000
.   : milestone, 15237000,
Loading
  • baseline results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 14.919 s [14.919 s, 14.919 s] -
appsec 15.43 s [15.43 s, 15.43 s] 511.0 ms (3.4%)
iast 18.839 s [18.839 s, 18.839 s] 3.92 s (26.3%)
iast_GLOBAL 17.952 s [17.952 s, 17.952 s] 3.033 s (20.3%)
profiling 15.174 s [15.174 s, 15.174 s] 255.0 ms (1.7%)
tracing 15.373 s [15.373 s, 15.373 s] 454.0 ms (3.0%)
  • candidate results
Variant Execution Time [CI 0.99] Δ no_agent
no_agent 15.099 s [15.099 s, 15.099 s] -
appsec 15.29 s [15.29 s, 15.29 s] 191.0 ms (1.3%)
iast 18.926 s [18.926 s, 18.926 s] 3.827 s (25.3%)
iast_GLOBAL 18.169 s [18.169 s, 18.169 s] 3.07 s (20.3%)
profiling 15.184 s [15.184 s, 15.184 s] 85.0 ms (0.6%)
tracing 15.237 s [15.237 s, 15.237 s] 138.0 ms (0.9%)


public static StringBuilder getModifiedInput(String request, String ddTraceContextJSON) {
StringBuilder modifiedInput = new StringBuilder(request);
int startPos = modifiedInput.indexOf("{");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 Code Quality Violation

Suggested change
int startPos = modifiedInput.indexOf("{");
int startPos = modifiedInput.indexOf('{');
Do not use a string with a single letter (...read more)

When using indexOf with only one character, use a character and not a string as it executes faster.

View in Datadog  Leave us feedback  Documentation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm StringBuilder.indexOf() does not take a char type

Copy link
Contributor

@PerfectSlayer PerfectSlayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a bunch of comments about refactoring and testing

// Include httpclient instrumentation for testing because it is a dependency for aws-sdk.
testImplementation project(':dd-java-agent:instrumentation:apache-httpclient-4')
testImplementation project(':dd-java-agent:instrumentation:aws-java-sdk-2.2')
testImplementation 'software.amazon.awssdk:sfn:2.27.2'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It mismatches muzzle requirements. Is it expected?

Comment on lines 31 to 36
String ddTraceContextJSON = InputAttributeInjector.buildTraceContext(span);
// Inject the trace context into the Step Function input
StringBuilder modifiedInput =
InputAttributeInjector.getModifiedInput(request.input(), ddTraceContextJSON);

return request.toBuilder().input(modifiedInput.toString()).build();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be deduplicated using a dedicated method:

SdkRequest injectTraceContext(request, span) {
      String traceContext = InputAttributeInjector.buildTraceContext(span);
      // Inject the trace context into the Step Function input
      String modifiedInput = InputAttributeInjector.getModifiedInput(request.input(), traceContext);
      return request.toBuilder().input(modifiedInput).build()
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I had to handle both StartExecutionRequest and StartSyncExecutionRequest param types I ended up doing method overloading for injectTraceContext()

}
})

def endPoint = "http://" + LOCALSTACK.getHost() + ":" + LOCALSTACK.getMappedPort(4566)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be define as fixture field to avoid duplication (already computed at setup)

@DylanLovesCoffee
Copy link
Contributor Author

DylanLovesCoffee commented Oct 21, 2024

@PerfectSlayer I think I covered all of the comments/corrections, but for some reason the CI's muzzle check is continuing to fail :/ It seems like it's mostly timing out, so may be unrelated to my changes?

public class InputAttributeInjector {
public static String buildTraceContext(AgentSpan span) {
// Extract span tags
StringBuilder spanTagsJSON = new StringBuilder();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This worries me because it is a potential injection vector. I would prefer the use of a JsonBuffer that handles escaping properly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I updated this with the JsonBuffer but when I run muzzle locally it fails with

error: cannot find symbol
import datadog.trace.bootstrap.JsonBuffer;
                              ^
  symbol:   class JsonBuffer
  location: package datadog.trace.bootstrap

Any ideas on if I did something very obviously wrong?

public SfnInterceptor() {}

@Override
public SdkRequest modifyRequest(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably guard against any exceptions escaping from our listeners.
I'd suggest moving the logic into a modifyRequestImpl and then have modifyRequest wrap modifyRequestImpl and catch any exceptions.

Copy link
Contributor

@dougqh dougqh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to see the Json construction done in a safer way.
I also think we should guard against exceptions propagating out of the listener.

@amarziali amarziali removed their request for review November 1, 2024 12:39
spanTagsJSON.beginObject();
span.getTags()
.forEach((tagKey, tagValue) -> spanTagsJSON.name(tagKey).value(tagValue.toString()));
spanTagsJSON.endObject();
Copy link

@avedmala avedmala Nov 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this JSON format for the x-datadog-tags expected? Not urgent but need to know at some point if we need to handle this format or not.

The value from this impl looks like this {"component":"java-aws-sdk","thread":{"name":"main","id":"1"},"env":"dev","span":{"kind":"client"}}

Usually we expect something like _dd.p.dm=-0,_dd.p.tid=658b547000000000

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it does seem like this was written as a json encoding of the tags from the start of this pr. and reading the other associated PR's, it's not entirely clear if they expect the tags to be json-encoded or comma-separated-key=values ... will dig further.

@PerfectSlayer
Copy link
Contributor

I'm working on #7973 that might help with the JSON part.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants