Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OpenTelemetry Support #101

Merged
merged 167 commits into from
Mar 29, 2022

Conversation

jfhbrook-at-work
Copy link
Collaborator

@jfhbrook-at-work jfhbrook-at-work commented Jan 10, 2022

Background

This PR adds support for OpenTelemetry tracing with Honeycomb. For more background, see https://github.com/eaze/otel/issues/1 . Closes #100. Closes #95.

Overview

There are a lot of changes in this branch! Most of them are OpenTelemetry related, but a few of them were quality-of-life improvements I made along the way. In particular, it closes #95. I'll point out the others in individual comments.

I would start in templates/boltzmann/core/honeycomb.ts, and reference templates/boltzmann/core/prelude.ts and templates/boltzmann/index.tera to see how it fits together with them. Then, check out templates/boltzmann/core/entrypoint.ts to see how it gets booted. It should read more or less in order, minus some subclassing you may want to check out on a second pass.

That'll give you the basics of the startup procedure. The next big pieces are gonna be templates/boltzmann/core/middleware.ts and templates/boltzmann/middleware/honeycomb.ts. Those files are the ones responsible for creating the automatic boltzmann spans. In general I just have two implementations of everything - this touches beeline code as little as possible, in return for a little dampness. I think this is better than a bespoke facade people will want to rip out later, and adding a beeline compatibility layer inside the otel abstractions themselves would be both obnoxious and more likely to break beeline.

middleware.ts in particular contains the major tracing test - it creates a honeycomb otel mock, sets up a test app, simulates a request with shot and a custom root span, and asserts that the expected traces get emitted.

Logging: OpenTelemetry ships with a logger, accessible under the otel.diag API - and we use it. That said, there's two major complications:

  1. We want to be quiet by default - misconfigured tracing shouldn't flood the logs in most cases
  2. Honeycomb gets initialized so early in the lifecycle that bole isn't guaranteed to be configured yet

So logging looks like this:

  1. If boltzmann's normal LOG_LEVEL is set to debug, we configure an otel logger
  2. Logging is disabled in self-test mode. Note that (as far as I can tell) boltzmann doesn't apply any templating before running the self-tests, so we can't use {% if not selftest %} blocks nor {% else %} statements. This is why it does the weird let trick.
  3. When LOG_LEVEL is set, we default otel logs to info. Most logging we do is at the verbose log level, which is even lower than debug. This separate log level may be configured with the standard OTEL_LOG_LEVEL env variable.
  4. Logging has a fallback for if the honeycomb logger isn't defined, using JSON.stringify. A bole logger is then attached in templates/boltzmann/middleware/log.ts.

The most interesting set of logging hooks is going to be in the HoneycombTraceExporter class in core/honeycomb.ts. Those log when spans are being sent/received.

This should leave a couple of bin script and package.json tweaks, updates to dependencies, and some light edits to type stubs, github action templates and docs.

Current State

All the unit tests are passing and I've created both beeline and otel traces in my test cluster, using our refinery. All of the PRs for updating the services are up in draft mode, with their boltzmanns generated against this branch. Everything seems to behave as-expected when I run our CI tests! I think we're good to go.

Copy link
Collaborator Author

@jfhbrook-at-work jfhbrook-at-work left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went ahead and gave a self-review. The major gap I see is connecting the parent trace/state headers to the context - I'll make that change shortly.

templates/boltzmann/core/entrypoint.ts Show resolved Hide resolved
templates/boltzmann/core/prelude.ts Outdated Show resolved Hide resolved
templates/boltzmann/core/prelude.ts Outdated Show resolved Hide resolved
templates/boltzmann/core/prelude.ts Outdated Show resolved Hide resolved
templates/boltzmann/middleware/honeycomb.ts Outdated Show resolved Hide resolved
templates/boltzmann/middleware/honeycomb.ts Outdated Show resolved Hide resolved
templates/boltzmann/middleware/honeycomb.ts Outdated Show resolved Hide resolved
Copy link
Contributor

@chrisdickinson chrisdickinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking really good! I think y'all have been talking through changing the process.env.HONEYCOMB_API_HOST check; assuming the tests pass & a manual re-scaffold of echolibrium produces the expected spans, I'd say we're pretty much good to go!

templates/boltzmann/middleware/honeycomb.ts Outdated Show resolved Hide resolved
templates/boltzmann/core/prelude.ts Outdated Show resolved Hide resolved
templates/boltzmann/core/prelude.ts Outdated Show resolved Hide resolved
templates/boltzmann/middleware/honeycomb.ts Outdated Show resolved Hide resolved
templates/boltzmann/middleware/honeycomb.ts Outdated Show resolved Hide resolved
templates/boltzmann/middleware/honeycomb.ts Outdated Show resolved Hide resolved
templates/boltzmann/middleware/honeycomb.ts Outdated Show resolved Hide resolved
templates/boltzmann/middleware/honeycomb.ts Outdated Show resolved Hide resolved
templates/boltzmann/middleware/honeycomb.ts Outdated Show resolved Hide resolved
templates/boltzmann/core/prelude.ts Outdated Show resolved Hide resolved
Copy link

@ericsampson ericsampson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot to mention two things in my first review:

  1. We'll need to export the OTel context and trace objects so that callers can use them to do things like add context to the automatically generated current span, start explicit spans if desired, etc
    We might as well export the beeline object for the HC case, while we're at it.
    This work could be in a followup PR if you want to test the current functionality first.

  2. Does the Node autoinstrumentation package that we're pulling in automatically add the trace* headers to outgoing HTTP calls made by the app?
    second, does that work if the app author decides to use undici (my memeory is that it isn't autoinstrumented yet), if so we should document how to do manual traceheader propogation in an example or something for undici users.

Cheers!

@jfhbrook-at-work
Copy link
Collaborator Author

We'll need to export the OTel context and trace objects

Yeah, we can do that. In my head I expected people to import the otel abstractions directly, but I don't see a reason why we shouldn't export them. Similarly with the beelines object.

Does the Node autoinstrumentation package that we're pulling in automatically add the trace* headers to outgoing HTTP calls made by the app?

I honestly don't know. I'll dig into both what the node auto-instrumentation actually gets us and the W3CTraceContextPropagator protip.

does that work if the app author decides to use undici

I actually knew nothing about undici until just now, and now I know what's on the npm page. 😆 My guess is that you're right about the auto-instrumentation, but I'll have to take a closer look at that and get back to you.

README.md Show resolved Hide resolved
bin/checkts.sh Show resolved Hide resolved
src/dependencies.ron Show resolved Hide resolved
templates/boltzmann/core/prelude.ts Outdated Show resolved Hide resolved
templates/boltzmann/core/prelude.ts Outdated Show resolved Hide resolved
templates/boltzmann/core/utils/index.ts Outdated Show resolved Hide resolved
templates/boltzmann/middleware/honeycomb.ts Outdated Show resolved Hide resolved
templates/boltzmann/middleware/honeycomb.ts Outdated Show resolved Hide resolved
templates/boltzmann/middleware/honeycomb.ts Outdated Show resolved Hide resolved
templates/boltzmann/middleware/jwt.ts Show resolved Hide resolved
let sampler: Sampler = new AlwaysOnSampler()

if (sampleRate === 0) {
sampler = new AlwaysOffSampler()
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, I'm confused. The sample rate would be # of traces recorded by honeycomb / total number of traces right? So a sample rate of 1.0 would be always on and a sample rate of 0.0 would be always off. Is the convention backwards from how I've been reading it?

@ericsampson
Copy link

Josh, one question that I can't track down among all the code: are we exporting the OTel Trace and Context objects so that consumers can call methods on them?

@jfhbrook-at-work
Copy link
Collaborator Author

Josh, one question that I can't track down among all the code: are we exporting the OTel Trace and Context objects so that consumers can call methods on them?

This is a good question, and there's actually a lot of ~additional context~ to share. tl;dr though: I ultimately intend to expose the relevant parts of the otel API from boltzmann - including trace and context - but need to do some import/export buttoning up either way.


As for the longer conversation: as far as I can tell, anything imported into boltzmann is also exported. This is partially a limitation of the build scripts (afaict) but has the added bonus of you being able to access any imported dependency, as-imported. In theory you can import 'beelines' from it today, and it's appropriate to do the same with otel. But there's a lot of otel stuff across a lot of namespaces and I don't know how much I want to pollute boltzmann's with otel noise.

All this stuff is complicated by how I moved honeycomb stuff into a new file. prelude.ts is the file where all dependencies are imported (and subsequently exported) in the production build. Other typescript files by contrast only import stuff in self-test mode. But the honeycomb code was getting so bulky that it was overwhelming the prelude completely, and the most straightforward way to pull it out was to make it a "pre-prelude". What this means is that honeycomb/otel specific imports in honeycomb.ts follow the prelude's rules and anything else may only be imported in self-test mode. This is all pretty brittle by default though, and it creates a really busy namespace.

Anyway: Right now honeycomb.ts simply imports what it needs and then exports it to stick with conventions and avoid collisions with other boltzmann objects. That's working reasonably well, minus a handful of things I have to button up. But the biggest of those is how to make the most use out of the namespace given to us. My best plan is to move towards importing * and giving the result a clearly module-related name:

import * as otelAPI from '@opentelemetry/api';

The boltzmann namespace will still be a little busy and some of the exports will have clunky names. But in this example you would get you the whole API, including trace and context, in a predictable location. I think that's overall a win. But let me know if this makes sense to you!

@ericsampson
Copy link

I think that makes sense!! But I'm so so far from knowledgeable in Node, it would probably be worth double checking with Chris to get the opinion of someone who knows what the hell they're talking about 😅

Copy link
Collaborator Author

@jfhbrook-at-work jfhbrook-at-work left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is close. I found a few things that I want to patch - some of which I'd just forgotten to push - but otherwise this is damned near what I want to go with!

"githubci": false,
"ping": true,
"status": true
"status": true,
"debug": false
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll regenerate all these examples prior to release. This debug here should go away - it was a bug that I've since licked.

@@ -338,13 +450,13 @@
),
DependencySpec(
name: "@typescript-eslint/parser",
version: "^4.11.0",
version: "^5.12.1",
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of our apps was extremely grouchy at me using the older versions. I naively upgraded them here and the app became happy.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense!

@@ -0,0 +1,20 @@
declare module 'honeycomb-beeline' {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The example build was really grouchy about this type stub being missing. I added it here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 I wonder if we should provide it automatically as part of the typescript scaffold?

edit: ah, I see, it is scaffolded in dirspec.ron, but somehow it ended up missing from this example at some point in the past!

@@ -172,10 +172,11 @@ r#"declare module 'honeycomb-beeline' {
finishTrace: (_: string) => void
bindFunctionToTrace<T extends Fn>(...args: Parameters<T>): ReturnType<T>
addContext: (opts?: Record<string, unknown>) => void
addTraceContext: (opts?: Record<string, unknown>) => void
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps surprising, but this was missing from the type stubs despite being a very common method for our apps to be calling! In practice we're moving away from using this method, but it's here now.

declare const beeline: Beeline;
export = beeline;
declare const beeline: Beeline
export = beeline
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was causing a linting error in one of the apps.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Semicolons: the big bug scourge of the javascript

@@ -127,8 +130,20 @@ class Context {
/**{{- tsdoc(page="02-handlers.md", section="traceURL") -}}*/
get traceURL () {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't actually figured out how to get boltzmann to show me this URL. <_<

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

console.log(context.traceURL) in a handler will do it!

templates/boltzmann/middleware/honeycomb.ts Show resolved Hide resolved
* trace with opentelemetry
* ╱╱┏┳┓╭╮┏┳┓ ╲╲
* ▔▏┗┻┛┃┃┗┻┛▕▔
*/
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PLEASE let me keep this comment lol

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

of course!

void `{% if honeycomb %}`
honeycomb.logger = bole('boltzmann:honeycomb')
if (honeycomb.features.beeline) {
honeycomb.logger.warn(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm split on this. On one hand I'd like us to get off beelines as soon as it makes sense, but on the other Honeycomb itself has not deprecated beeline, merely placing it in "maintenance mode". What do people think?

@@ -25,10 +25,10 @@ jobs:
{% endif %}
steps:
- uses: actions/checkout@v2
- name: Use Node.js 12.x
uses: actions/setup-node@v1
- name: Use Node.js 16.x
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bunch of the apps had been rolled to v16 in their test actions, so I made the change here as well. v12 is old!

@jfhbrook-at-work jfhbrook-at-work marked this pull request as ready for review March 7, 2022 19:19
Copy link

@ericsampson ericsampson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hey Josh, if we aren't setting spanKind: "Server" where appropriate, or "Internal" for the middleware appropriate, then we should add that. It's important for Honeycomb metrics and UI.

See here: https://opentelemetry.io/docs/instrumentation/js/api/tracing/#span-kind

@jfhbrook-at-work
Copy link
Collaborator Author

We should be setting the span kind throughout, but if you spot something I missed definitely point it out!

Copy link
Contributor

@chrisdickinson chrisdickinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WHEW WHAT A DOOZY

great work, josh!

I have some comments which I will summarize after posting the review!

.github/workflows/release.yml Show resolved Hide resolved
.github/workflows/test.yml Show resolved Hide resolved
bin/checkts.sh Show resolved Hide resolved
docs/content/reference/03-middleware.md Show resolved Hide resolved
@@ -0,0 +1,20 @@
declare module 'honeycomb-beeline' {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 I wonder if we should provide it automatically as part of the typescript scaffold?

edit: ah, I see, it is scaffolded in dirspec.ron, but somehow it ended up missing from this example at some point in the past!

templates/boltzmann/middleware/honeycomb.ts Show resolved Hide resolved
templates/boltzmann/middleware/honeycomb.ts Show resolved Hide resolved
* trace with opentelemetry
* ╱╱┏┳┓╭╮┏┳┓ ╲╲
* ▔▏┗┻┛┃┃┗┻┛▕▔
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

of course!

<string>handler.version
)

Object.entries(context.params).forEach(([key, value]) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 I think span has a setAttributes method that we can use to set the attributes in bulk.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does, I wrote this line before noticing and it looks like I forgot to clean this up.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked closer at this, and yes that's true, however I'm actually doing a map across the keys on L238 (paramSpanAttribute(key)). I could do an Object.fromEntries and pass that to span.setAttributes but I don't feel like it would make this more clear. But maybe I'm missing something?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is ready to resolve after my investigation ☝️ but I'll keep it open so we can ensure consensus on this one.

templates/boltzmann/middleware/ping.ts Show resolved Hide resolved
@chrisdickinson
Copy link
Contributor

To recap my comments:

  • (many positive noises)
  • We should add some docs on the volta flag to docs/content/reference/01-cli.md
  • 🍏 There was a comment I found confusing about injecting the service name – it looks like we set it later in the same function.
  • 💡 We might use span.setAttributes instead of looping over Object.entries.
  • Exiting on unhandledRejection is a change (which I approve of!) but I'd like to make sure we also exit on unhandledRejection when the honeycomb flag is disabled.
  • We should chat about how we want to expose those Honeycomb factory overrides to end users. There's some prior art with how we configure, e.g., /monitor/status healthchecks, but I'd like to chat more about the use cases first.

@jfhbrook-at-work jfhbrook-at-work merged commit 9ae7c0c into entropic-dev:latest Mar 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add OpenTelemetry support for honeycomb
3 participants