Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open census Span to View converter #1239

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 10 additions & 1 deletion api/server/server.go
Original file line number Diff line number Diff line change
Expand Up @@ -250,7 +250,8 @@ func NewFromEnv(ctx context.Context, opts ...Option) *Server {
opts = append(opts, WithLogDest(getEnv(EnvLogDest, DefaultLogDest), getEnv(EnvLogPrefix, "")))
opts = append(opts, WithZipkin(getEnv(EnvZipkinURL, "")))
opts = append(opts, WithJaeger(getEnv(EnvJaegerURL, "")))
opts = append(opts, WithPrometheus()) // TODO option to turn this off?
opts = append(opts, WithPrometheus()) // TODO option to turn this off?
opts = append(opts, WithSpanConverter()) // TODO option to turn this off?
opts = append(opts, WithDBURL(getEnv(EnvDBURL, defaultDB)))
opts = append(opts, WithMQURL(getEnv(EnvMQURL, defaultMQ)))
opts = append(opts, WithLogURL(getEnv(EnvLogDBURL, "")))
Expand Down Expand Up @@ -743,6 +744,14 @@ func WithPrometheus() Option {
}
}

func WithSpanConverter() Option {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this take an argument for latency buckets? Not the end of the world, but it would be good to keep buckets consistent with service side.

converter, _ := NewSpanConverter(Options{Namespace: "fn"})
trace.RegisterExporter(converter)
trace.ApplyConfig(trace.Config{DefaultSampler: trace.AlwaysSample()})

return nil
}

// WithoutHTTPTriggerEndpoints optionally disables the trigger and route endpoints from a LB -supporting server, allowing extensions to replace them with their own versions
func WithoutHTTPTriggerEndpoints() Option {
return func(ctx context.Context, s *Server) error {
Expand Down
122 changes: 122 additions & 0 deletions api/server/spanconverter.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,122 @@
package server

import (
"context"
"strings"
"sync"
"time"
"unicode"

"go.opencensus.io/stats"
view "go.opencensus.io/stats/view"
"go.opencensus.io/trace"
)

// SpanConverter registers as a opencensus Trace Exporter,
// but it converts all the Spans in Views and registers them as such
// A View exporter will then export them as normal.
type SpanConverter struct {
opts Options
measures map[string]*stats.Float64Measure
viewsMu sync.Mutex
e view.Exporter
}

// Options contains options for configuring the exporter.
type Options struct {
Namespace string
}

func NewSpanConverter(o Options) (*SpanConverter, error) {
c := &SpanConverter{
opts: o,
measures: make(map[string]*stats.Float64Measure),
}
return c, nil
}

var maxViews = 100

// Spans are rejected if there are already maxViews (100) or they are
// prefixed with '/', gin as been observed creating Span id specific
// named Spans.
func (c *SpanConverter) rejectSpan(sd *trace.SpanData) bool {
return len(c.measures) > maxViews || urlName(sd)
}

// ExportSpan creates a Measure and View once per Span.Name, registering
// the View with the opencensus register. The length of time reported
// by the span is then recorded using the measure.
func (c *SpanConverter) ExportSpan(sd *trace.SpanData) {
if c.rejectSpan(sd) {
return
}
m := c.getMeasure(sd)

spanTimeNanos := sd.EndTime.Sub(sd.StartTime)
spanTimeMillis := float64(int64(spanTimeNanos / time.Millisecond))

stats.Record(context.Background(), m.M(spanTimeMillis))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should rip some of the attributes off the span data as well and put them onto the context. this may be slightly painful, but tags are likely important so that we have context for eg function_id, app_id... these seem to map to the fields SpanData.Attributes and tag.Map, however it's not immediately clear from digging around the docs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Attributes, MessageEvents and Annotations are all empty for our spans.

}

var latencyDist = []float64{1, 10, 50, 100, 250, 500, 1000, 10000, 60000, 120000}

func (c *SpanConverter) getMeasure(span *trace.SpanData) *stats.Float64Measure {
sig := sanitize(span.Name)
c.viewsMu.Lock()
m, ok := c.measures[sig]
c.viewsMu.Unlock()

if !ok {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

concurrent access without lock. Outside of locks this could register/insert more than once.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it looked wrong to me to, when I lifted it from the main prom exporter.

In actual fact there is no risk here, only that a view gets created twice, but OC will take care of that with it's deduplication logic.

m = stats.Float64(sig+"_span_time", "The span length in milliseconds", "ms")
v := &view.View{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems like the wrong place for view registration. we should maybe/probably stats.Record everything and not limit the stats.Record bit (it's gated by rejectSpan as well as getMeasure and an arbitrary 100 limit imposed). at least, stats.Record and view.Measure are separate processes and should be exclusive. Tolga linked to some pointers for how the other stuff is handled atm, I think initially we're mostly registering views in initialization, we can maybe make this more friendly (ie no reboot required) later but it's relatively nice to work with for the ability to toggle them on/off for eg things that extend fn. then we end up whitelisting things in, which I kind of agree is painful, while at the same time it does make us organize things (like stat/trace names) so it's a net good I think. I think we can remove the limits (100) with that in place.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(also won't require a lock - we shouldn't be locking for a stats library, that is on them!)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything has already been recorded, that's already happened to produce the SpanData. This is copying up the recorded values into the Measure.

Name: sanitize(span.Name),
Description: sanitize(span.Name),
Measure: m,
Aggregation: view.Distribution(latencyDist...),
}

c.viewsMu.Lock()
c.measures[sig] = m
view.Register(v)
c.viewsMu.Unlock()
}

return m
}

const labelKeySizeLimit = 100

// sanitize returns a string that is trunacated to 100 characters if it's too
// long, and replaces non-alphanumeric characters to underscores.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is all stuff we control right? we're the ones making the names up? I don't think we need this (and if we do, we should probably be using tags for some of the stuff in the name?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trouble is we don't control them all. See the problem with the Gin generated spans.

func sanitize(s string) string {
if len(s) == 0 {
return s
}
if len(s) > labelKeySizeLimit {
s = s[:labelKeySizeLimit]
}
s = strings.Map(sanitizeRune, s)
if unicode.IsDigit(rune(s[0])) {
s = "key_" + s
}
if s[0] == '_' {
s = "key" + s
}
return s
}

// converts anything that is not a letter or digit to an underscore
func sanitizeRune(r rune) rune {
if unicode.IsLetter(r) || unicode.IsDigit(r) {
return r
}
// Everything else turns into an underscore
return '_'
}

// Gin creates spans for all paths, containing ID values.
// We can safely discard these, as other histograms are being created for them.
func urlName(sd *trace.SpanData) bool {
return strings.HasPrefix(sd.Name, "/")
}