Renovate Streaming Support #5910

yurishkuro · 2024-08-31T16:21:18Z

Summary

Bring streaming analytics support directly into the Jaeger backend, instead of requiring separate Spark/Flink data pipelines.

Background

One of the challenges of distributed tracing is that spans can arrive from all kinds of places in the architecture at different times. If your only job is to store them (which is primarily what the Jaeger collector does), this is not a big problem, since the storage backends take care of partitioning and indexing the spans by trace-id. But the most interesting applications of traces require looking at a whole trace in one place to make decisions based on the overall call graph, not on individual spans.
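To make "looking at a whole trace in one place" concrete, here is a minimal Go sketch that groups out-of-order spans by trace ID and then derives service-to-service edges from one assembled trace. The `Span` and `Edge` types and both function names are hypothetical simplifications for illustration; they are not Jaeger's domain model or its actual dependency-graph code.

```go
package main

// Span is a simplified, hypothetical span model (not Jaeger's actual type).
type Span struct {
	TraceID  string
	SpanID   string
	ParentID string // empty for root spans
	Service  string
}

// Edge is a caller -> callee relationship between two services.
type Edge struct {
	Parent string
	Child  string
}

// GroupByTrace collects spans, which may arrive in any order,
// into per-trace buckets keyed by trace ID.
func GroupByTrace(spans []Span) map[string][]Span {
	traces := make(map[string][]Span)
	for _, s := range spans {
		traces[s.TraceID] = append(traces[s.TraceID], s)
	}
	return traces
}

// DependencyCounts counts cross-service parent/child calls within one
// assembled trace. It needs the whole trace because a span only knows
// its parent's span ID, not its parent's service.
func DependencyCounts(trace []Span) map[Edge]int {
	serviceBySpan := make(map[string]string, len(trace))
	for _, s := range trace {
		serviceBySpan[s.SpanID] = s.Service
	}
	counts := make(map[Edge]int)
	for _, s := range trace {
		parentService, ok := serviceBySpan[s.ParentID]
		if !ok || parentService == s.Service {
			continue // root span, missing parent, or an in-process call
		}
		counts[Edge{Parent: parentService, Child: s.Service}]++
	}
	return counts
}
```

Calling `DependencyCounts(GroupByTrace(spans)[traceID])` yields the edges for a single trace. A span-at-a-time consumer cannot compute this, because the parent span that carries the caller's service name may arrive later or on a different instance.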

Data streaming is great at doing that. Historically, Jaeger supported a couple of Java-based data pipelines (one for the basic dependency graph and one for the transitive dependency graph), which were implemented independently on top of the Spark and Flink frameworks. There were problems with that approach:

The business logic had to be written in Java, meaning we could not reuse the domain model capabilities we had in the primary Go code.
We had to duplicate some of the logic, e.g. the all-in-one binary supported constructing a dependency graph on the fly, which was implemented completely independently from the Java Spark job.
The https://github.com/jaegertracing/spark-dependencies and https://github.com/jaegertracing/jaeger-analytics-flink repos have seen very few changes, and the latter does not even have a production-grade way of running it.

Proposal

We should bring streaming capabilities into the main Jaeger repo using Go code. This will address many of the problems mentioned above. The main challenge with data streaming is that it is a stateful activity, which requires checkpointing capabilities to avoid data loss and inconsistent results when Jaeger instances are restarted. This is where well-known streaming frameworks like Spark and Flink come in: they provide the needed orchestration and statefulness. In the past we could not use them with Go, but today there are projects like Apache Beam that provide a unified programming model via well-supported SDKs (including one for Go), which allows implementing the pipeline logic in Go and executing it on a number of runtimes.
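As a rough illustration of why checkpointing matters for a stateful aggregation, the following Go sketch keeps running dependency counts together with the offset of the last processed input record, and can serialize and restore both. Everything here (`DependencyState`, `Apply`, `Checkpoint`, `Restore`, the offset scheme) is a hypothetical, single-process simplification; it does not represent how Spark, Flink, or Apache Beam implement state and checkpoints.

```go
package main

import "encoding/json"

// DependencyState is hypothetical aggregation state: running edge counts
// plus the offset of the last input record folded into them.
type DependencyState struct {
	Offset int64          `json:"offset"` // last processed input offset, -1 if none
	Counts map[string]int `json:"counts"` // "parent->child" -> call count
}

// NewDependencyState returns empty state with no records processed.
func NewDependencyState() *DependencyState {
	return &DependencyState{Offset: -1, Counts: map[string]int{}}
}

// Apply folds in one edge observed at the given input offset. Records at
// or before the stored offset are skipped, so replaying input after a
// restart does not double count.
func (s *DependencyState) Apply(offset int64, parent, child string) {
	if offset <= s.Offset {
		return
	}
	s.Counts[parent+"->"+child]++
	s.Offset = offset
}

// Checkpoint serializes counts and offset together, so they stay consistent.
func (s *DependencyState) Checkpoint() ([]byte, error) {
	return json.Marshal(s)
}

// Restore rebuilds state from a checkpoint produced by Checkpoint.
func Restore(data []byte) (*DependencyState, error) {
	s := NewDependencyState()
	if err := json.Unmarshal(data, s); err != nil {
		return nil, err
	}
	if s.Counts == nil {
		s.Counts = map[string]int{}
	}
	return s, nil
}
```

The design point is that the counts and the input position are saved atomically as one unit. Without that pairing, a restarted instance would either lose the records processed since startup or count replayed records twice, which is the data loss and inconsistency the proposal refers to.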

dosubot bot added the changelog:new-feature Change that should be called out as new feature in CHANGELOG label Aug 31, 2024

yurishkuro mentioned this issue Aug 31, 2024

Implement in-memory Service Dependency Graph using Apache Beam #5911

Open

4 tasks

yurishkuro moved this to Todo in Roadmap Nov 14, 2024

yurishkuro added this to Roadmap Nov 14, 2024

yurishkuro changed the title ~~[RFC] Renovate Streaming Support~~ Renovate Streaming Support Nov 15, 2024
