Skip to content

Commit

Permalink
Merge pull request #71 from cactusdynamics/interned-string
Browse files Browse the repository at this point in the history
Intern event names and categories to save space in the trace output
  • Loading branch information
shuhaowu authored Mar 21, 2024
2 parents 5218cb1 + 2ec2c23 commit 3edcdcc
Show file tree
Hide file tree
Showing 34 changed files with 1,161 additions and 139 deletions.
2 changes: 1 addition & 1 deletion .clang-format
Original file line number Diff line number Diff line change
@@ -1,10 +1,10 @@
---
Language: Cpp
Standard: c++17
BasedOnStyle: Google
IndentWidth: 2
ColumnLimit: 0
PointerAlignment: Left
AlignConsecutiveDeclarations: true
ContinuationIndentWidth: 2
AlignAfterOpenBracket: BlockIndent

4 changes: 3 additions & 1 deletion .clang-tidy
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@ cert-*,
-bugprone-implicit-widening-of-multiplication-result,
-bugprone-narrowing-conversions,
-clang-diagnostic-unused-parameter,
-misc-include-cleaner,
-misc-non-private-member-variables-in-classes,
-modernize-pass-by-value,
-modernize-use-nodiscard,
Expand All @@ -19,7 +20,8 @@ cert-*,
-readability-function-cognitive-complexity,
-readability-identifier-length,
-readability-isolate-declaration,
-readability-magic-numbers'
-readability-magic-numbers,
-readability-redundant-inline-specifier'
# TODO: Re-enable bugprone-exception-escape when no longer throwing
# https://github.com/isocpp/CppCoreGuidelines/issues/1589
WarningsAsErrors: '*'
Expand Down
3 changes: 2 additions & 1 deletion CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -135,10 +135,11 @@ target_compile_definitions(cactus_rt PUBLIC QUILL_USE_BOUNDED_QUEUE)
if (ENABLE_TRACING)
target_sources(cactus_rt
PRIVATE
src/cactus_rt/tracing/tracing_enabled.cc
src/cactus_rt/tracing/sink.cc
src/cactus_rt/tracing/thread_tracer.cc
src/cactus_rt/tracing/trace_aggregator.cc
src/cactus_rt/tracing/tracing_enabled.cc
src/cactus_rt/tracing/utils/string_interner.cc
)

target_link_libraries(cactus_rt
Expand Down
9 changes: 7 additions & 2 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -24,11 +24,16 @@ test-debug: build-test-debug

test: test-debug

benchmark: build-test-debug
build/test/tests/cactus_rt_tracing_benchmark
build-test-release:
cmake -Bbuild/benchmark -DCMAKE_BUILD_TYPE=RelWithDebInfo -DENABLE_EXAMPLES=OFF -DBUILD_TESTING=ON -DENABLE_CLANG_TIDY=$(ENABLE_CLANG_TIDY) -DENABLE_TRACING=$(ENABLE_TRACING)
cmake --build build/benchmark -j $$(nproc)

benchmark: build-test-release
build/benchmark/tests/cactus_rt_tracing_benchmark

clean:
test ! -d build/test || cmake --build build/test --target clean
test ! -d build/benchmark || cmake --build build/benchmark --target clean
test ! -d build/debug || cmake --build build/debug --target clean
test ! -d build/release || cmake --build build/release --target clean

Expand Down
3 changes: 2 additions & 1 deletion docker/scripts/01-build.sh
Original file line number Diff line number Diff line change
Expand Up @@ -6,5 +6,6 @@ cmake -B${CACTUS_RT_BUILD_DIR} \
-DENABLE_CLANG_TIDY=ON \
-DBUILD_DOCS=ON \
-DBUILD_TESTING=ON \
-DENABLE_TRACING=${ENABLE_TRACING:-ON}
-DENABLE_TRACING=${ENABLE_TRACING:-ON} \
-DCMAKE_BUILD_TYPE=RelWithDebInfo
cmake --build ${CACTUS_RT_BUILD_DIR} -j $(nproc)
52 changes: 51 additions & 1 deletion docs/tracing.md
Original file line number Diff line number Diff line change
Expand Up @@ -235,7 +235,57 @@ TODO...

### Performance and limitations

TODO...
#### Too much data causing buffer overflow and trace data drops

#### String interning

Perfetto allows event names and event categories to be interned by assigning
each unique string with an id and emitting the id rather than the string to save
space in the output trace files. This feature is implemented via the
`interned_data`, `name_iid`, and `category_iids` fields. A few details worth
noting:

1. Perfetto uses `trusted_packet_sequence_id` to identify a single sequence. A
single sequence must have monotonically increasing timestamps for its
packets. Both Perfetto upstream and cactus-rt's implementation of the tracer
gives a sequence id per thread. In cactus-rt, this is especially important
as when we pop the data from the queue originating from multiple threads.
2. The `iid`s on their own are not sufficient to identify the string being
interned as it is not "globally" unique. It is only unique per sequence
(i.e. thus `(trusted_packet_sequence_id, iid)` is sufficient to identify an
interned string). This, along with (1), implies we have to intern strings on
a per-thread interner.

### Other notes

#### Perfetto SDK (upstream) can grab a lock when buffer is full

- It seems that trace events are emitted via the `TRACE_EVENT` macro, which starts [here with `TRACE_EVENT_BEGIN`](https://github.com/google/perfetto/blob/v43.2/include/perfetto/tracing/track_event.h#L354).
- This then calls [`PERFETTO_INTERNAL_TRACK_EVENT_WITH_METHOD`](https://github.com/google/perfetto/blob/v43.2/include/perfetto/tracing/internal/track_event_macros.h#L115).
- This also calls `perfetto_track_event::internal::isDynamicCategory`
- This calls [`perfetto_track_event::TrackEvent::TraceForCategory`](https://github.com/google/perfetto/blob/v43.2/include/perfetto/tracing/internal/track_event_data_source.h#L376), which eventually calls:
- [`WriteTrackEventImpl`](https://github.com/google/perfetto/blob/v43.2/include/perfetto/tracing/internal/track_event_data_source.h#L874) which creates the protobuf packet via protozero.
- This [calls](https://github.com/google/perfetto/blob/v43.2/include/perfetto/tracing/internal/track_event_data_source.h#L901) [`TrackEventInternal::WriteEvent`](https://github.com/google/perfetto/blob/v43.2/src/tracing/internal/track_event_internal.cc#L528)
- which [calls](https://github.com/google/perfetto/blob/v43.2/src/tracing/internal/track_event_internal.cc#L537) [`NewTracePacket`](https://github.com/google/perfetto/blob/v43.2/src/tracing/internal/track_event_internal.cc#L470)
- which [calls](https://github.com/google/perfetto/blob/v43.2/src/tracing/internal/track_event_internal.cc#L479) [`TraceWriterBase::NewTracePacket`](https://github.com/google/perfetto/blob/v43.2/include/perfetto/tracing/trace_writer_base.h#L41-L61), **which might be [`TraceWriterImpl::NewTracePacket`](https://github.com/google/perfetto/blob/v43.2/src/tracing/core/trace_writer_impl.cc#L119).**

From the other side:

- `SharedMemoryArbiterImpl::GetNewChunk` grabs [a lock](https://github.com/google/perfetto/blob/v43.2/src/tracing/core/shared_memory_arbiter_impl.cc#L106-L110)
- [called from](https://github.com/google/perfetto/blob/v43.2/src/tracing/core/trace_writer_impl.cc#L242) `TraceWriterImpl::GetNewBuffer`.
- **This is called from [`TraceWriterImpl::NewTracePacket`](https://github.com/google/perfetto/blob/v43.2/src/tracing/core/trace_writer_impl.cc#L151).** This is called if the chunk is too full for the data.

This means that there is a small chance that a lock can be taken, and an even smaller chance that a priority inversion could take place. This means it is not suitable for hard real-time applications and a custom tracing library must be written.

#### Perfetto SDK string interning

- `WriteTrackEventImpl` [calls](https://github.com/google/perfetto/blob/v43.2/include/perfetto/tracing/internal/track_event_data_source.h#L907) `TrackEventInternal::WriteEventName` which calls [`InternedEventName::Get`](https://github.com/google/perfetto/blob/v43.2/src/tracing/internal/track_event_internal.cc#L506), and which is implemented in [here](https://github.com/google/perfetto/blob/v43.2/include/perfetto/tracing/track_event_interned_data_index.h#L183).
- In here there's memory allocation due to [`LookupAndInsert`](https://github.com/google/perfetto/blob/v43.2/include/perfetto/tracing/track_event_interned_data_index.h#L99-L144) (called from [here](https://github.com/google/perfetto/blob/v43.2/include/perfetto/tracing/track_event_interned_data_index.h#L189)) using a `std::map`.
- Also `GetOrCreateIndex` [calls `new`](https://github.com/google/perfetto/blob/v43.2/include/perfetto/tracing/track_event_interned_data_index.h#L243). Likely this means another allocation potential on the hot path.

This means Perfetto uses memory allocation on the hot path, which is not acceptable for cactus-rt due to real-time constraints.

#### Perfetto SDK data dropping counter

Advanced tuning and use cases
-----------------------------
Expand Down
52 changes: 40 additions & 12 deletions examples/tracing_example/main.cc
Original file line number Diff line number Diff line change
Expand Up @@ -16,8 +16,19 @@ void WasteTime(std::chrono::microseconds duration) {
class ExampleRTThread : public CyclicThread {
int64_t loop_counter_ = 0;

static cactus_rt::CyclicThreadConfig CreateThreadConfig() {
cactus_rt::CyclicThreadConfig thread_config;
thread_config.period_ns = 1'000'000;
thread_config.cpu_affinity = std::vector<size_t>{2};
thread_config.SetFifoScheduler(80);

// thread_config.tracer_config.trace_sleep = true;
thread_config.tracer_config.trace_wakeup_latency = true;
return thread_config;
}

public:
ExampleRTThread(const char* name, cactus_rt::CyclicThreadConfig config) : CyclicThread(name, config) {}
ExampleRTThread() : CyclicThread("ExampleRTThread", CreateThreadConfig()) {}

int64_t GetLoopCounter() const {
return loop_counter_;
Expand Down Expand Up @@ -60,21 +71,38 @@ class ExampleRTThread : public CyclicThread {
}
};

int main() {
cactus_rt::CyclicThreadConfig thread_config;
thread_config.period_ns = 1'000'000;
thread_config.cpu_affinity = std::vector<size_t>{2};
thread_config.SetFifoScheduler(80);
class SecondRTThread : public CyclicThread {
static cactus_rt::CyclicThreadConfig CreateThreadConfig() {
cactus_rt::CyclicThreadConfig thread_config;
thread_config.period_ns = 3'000'000;
thread_config.cpu_affinity = {1};
thread_config.SetFifoScheduler(60);

// thread_config.tracer_config.trace_sleep = true;
thread_config.tracer_config.trace_wakeup_latency = true;
// thread_config.tracer_config.trace_sleep = true;
thread_config.tracer_config.trace_wakeup_latency = true;
return thread_config;
}

public:
SecondRTThread() : CyclicThread("SecondRTThread", CreateThreadConfig()) {}

protected:
bool Loop(int64_t /*now*/) noexcept final {
const auto span = Tracer().WithSpan("Sense");
WasteTime(std::chrono::microseconds(2000));
return false;
}
};

int main() {
cactus_rt::AppConfig app_config;
app_config.tracer_config.trace_aggregator_cpu_affinity = {1};
app_config.tracer_config.trace_aggregator_cpu_affinity = {0}; // doesn't work yet

auto thread = std::make_shared<ExampleRTThread>("ExampleRTThread", thread_config);
auto thread1 = std::make_shared<ExampleRTThread>();
auto thread2 = std::make_shared<SecondRTThread>();
App app("TracingExampleApp", app_config);
app.RegisterThread(thread);
app.RegisterThread(thread1);
app.RegisterThread(thread2);

std::cout << "Testing RT loop for 15 seconds with two trace sessions.\n";

Expand All @@ -97,6 +125,6 @@ int main() {
app.Join();
// Don't need to stop the trace session as the app destructor will take care of it.

std::cout << "Number of loops executed: " << thread->GetLoopCounter() << "\n";
std::cout << "Number of loops executed: " << thread1->GetLoopCounter() << "\n";
return 0;
}
45 changes: 41 additions & 4 deletions examples/tracing_protos_example/synthetic_events.cc
Original file line number Diff line number Diff line change
Expand Up @@ -5,9 +5,12 @@

#include "trace.pb.h"

using cactus_tracing::vendor::perfetto::protos::InternedData;
using cactus_tracing::vendor::perfetto::protos::ProcessDescriptor;
using cactus_tracing::vendor::perfetto::protos::ThreadDescriptor;
using cactus_tracing::vendor::perfetto::protos::Trace;
using cactus_tracing::vendor::perfetto::protos::TracePacket_SequenceFlags_SEQ_INCREMENTAL_STATE_CLEARED;
using cactus_tracing::vendor::perfetto::protos::TracePacket_SequenceFlags_SEQ_NEEDS_INCREMENTAL_STATE;
using cactus_tracing::vendor::perfetto::protos::TrackDescriptor;
using cactus_tracing::vendor::perfetto::protos::TrackEvent;
using cactus_tracing::vendor::perfetto::protos::TrackEvent_Type_TYPE_INSTANT;
Expand Down Expand Up @@ -69,13 +72,22 @@ int main() {
auto* packet3 = trace3.add_packet();
packet3->set_timestamp(200);

auto* interned_data1 = new InternedData();
auto* event_name = interned_data1->add_event_names();
event_name->set_iid(1);
event_name->set_name("My special parent");
packet3->set_allocated_interned_data(interned_data1);

auto* track_event1 = new TrackEvent();
track_event1->set_type(TrackEvent_Type_TYPE_SLICE_BEGIN);
track_event1->set_track_uuid(thread_uuid);
track_event1->set_name("My special parent");
track_event1->set_name_iid(1);
packet3->set_allocated_track_event(track_event1);

packet3->set_trusted_packet_sequence_id(trusted_packet_sequence_id);
packet3->set_previous_packet_dropped(true);
packet3->set_first_packet_on_sequence(true);
packet3->set_sequence_flags(TracePacket_SequenceFlags_SEQ_INCREMENTAL_STATE_CLEARED | TracePacket_SequenceFlags_SEQ_NEEDS_INCREMENTAL_STATE);

Trace trace4;
auto* packet4 = trace4.add_packet();
Expand Down Expand Up @@ -123,19 +135,44 @@ int main() {

packet7->set_trusted_packet_sequence_id(trusted_packet_sequence_id);

// Packets complete, write it into a file!
Trace trace8;
auto* packet8 = trace8.add_packet();
packet8->set_timestamp(350);

auto* track_event8 = new TrackEvent();
track_event8->set_type(TrackEvent_Type_TYPE_SLICE_BEGIN);
track_event8->set_track_uuid(thread_uuid);
track_event8->set_name_iid(1);
packet8->set_allocated_track_event(track_event8);
packet8->set_sequence_flags(TracePacket_SequenceFlags_SEQ_NEEDS_INCREMENTAL_STATE);

packet8->set_trusted_packet_sequence_id(trusted_packet_sequence_id);

Trace trace9;
auto* packet9 = trace9.add_packet();
packet9->set_timestamp(500);

auto* track_event9 = new TrackEvent();
track_event9->set_type(TrackEvent_Type_TYPE_SLICE_END);
track_event9->set_track_uuid(thread_uuid);
packet9->set_allocated_track_event(track_event9);

packet9->set_trusted_packet_sequence_id(trusted_packet_sequence_id);

{
std::fstream output("build/direct_proto_serialization.perfetto-trace", std::ios::out | std::ios::trunc | std::ios::binary);

const std::array<Trace*, 7> traces{
const std::array<Trace*, 9> traces{
&trace1,
&trace2,
&trace3,
&trace4,
&trace5,
&trace6,
&trace7};
&trace7,
&trace8,
&trace9,
};

for (const auto* trace : traces) {
trace->SerializeToOstream(&output);
Expand Down
11 changes: 5 additions & 6 deletions include/cactus_rt/app.h
Original file line number Diff line number Diff line change
Expand Up @@ -43,7 +43,7 @@ class App {
// TODO: investigate into a weak pointer.
std::list<std::shared_ptr<tracing::ThreadTracer>> thread_tracers_;
std::unique_ptr<tracing::TraceAggregator> trace_aggregator_ = nullptr;
std::mutex tracer_mutex_;
std::mutex aggregator_mutex_;

void SetDefaultLogFormat(quill::Config& cfg) {
// Create a handler of stdout
Expand Down Expand Up @@ -113,15 +113,14 @@ class App {
/**
* @brief Starts a new tracing session for the process. Will not start a new
* tracing session if an existing tracing session is in progress. This
* function is not real-time safe. This will not register any output sinks.
* Use App::RegisterTraceSink() to register custom sinks.
* function is not real-time safe.
*/
bool StartTraceSession() noexcept;
bool StartTraceSession(std::shared_ptr<tracing::Sink> sink) noexcept;

/**
* @brief Register a custom trace sink after starting the trace session
*/
void RegisterTraceSink(std::shared_ptr<cactus_rt::tracing::Sink> sink) noexcept;
void RegisterTraceSink(std::shared_ptr<tracing::Sink> sink) noexcept;

/**
* @brief Stops the tracing session for the process. Will be no-op if tracing
Expand Down Expand Up @@ -159,7 +158,7 @@ class App {
*/
void DeregisterThreadTracer(const std::shared_ptr<tracing::ThreadTracer>& thread_tracer) noexcept;

void CreateAndStartTraceAggregator() noexcept;
void CreateAndStartTraceAggregator(std::shared_ptr<tracing::Sink> sink) noexcept;

void StopTraceAggregator() noexcept;
};
Expand Down
2 changes: 1 addition & 1 deletion include/cactus_rt/config.h
Original file line number Diff line number Diff line change
Expand Up @@ -105,7 +105,7 @@ struct ThreadTracerConfig {
*/
struct ThreadConfig {
// A vector of CPUs this thread should run on. If empty, no CPU restrictions are set.
std::vector<size_t> cpu_affinity = {};
std::vector<size_t> cpu_affinity;

// The size of the stack for this thread. Defaults to 8MB.
size_t stack_size = 8 * 1024 * 1024;
Expand Down
8 changes: 8 additions & 0 deletions include/cactus_rt/tracing/thread_tracer.h
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@

#include "../experimental/lockless/atomic_message.h"
#include "track_event_internal.h"
#include "utils/string_interner.h"

namespace cactus_rt::tracing {
struct EventCountData {
Expand All @@ -35,6 +36,13 @@ class ThreadTracer {

moodycamel::ReaderWriterQueue<TrackEventInternal> queue_;

// The event name interning must be done per thread (per sequence). Thus it is
// stored here. However, this class must NEVER call functions here (other
// than maybe .Size), as the memory allocation can occur. This variable is
// designed to be used by TraceAggregator on the non-real-time path.
utils::StringInterner event_name_interner_;
utils::StringInterner event_category_interner_;

std::string name_;
uint64_t track_uuid_;

Expand Down
Loading

0 comments on commit 3edcdcc

Please sign in to comment.