Tracing example crashes with bad alloc due to small stack size #67
Thanks for the detailed report. We'll take a look once we get a bit of time.
What platform are you trying this on, arm64 or x86? What procedure did you use to reproduce this? I can't seem to reproduce this problem on my setup. IIRC the trace data is not stored on the stack, so I'm not sure why increasing the stack size would help you.
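Some background on why the stack may not be the culprit: std::bad_alloc is thrown when a heap allocation through operator new fails, whereas a stack overflow normally ends in SIGSEGV. If such an exception escapes a thread's entry function, std::terminate is called and the process aborts with SIGABRT, which is consistent with the "Signal: 6 (ABRT)" in the coredump posted further down. As a debugging aid, assuming you can wrap the suspect thread body yourself, a minimal sketch that captures the exception with std::exception_ptr and rethrows it in the joining thread might look like this. RunAndJoin and Describe are hypothetical helpers, not cactus-rt API.

```cpp
#include <exception>
#include <new>
#include <stdexcept>
#include <string>
#include <thread>

// Hypothetical helper: runs `fn` on a std::thread. Instead of letting an
// exception escape the thread (which calls std::terminate and aborts the
// process with SIGABRT), it captures the exception and rethrows it in the
// thread that calls join().
template <typename Fn>
void RunAndJoin(Fn fn) {
  std::exception_ptr error;
  std::thread worker([&error, fn]() mutable {
    try {
      fn();
    } catch (...) {
      error = std::current_exception();  // keep it for the joining thread
    }
  });
  worker.join();  // join() synchronizes, so reading `error` below is safe
  if (error) {
    std::rethrow_exception(error);
  }
}

// Hypothetical helper: reports what the worker threw, or "ok" if nothing.
std::string Describe(void (*fn)()) {
  try {
    RunAndJoin(fn);
    return "ok";
  } catch (const std::bad_alloc&) {
    return "bad_alloc";
  } catch (const std::exception& e) {
    return e.what();
  }
}
```

With this wrapper, the joining thread can log or inspect the exception instead of the whole process aborting inside the worker.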
I am running Ubuntu 20.04, kernel version 5.15.129-rt67, on x86. To reproduce, I clone the project, build it in Release mode, and then run the tracing example:

```
make release
./build/release/examples/tracing_example/rt_tracing_example
```

The program crashes after 5 seconds, which is when the first trace is written to a file. When I apply the "fix" I mentioned above, I now get the

This is interesting, and indeed shows that the increased stack size does not fix the problem. I have updated the original question to reflect this. I will try to do some more testing on my end. If you want me to run specific tests or benchmarks, let me know.
I also repeated this on another machine running Ubuntu 20.04, kernel version 5.15.137-rt71, on x86. On this machine, leaving the stack size unchanged resulted in a crash after 5 seconds, while adjusting the stack size let the program finish without problems. Could the kernel version be the culprit? I will try this on a third machine running another version of PREEMPT_RT next week.
Interesting. I don't have a 20.04 machine with a 5.15 kernel to test on. If you can get a core dump, that might also be helpful.
Coredump of the tracing example built in Debug, running on my Ubuntu 20.04, kernel 5.15.137-rt71 machine:

```
git clone https://github.com/cactusdynamics/cactus-rt.git
cd cactus-rt
make debug
./build/debug/examples/tracing_example/rt_tracing_example
```
I just noticed that you merged #70. The coredump in my comment above was created before I pulled these changes (i.e. cce7512). The same bad alloc error occurs after pulling the newest changes. Here is a coredump from running the latest changes:

```
jelle@jelle-laptop-ubuntu:~/temp/cactus-rt$ coredumpctl info
PID: 12625 (rt_tracing_exam)
UID: 1000 (jelle)
GID: 1000 (jelle)
Signal: 6 (ABRT)
Timestamp: Mon 2024-03-11 10:16:21 CET (27s ago)
Command Line: ./build/debug/examples/tracing_example/rt_tracing_example
Executable: /home/jelle/temp/cactus-rt/build/debug/examples/tracing_example/rt_tracing_example
Control Group: /user.slice/user-1000.slice/user@1000.service/vte-spawn-0ed22e58-60c8-44a3-8457-09a6d754ca33.scope
Unit: user@1000.service
User Unit: vte-spawn-0ed22e58-60c8-44a3-8457-09a6d754ca33.scope
Slice: user-1000.slice
Owner UID: 1000 (jelle)
Boot ID:
Machine ID:
Hostname: jelle-laptop-ubuntu
Storage: /var/lib/systemd/coredump/core.rt_tracing_exam.1000.46b5967af75b46689df7204525afeebe.12625.1710148581000000000000.lz4
Message: Process 12625 (rt_tracing_exam) of user 1000 dumped core.
```
I see in your latest coredump there is a line:

I'm wondering if this is a protobuf bug somewhere, because 20.04 is quite old and cactus-rt currently links against the system-level protobuf. My version:

So yes, it is indeed a few versions behind.
It's hard to say if that is the problem. I've also created a PR (#75) that checks whether the protobuf headers the code was compiled against match the library actually installed, in case they differ (which could cause segfaults). Maybe you can try it to make sure nothing is going wrong?
Adding I also checked
Did you ever get closer to solving this problem? |
Sorry for letting this issue go stale; the project was discontinued for other reasons, so I did not look into it again. Since then you have made quite a few changes to the tracer, so I am not sure whether this issue still persists. I will start to use
When running the tracing_example, my program crashes when a trace session is stopped, with the following error message:

The crash happens on line 87:

```
app.StopTraceSession();
```

Using the debugger, I found that the program crashes when the trace aggregator thread is joined in app.cc:218.

After the crash, a data1.perfetto file is created, but it only contains ~30 loops for me. This made me suspect that the crash might be due to a small stack size, as you mentioned in your blog (part 4). Note that this crash also occurs when I try to implement tracing in my own code, i.e. not just in the tracing_example program.

Fix: increase thread stack size

Edit: this "fix" might make the problem occur less often, but it does not make it disappear (see this comment).
The default stack size set in ThreadConfig is 8 MB. Increasing the stack size to 16 MB fixed the crash for me. I did this by adding the following line to the thread_config section in tracing_example/main.cc:64:

After this fix, the program no longer crashes for me, and a correct data1.perfetto file is created with the entire trace.
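The report raises the ThreadConfig stack size from 8 MB to 16 MB. How cactus-rt applies that setting internally is not shown here; assuming it ends up as a POSIX thread attribute, the underlying mechanism can be sketched with pthread_attr_setstacksize. Worker and RunWithStackSize are hypothetical names, not cactus-rt API, and the 4 MiB local buffer exists only to make the worker's stack usage visible.

```cpp
#include <pthread.h>

#include <cstddef>

namespace {

// Hypothetical worker: places a 4 MiB buffer on its own stack, which only
// works if the thread was created with a large enough stack.
void* Worker(void* arg) {
  volatile char buffer[4 * 1024 * 1024];
  buffer[0] = 1;
  buffer[sizeof(buffer) - 1] = 1;
  *static_cast<int*>(arg) = buffer[0] + buffer[sizeof(buffer) - 1];
  return nullptr;
}

}  // namespace

// Hypothetical helper: runs Worker on a thread with the requested stack size
// and joins it. Returns the stack size stored in the attribute object, or 0
// if setting the size or creating the thread failed.
std::size_t RunWithStackSize(std::size_t stack_bytes, int* result) {
  pthread_attr_t attr;
  if (pthread_attr_init(&attr) != 0) {
    return 0;
  }

  std::size_t applied = 0;
  if (pthread_attr_setstacksize(&attr, stack_bytes) == 0 &&
      pthread_attr_getstacksize(&attr, &applied) == 0) {
    pthread_t thread;
    if (pthread_create(&thread, &attr, Worker, result) == 0) {
      pthread_join(thread, nullptr);
    } else {
      applied = 0;  // thread could not be created with this attribute
    }
  }

  pthread_attr_destroy(&attr);
  return applied;
}
```

For example, RunWithStackSize(16 * 1024 * 1024, &result) creates the worker with a 16 MiB stack. A larger stack only helps when the thread itself needs the space; it does not change how much heap memory is available.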