add rrcall to get current time #2827

Open
wants to merge 1 commit into
base: master

vchuravy

Exposes the current time through a syscall. The intended use is for an application that knows
it is running under rr to record the current rr time to be able to direct the user to jump to
this point in time. As an example the Julia test-suite could record on a failed test the rr time
and report that together with the test failure.
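
A minimal sketch of the test-suite use case described above, in C. The time source is passed in as a function pointer so the example runs without rr; `format_test_failure` and `demo_rr_time` are hypothetical names, and in a real build the pointer would refer to a wrapper around the rrcall this PR adds.

```c
#include <stdio.h>

/* Hypothetical hook: returns the current rr time, or a negative value
 * when the process is not being recorded. It is a parameter here so the
 * sketch compiles and runs without rr. */
typedef int (*rr_time_fn)(void);

/* Stand-in time source for demonstration only. */
int demo_rr_time(void) { return 1234; }

/* Format a failure report, adding the rr time when one is available.
 * Returns the number of characters snprintf would have written. */
int format_test_failure(char *buf, size_t len, const char *test_name,
                        rr_time_fn get_time) {
  int t = get_time ? get_time() : -1;
  if (t < 0) {
    return snprintf(buf, len, "FAILED %s", test_name);
  }
  return snprintf(buf, len, "FAILED %s (rr time %d; try: rr replay -g %d)",
                  test_name, t, t);
}
```

Passing `NULL` (or a source that returns a negative value) yields a plain failure line, so the same reporting path works outside rr.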

I noticed that for mark_stdio we also get t->tgid(). Is the time sufficient, or do I also need that
information?

cc: @Keno, @neboat

@Keno
Member

Keno commented Mar 16, 2021

Implementation-wise, I think this is fine, though I wonder if it wouldn't be better to instead have the ability to mark a particular point in the execution, which rr would then record in the trace and the frontend (rr or pernosco) could offer various navigation to. It could even be recorded just in the trace buffer if syscallbuf is enabled, to make it really low overhead.

@rocallahan
Collaborator

> I wonder if it wouldn't be better to instead have the ability to mark a particular point in the execution, which rr would then record in the trace and the frontend (rr or pernosco) could offer various navigation to.

I guess you would need to flesh out what that feature would look like at rr replay time. If you want the tracee to provide a string label for each time, and you want to be able to get the list of labels without doing a full replay or full read of the trace, then we have to store the labels in some new area in the trace, which means extending the trace format.

@Keno
Member

Keno commented Mar 16, 2021

Yeah, it's all a bit complicated. Maybe let's just do this for now then and we'll see if people find it useful. If so, we can come back later and engineer something fancier.

@@ -15,4 +15,8 @@ void rr_detach_teleport(void) {
test_assert(err == 0);
}

int rr_current_time(void) {
Collaborator


Is something supposed to use this function?

@neboat

neboat commented Mar 23, 2021

Hey all, I've been playing around with this PR recently, and although it works, I am encountering some issues that I would like your input on. (Sorry in advance for the long message.)

Here's my situation: I have a program-analysis tool, similar to a Google sanitizer, that identifies interesting pairs of executed instructions in a serial program's execution, and I would like to integrate this tool with RR. Normally the tool works similarly to a sanitizer: the user compiles and links her program with the tool, and then, when she runs the executable, the tool runs as a shadow computation and reports interesting pairs of instructions that it detects as the program executes. I would like to make this tool interact with RR intelligently. In particular, I would like the user to be able to compile and link her program with the tool and use RR to record a run of that tool-instrumented executable. Then, during RR's replay, I would like the user to have some way to easily navigate between the interesting pairs of executed instructions and to move back and forth between the two instructions in each pair.

(For more context, the tool in question is a race detector, and the interesting pairs of instructions are logically parallel instructions involved in a race. But it turns out that I'm not actually concerned with RR's support for multithreading, because this race detector is able to detect races even when the program is run on 1 thread. Hopefully, you shouldn't need to grok the details of this race detector to understand the problem. It should instead suffice to think of this tool as a sanitizer that identifies interesting pairs of instructions in a serial program's execution.)

To this end, I started playing with this PR, and it gets me part of the way there. With this PR, the tool can read and store the RR times for executed instructions. Then, for each interesting instruction pair, the tool can report the RR times of the two instructions in the pair.

However, this solution falls short in a few key ways.

  • The tool needs to get the current time from RR very frequently, approximately once per memory read or write in the program-under-test. The system-call interface to get the current RR time is OK for tiny programs, but introduces huge overheads that make it unusable on anything real. (I'm currently estimating a slowdown of thousands of times, compared to a normal execution of the program with the tool.)
  • I'm not sure how to provide a nice interface for users to navigate between the two instructions involved in a race.
    • I can start the replay using rr replay -g <time> to navigate to a particular RR time, but I'm not sure how to navigate to a particular time during a replay. (I admit that I might be unaware of some part of RR's interface that would let me do this.)
    • Ideally, I would like to provide users an even more friendly interface, so that users don't have to copy-and-paste RR times recorded as having been involved in a race. But I'm not sure how to do this.

I've tried some things to work around these issues. For example, I tried modifying the tool itself to maintain its own event counter of interesting executed instructions and then report instruction pairs in terms of this event counter. This approach dramatically speeds up recording the program execution. But using gdb's conditional breakpoints on this user-defined event counter turns out to be far too slow for navigating to instruction pairs during RR replay.

Do you have any ideas for how to support the functionality I'm looking for, possibly by extending RR? Right now I'm imagining some possible RR features might be able to solve this problem effectively:

  1. A much faster way to get the current RR time during RR record.
  2. A facility in RR replay to get the current RR time and quickly navigate to different RR times. With this feature, I could imagine writing a gdb command to binary-search for the correct RR time based on the tool's internal event counter.
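
The binary search imagined in item 2 could look roughly like the sketch below. It assumes a hypothetical probe, `counter_at`, that reports the tool's event-counter value once replay has reached a given RR time, and that this value never decreases as time advances. Nothing here is an existing rr or gdb API; `table_probe` is a stand-in that reads from a precomputed array.

```c
#include <stdint.h>

/* Hypothetical probe: the tool's event-counter value once replay has
 * reached rr time `t`. Assumed to be non-decreasing in `t`. */
typedef uint64_t (*counter_at_fn)(long t, void *ctx);

/* Smallest rr time in [lo, hi] whose counter value is >= target,
 * or -1 if the range is empty or even `hi` has not reached the target. */
long find_rr_time(long lo, long hi, uint64_t target,
                  counter_at_fn counter_at, void *ctx) {
  if (lo > hi || counter_at(hi, ctx) < target) return -1;
  while (lo < hi) {
    long mid = lo + (hi - lo) / 2;
    if (counter_at(mid, ctx) >= target) {
      hi = mid;      /* target already reached at mid: look earlier */
    } else {
      lo = mid + 1;  /* not reached yet: look later */
    }
  }
  return lo;
}

/* Stand-in probe for demonstration: a table indexed by rr time. */
uint64_t table_probe(long t, void *ctx) {
  const uint64_t *table = ctx;
  return table[t];
}
```

Each probe would correspond to one jump to an RR time during replay, so the search needs a number of jumps logarithmic in the length of the time range rather than one condition check per event.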

This being said, I'm open to suggestions.

I believe some of the issues I've encountered are generic to anyone who wants to make a "smart" dynamic-analysis productivity tool that integrates with RR to allow users to quickly navigate to times when errors are detected. As such, I'm hoping that any extensions to RR to support this functionality would be generally useful for tool writers.

Thoughts? Thanks in advance for your input.

Cheers,
TB

@Keno
Member

Keno commented Mar 24, 2021

> 1. A much faster way to get the current RR time during RR record.

We can syscallbuffer this rr call by having rr write the current event time to userspace memory and just returning that. Doesn't even really need to allocate a syscallbuf record.
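
As a rough illustration of that idea (not rr's actual layout or code), the tracee-side read could be a plain load from a location that the supervisor keeps up to date. The struct and function names below are hypothetical, and `publish_event_time` only stands in for whatever rr would do on its side.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared location; rr's real preload data structures differ. */
struct shared_time_page {
  _Atomic uint64_t current_event_time; /* written by the supervisor */
};

/* Tracee side: an ordinary userspace load, no kernel transition. */
uint64_t read_event_time(struct shared_time_page *p) {
  return atomic_load_explicit(&p->current_event_time, memory_order_relaxed);
}

/* Stand-in for the supervisor side, for demonstration only. */
void publish_event_time(struct shared_time_page *p, uint64_t t) {
  atomic_store_explicit(&p->current_event_time, t, memory_order_relaxed);
}
```

The point of the design is that the per-call cost drops from a system call to a single memory read, which matters when the call happens about once per memory access.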

> 2. A facility in RR replay to get the current RR time and quickly navigate to different RR times. With this feature, I could imagine writing a gdb command to binary-search for the correct RR time based on the tool's internal event counter.

I think this could just be handled by rr having a counter of how many times the rrcall was called (and returning that also), and then using the replay assist trick to let rr set a "breakpoint" on the appropriate counter value.
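
A sketch of the counting part of that suggestion, under the assumption that the supervisor keeps one counter per recording and is told which call index to stop at. All names are hypothetical, and the sketch does not model rr's replay-assist mechanism itself.

```c
#include <stdbool.h>
#include <stdint.h>

struct marker_state {
  uint64_t calls;   /* how many times the rrcall has been made so far */
  uint64_t stop_at; /* call index to stop at; 0 means no stop requested */
};

struct marker_result {
  uint64_t call_index; /* 1-based index of this call, returned to the caller */
  bool should_stop;    /* true when this call matches the requested index */
};

/* Invoked once per rrcall: count it and check against the requested index. */
struct marker_result on_marker_call(struct marker_state *s) {
  struct marker_result r;
  r.call_index = ++s->calls;
  r.should_stop = (s->stop_at != 0 && r.call_index == s->stop_at);
  return r;
}
```

Because the call index is returned to the tool at record time, the tool can report it alongside each finding, and a replay frontend can later ask to stop at exactly that index.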

@khuey
Collaborator

khuey commented Mar 24, 2021

Why can't the race detector maintain its own counter of the number of memory reads/writes that have been made in userspace without any cooperation from the rr supervisor at all, and then just use conditional watchpoints on the location in gdb to find it again during the replay?
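
For concreteness, the userspace counter in question might be as simple as the following. The names are hypothetical; a real detector would use its own instrumentation hook.

```c
#include <stdint.h>

/* Hypothetical global that a debugger can watch during replay. */
volatile uint64_t tool_access_counter = 0;

/* Called by the instrumentation on every memory read or write.
 * Returns the counter value assigned to this access. */
uint64_t tool_on_memory_access(void) {
  return ++tool_access_counter;
}
```

During replay, a conditional watchpoint in gdb would then take the form `watch tool_access_counter if tool_access_counter == N`, where `N` is the value the tool reported.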

@neboat

neboat commented Mar 24, 2021

@khuey I tried something similar using conditional breakpoints in gdb, but that approach turns out to be very slow during replay. As I understand it, the slowdown comes from the overhead of switching between gdb and the program itself to repeatedly evaluate the condition. Conditional watchpoints don't seem to be appreciably faster, I presume for the same reason.

@Keno Both of those changes sound good to me, though I'm not terribly familiar with rr internals.

@khuey
Collaborator

khuey commented Mar 24, 2021

I think the overhead is more likely to be the cost of switching between rr and the tracee during replay each time the watchpoint fires and the condition needs to be evaluated. gdb should be sending the condition to rr so we shouldn't need to go back to gdb each time.

@neboat

neboat commented Mar 24, 2021

Ah, that makes sense to me. Thanks for clarifying.


5 participants