-
Notifications
You must be signed in to change notification settings - Fork 180
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(app, react-api-client): add "retry failed command" during Error Recovery #15170
Conversation
…covery Adds the ability to retry command that caused the run to enter error recovery. Resumes the run after completing the failed command.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some explanatory comments below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Diff this against the old index.tsx
to see changes.
The differences are the new route for the "retrying command" view, the useInitialPipetteHome
, and the new name, ErrorRecoveryWizard
.
runStatus: RunStatus | null | ||
): UseErrorRecoveryResult { | ||
const [isEREnabled, setIsEREnabled] = React.useState(false) | ||
const failedCommand = useCurrentlyFailedRunCommand(runId, runStatus) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not directly exporting useCurrentlyFailedRunCommand
may be something we have to refactor out later, as this implies that we only would ever need a failedCommand
for reasons related to Error Recovery.
* Recovery Step: Analogous to a "step" in other wizard flows. | ||
*/ | ||
const [recoveryMap, setRecoveryMap] = React.useState<IRecoveryMap>({ | ||
route: RECOVERY_MAP.ROBOT_IN_MOTION.ROUTE, // Initialize the route to "in motion", since wizard always start with a home command. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we expect the robot to already be in motion when we start this wizard?
Or does the robot not start homing until the useInitialPipetteHome call like 15 lines down?
If it is already in motion, then this makes sense, if it is not, should useInitialPipetteHome
be the thing that adds the ROBOT_IN_MOTION step to the route?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good Q - doing it this way prevents us initially rendering some other screen for one render cycle before the in-motion screen. This problem is really only noticeable if you test it on the physical ODD, and it looks bad enough IMO that I'd rather eagerly set the robot to 'in motion' immediately and not have the screen immediately changing as soon as we launch ER.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DerekMaggio After further investigation, I found a better React way of doing this while initializing the route to something more sensible, so thank you for calling this out!
React.useEffect(() => { | ||
void setRobotInMotion(true) | ||
.then(() => homePipetteZAxes()) | ||
.then(() => setRobotInMotion(false)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably wants to be .finally()
<ErrorRecoveryHeader errorKind={props.errorKind} /> | ||
<ErrorRecoveryContent {...props} /> | ||
</Flex>, | ||
getTopPortalEl() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably wants to be getModalPortalEl
if this is on the desktop
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yep, great point. Given hi-fi designs, it looks like we may have some modal-within-modals as well, so we'd want the lower zIndex
for ODD as well.
getTopPortalEl() | ||
) | ||
interface UseErrorRecoveryResult { | ||
isEREnabled: boolean |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kind of a nit, but the use of "enabled" here makes you think this is a wrapper for the feature flag at first glance. can we call it like isERActive
and so on?
// Otherwise, returns null. | ||
const ALL_COMMANDS_POLL_MS = 5000 | ||
|
||
export function useCurrentlyFailedRunCommand( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
credit to @SyntaxColoring for this one - what if we added a cursor to GET /runs/:runid/commands
that was "last failed command" so we didn't have to parse the command list at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you talking about a server side change? If app side, how would this work with a desktop app that navigates to the run after 100+ fixit commands? Wouldn't it need knowledge of which command failed to know which cursor to use? CC @SyntaxColoring
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, a server side change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm working on the server part of this now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
Closes EXEC-432
Overview
Adds the ability to retry the command that caused the run to enter Error Recovery, resuming the run after the issued command completes. Implementing the retry logic requires laying out (hopefully) most of the remaining app-side foundational work, therefore that is included here as well.
There is a good bit of noise in this PR. A few specifics are mentioned via comments. Below are the major highlights and discussion points:
Chaining Fixit Commands
In order to chain (or even create) commands with a
fixit
intent, we utilizeuseRecoveryCommands
. I currently make the assumption that we do not want to handle fixit commands that fail during error recovery, so we alwayscontinuePastCommandFailure
. Whether we actually want this behavior or not (or make this dependent on the exact fixit command) is an involved conversation and must also ensure error recovery can't possibly cause an infinite loop (such as a faulty sensor always detecting no tip attached after the user dispatches additional fixit pick up tip commands).Identifying the Failed Run Command
Designs utilize splash pages that require knowledge of the failed command (example). While identifying and retaining the failed command is simple on the ODD, the complexity increases when we add desktop apps into the mix. A desktop user may be anywhere in the app and then may navigate to Error Recovery while a user engages (or has engaged) with Error Recovery on the ODD. In order to determine which command is the failed command, we must navigate backwards through the active run's list of commands, finding the most recent failed command with a
protocol
intent. And that's the issue; becausefixit
commands are added to the run command list, the following scenario may occur:fixit
commands. These commands are appended to the protocol run commands list.protocol
command. No failed command is found when querying the server, which occurs due to the failed command being farther back in history than thepageLength
sent to GET /runs/:runId/commands returned. See this discussion for some background as to why this is the case.@sanni-t's recent PR #15031 solves the serialization issue for completed runs. We could consider extending this functionality for commands during an on-going run (and maybe add a rate limiter?!?). @sfoster1 's thoughts are highlighted in the slack convo linked above after discussion in-person.
Robot Motion State
If you look at designs, there are a lot of different "robot in motion" screens. Sometimes we only want the motion screen when the robot does a command, and sometimes we want to persist it longer or indefinitely. I think the best way to handle motion is to explicitly control it by setting the motion screen before we do a command and then control it afterwards. This is a departure from the implicit
RobotInMotion
component takeover that all the other wizard flows use, but I don't think that's going to work for us, and even if it is possible, the control flow would be very hard to follow.Test Plan
Changelog
Review requests
Designs have changed quite a bit in the past couple of weeks. This PR is just focused on functionality and structure!
Risk assessment
low -- behind a ff