feat(robot-server, app): add a new endpoint for fast-fetching all run commands #15031
Conversation
@@ -428,7 +428,7 @@ def get_commands_slice(
        )
        .order_by(run_command_table.c.index_in_run)
    )

    # select_all_cmds_as_doc = sqlalchemy.select(sqlalchemy.func.json_array_elements)
oops. Left over by mistake. Removing
const parsedCommandsFromQuery = commandsFromQuery.map(command =>
  JSON.parse(command)
)
i would actually have the react query do this for us, since anyone using this hook will probably want the data to be deserialized
@@ -67,7 +77,7 @@ export const RunPreviewComponent = (
  ] = React.useState<boolean>(true)
  if (robotSideAnalysis == null) return null
  const commands =
-   (isRunTerminal ? commandsFromQuery : robotSideAnalysis.commands) ?? []
+   (isRunTerminal ? parsedCommandsFromQuery : robotSideAnalysis.commands) ?? []
talked over slack but falling back to analysis commands if the run commands are empty is probably better
  enabled: isRunTerminal,
}).data?.data
const commandsFromQuery =
  useAllCommandsAsPreSerializedList(
sounds good
frontend code looks good! i tested on a bot and this seems to work great — nice job y'all!
staleTime: Infinity,
cacheTime: Infinity,
i guess we don't need staleTime and cacheTime anymore now that we're using the notification service, right?
Do we want to leave those there in the event that we can't establish an MQTT connection and need to fall back to HTTP with those options passed through?
oh yes you are totally right!
Nice job on wiring up the notifications!
@ncdiehl11 This is my bad for not telling you earlier (and it's also not a blocker) - on L43 of We should add the HTTP polling equivalent to the list to prevent devs from forgetting and using it by mistake.
@@ -351,6 +365,53 @@ async def get_run_commands(
    )


@PydanticResponse.wrap_route(
    commands_router.get,
    path="/runs/{runId}/commandsAsPreSerializedList",
Is there a way to do this that doesn't require:
- A separate endpoint path?
- New 503 semantics for runs that haven't completed?
i.e. could we make this something like GET /runs/{id}/commands?validate=false instead?
I know we're repeating how we did things with GET /protocols/analyses/{id}/asDocument, but I think it was a mistake for that endpoint to work like that, in hindsight.
Unifying them would have integration testing benefits. Imagine just parametrizing every Tavern test with ?validate=true / ?validate=false.
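For illustration, here's a minimal sketch of what a unified endpoint with that toggle could look like; the validate parameter and the fetch_preserialized_commands helper are hypothetical stand-ins, not code from this PR:

```python
# Hypothetical sketch of the "?validate=false" idea. fetch_preserialized_commands
# is a made-up stand-in for a run-store query that returns command rows as JSON strings.
import json

from fastapi import FastAPI, Query, Response

app = FastAPI()


def fetch_preserialized_commands(run_id: str) -> list[str]:
    """Stand-in: each element is one command, already serialized as JSON."""
    return ['{"commandType": "home"}', '{"commandType": "aspirate"}']


@app.get("/runs/{runId}/commands")
async def get_run_commands(runId: str, validate: bool = Query(True)) -> Response:
    serialized = fetch_preserialized_commands(runId)
    if validate:
        # Slow path: re-parse each row so the server can re-validate/re-encode it.
        data = [json.loads(row) for row in serialized]
        return Response(content=json.dumps({"data": data}), media_type="application/json")
    # Fast path: splice the already-serialized rows straight into the response body.
    body = '{"data":[' + ",".join(serialized) + "]}"
    return Response(content=body, media_type="application/json")
```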
Is there a way to do this that doesn't require: A separate endpoint path?

I think the separate endpoint path is the right thing to do here because the shape/type of the response is different from the regular ../commands endpoint.
New 503 semantics for runs that haven't completed?
Talked about this briefly on the call and posting here for posterity: not having a 503 for runs can mean one of two things:
- returning an empty list in the response: this is misleading and can cause confusion as to whether this run just didn't have any commands or whether the commands just haven't been serialized yet. We agreed that this is not a good option.
- serializing commands from memory (since it's a current run) and returning them in the response: this is a good option, but it too takes a non-ideal amount of processing time. I will test exactly how much time it takes and respond here. If it's faster than or about as fast as fetching pre-serialized commands, I'll replace the 503 with it. If it takes more than twice as long as fetching pre-serialized commands, then I think it would be more disruptive to the client flow (and user experience) if the client hits the endpoint at the wrong time and ends up having to wait a long time, blocking the rest of the flow.
I think the separate endpoint path is the right thing to do here because the shape/type of the response is different from the regular ../commands endpoint
Definitely, if the response shape is different. If possible, I'm saying the response shape should not be different. (Discussed in the other thread: #15031 (comment))
If it takes more than twice as long as fetching pre-serialized commands, then I think it would be more disruptive to the client flow (and user experience) if the client hits the endpoint at the wrong time and ends up having to wait a long time, blocking the rest of the flow.
I'm not totally following this. How does returning 503 avoid the client having to wait a long time and avoid blocking the rest of the flow? Either way, the client can only move on after the server has spent time serializing everything.
LGTM given that it's documented as experimental, but see question above. Thank you!
[spoke in person] holding off on this in this PR as we are targeting the release branch instead of edge, where those eslint file rules were added @mjhuff
- !anystr
- !anystr
- !anystr
- !anystr
Oh I'm just noticing this. Are we returning something like this?
{
  "data": [
    "{\"commandType\": \"home\", ...}" // JSON list of strings that are parseable as JSON
  ]
}
If so, can we concatenate the command strings on the server side so we always emit straight JSON?
Yes, your example is what the response looks like.
Are you suggesting changing to something like this?
{
  "data":
    ' "{\"commandType\": \"home\", ...}", "{\"commandType\": \"aspirate\", ...}", "{...}" '
}
...so that data is a single string of concatenated individual command strings that are parseable as JSON?
No, sorry, I was unclear. I'm suggesting changing to something like this:
{
  "data": [
    {
      "commandType": "home",
      // ...
    },
    {
      "commandType": "loadLabware",
      // ...
    },
    // ...
  ]
}
So that it matches the normal GET /runs/{id}/commands.
[Edit] Sorry, I'm still being unclear. I mean we could do something like this:
response_body = '{"data":[' + ','.join(serialized_commands) + ']}'
If concatenating JSON fragments like that feels too icky, I'm sure there's a way to write a custom JSON encoder to do it. [Edit: Nope, standard library JSON encoders can't do this.]
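A minimal sketch of that splice-don't-reparse approach, assuming the commands come out of the store as JSON strings (illustrative helper name, not this PR's code):

```python
# Illustrative only: build the usual {"data": [...]} envelope by splicing the
# pre-serialized command strings together, so nothing gets re-parsed or re-encoded.
from fastapi import Response


def build_commands_response(serialized_commands: list[str]) -> Response:
    body = '{"data":[' + ",".join(serialized_commands) + "]}"
    return Response(content=body, media_type="application/json")
```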
Just tested this and it's 5x more expensive than just returning the serialized commands list. It's still much faster than doing the full conversion to Pydantic response models, so we'll keep that as an option. I've added a todo comment to address this.
For posterity: we talked about this in Slack, and the "5x more expensive" result was with code roughly like:
commands_list = [json.loads(c) for c in command_jsons_from_sql]
Rather than with ','.join()
.
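For reference, a tiny self-contained comparison of the two approaches on made-up data (not the PR's actual benchmark):

```python
# Illustrative micro-benchmark: re-parsing every pre-serialized command versus
# splicing the strings directly into one JSON body. Data here is synthetic.
import json
import timeit

command_jsons_from_sql = [
    '{"commandType": "home", "id": "%d"}' % i for i in range(10_000)
]

reparse = timeit.timeit(
    lambda: json.dumps({"data": [json.loads(c) for c in command_jsons_from_sql]}),
    number=10,
)
splice = timeit.timeit(
    lambda: '{"data":[' + ",".join(command_jsons_from_sql) + "]}",
    number=10,
)
print(f"re-parse + re-dump: {reparse:.3f}s  string splice: {splice:.3f}s")
```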
engine_state_summary = self._run_hooks.get_state_summary(self._run_hooks.run_id)
if (
    isinstance(engine_state_summary, StateSummary)
    and engine_state_summary.completedAt is not None
):
    await self.publish_pre_serialized_commands_notification(run_id=run_id)
Discussed in a call, but I didn't explain it well, so I'll try to be more coherent here...
You explained that this notifies the client right away if the commands are already available. That totally makes sense, because there won't be any more notifications later.
However, I think we should go further and send this first notification unconditionally, even when the commands are not available yet. Because I think that's how these topics should work in general.
I view it like this: This notification topic is not meant to say when the commands are ready. It's merely meant to say when to hit the HTTP endpoint, and the HTTP endpoint tells you whether the commands are ready.
Under the scheme I'm describing, an initial unconditional notification would basically confirm to the client that the subscription is active. This is needed in the general case because it solves a missed-update problem (see option 1); every topic should do it or something equivalent.
Maybe you're thinking of the conditional that you have right now as an optimization. Like, the server knows that the client won't like the GET response that it has to offer right now, so why bother inducing a GET request?
But I think we want to avoid the notifications system becoming a bunch of bespoke mini-APIs, each with its own rules. Instead, we want it to be dumb and generic and consistent so we can stamp it out on all of our endpoints without thinking about it, share infrastructural code, integration-test it in generic form, and so on. If that comes at the cost of occasionally prompting the client to send out a superfluous GET, I think it's worth it.
Again, though this doesn't have to change for this PR.
Hmm, I think those are valid points. I can make the notification unconditional upon subscription.
But I think it might need some app-side changes too, unless handling a 503 response is already implemented. Thoughts @ncdiehl11 @shlokamin?
Definitely feel free to leave it out if it needs nontrivial changes or entails testing everything again. I'd be happy with a # todo comment.
To @SyntaxColoring's point: currently, the client always performs an implicit endpoint hit whenever it subscribes to a topic, to solve for the missed-update problem. This behavior is unsurprising from a client perspective, because this is what happens in a non-notification, vanilla HTTP hook (more importantly, see the aside below). In the current iteration of notifications, the server is responsible only for publishing updates for a given topic.
After discussing the intent of this fast-fetch behavior with @shlokamin, it doesn't sound like we need this bit. The runs_publisher is always initialized at the start of a run, and the resources this endpoint/topic manages do not change until the end of a run (if I understand correctly?).
The reason initialize() contains a publish for /runs/:runId is that an update to the underlying resource occurs at this point in time. This accounts for the following scenario:
- A client subscribes to a not-yet-existing run.
- Simultaneously, the client performs an initial GET and the resource errors out/whatever it currently does.
- The run is created. At this point, the /runs/:runId resource is updated, so we need to publish the update to prevent the client from holding stale data.
TLDR: If the fast-fetch resource can't actually update when a run is created, we don't need this bit. The client will GET it whenever it subscribes, and the endpoint should respond with the current state of the resource. Only publish when the resource updates.
ASIDE: The concept of confirming an active subscription does not play well with the MQTT broker-client model, as the robot-server (a client of the broker) has no ability to know what clients are subscribed to a given topic. The pub-sub model is meant to account for present and future states of a resource, not past states. The responsibility to solve the stale data problem must fall on the client in MQTT's pub-sub model.
@shlokamin is there precedent here for checking the response code? Currently, I am null-checking the response data object.
@ncdiehl11 Yes, look at useCurrentMaintenanceRun for an example.
Thanks for the additional context @mjhuff!
I will remove these lines
@sanni-t and thank you for bearing with me! 😄
See inline - what about returning jsonlines instead of encapsulated json?
async def get_run_commands_as_pre_serialized_list(
    runId: str,
    run_data_manager: RunDataManager = Depends(get_run_data_manager),
) -> PydanticResponse[SimpleMultiBody[str]]:
Since the fundamental idea with this endpoint is to avoid server work by not doing json serialization, what if this endpoint's response was in the form of jsonlines, json object records (aka the json-formatted Command instances we get from the database) that are line-separated? I think this is a good idea for two reasons:
- It requires less server work. With a SimpleMultiBody like this, we still have to do the work of the built-in json module encoding all those commands into a json array - things like escaping all the internal quotes.
- It generalizes better (IMO) to queries for running runs, since it's not totally contained and therefore doesn't really have "this is a complete view" semantics.
If we're concerned about breaking similarity with the other elements of our api by not being json, one thing we can do is let the client specify Accept for either application/json (MultiBody response) or application/jsonlines.
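For illustration, a rough sketch of what a jsonlines response could look like, assuming the store already hands back each command as a JSON string (names are made up, not this PR's code):

```python
# Rough jsonlines sketch: stream one pre-serialized command per line.
# iter_serialized_commands is a made-up stand-in for the run-store query.
from typing import Iterator

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()


def iter_serialized_commands(run_id: str) -> Iterator[str]:
    """Stand-in: yields each command row already serialized as JSON."""
    yield '{"commandType": "home"}'
    yield '{"commandType": "aspirate"}'


@app.get("/runs/{runId}/commandsAsPreSerializedList")
async def get_commands_as_jsonlines(runId: str) -> StreamingResponse:
    lines = (row + "\n" for row in iter_serialized_commands(runId))
    return StreamingResponse(lines, media_type="application/jsonlines")
```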
We can get advantage 1 without giving up our usual response format, right? JSON lines would look basically like this:
response_body = "\n".join(serialized_commands)
But this would be equally fast:
response_body = '{"data":[' + ','.join(serialized_commands) + ']}'
And I'm sure there's a way to write a custom JSON encoder to do it if concatenating JSON fragments like that feels too icky. [Edit: Nope, standard library JSON encoders can't do this.]
Could you clarify your point 2? I'm not sure what you mean by "contained" or how that affects generalization.
Interesting idea. I've added a todo comment to investigate this option further.
Overview
Robot server changes:
Adds a new endpoint, GET /runs/:run_id/commandsAsPreSerializedList, to the run commands router. This endpoint returns a list of pre-serialized commands (that are valid json objects) of a finished run. It is a much faster alternative to the GET /runs/:run_id/commands endpoint when fetching all commands of a completed run. Also adds notification publishing when pre-serialized commands become available for a run.
App changes
closes RQA-2645 and RQA-2647
Test Plan
Changelog
- added the new endpoint to runs/commands_router
- added run commands fetching to run_store
- added unit tests and integration tests
- added a runs publish topic for pre-serialized commands
- added notification publishing for pre-serialized commands availability after a run has been committed to the database. This happens in two places: in RunController, after a run has finished running (either successfully ended, failed with error, or stopped by the client), and in RunDataManager, after a run has been un-currented.
- added an api client function for getting from the new commandsAsPreSerialized endpoint, along with react query and notifications wrapper hooks
- implemented the above in the RunPreview component for terminal runs
Review requests
Risk assessment
Back-end: New endpoint, so impact on existing code is close to none. App: Medium. Fixes issues in existing behavior of handling historical runs.