feat(api, robot-server): use runtime parameter files set in protocols and runs to set in-protocol values #15855
Conversation
This mostly makes sense to me and looks good to merge, though I'm not super familiar with the whole RTP flow.
My main concern going forward is the `open()`/`close()` pair resource management thing.
    @@ -188,6 +192,7 @@ async def create(
        notify_publishers: Callable[[], None],
        protocol: Optional[ProtocolResource],
        run_time_param_values: Optional[PrimitiveRunTimeParamValuesType] = None,
        run_time_param_paths: Optional[CSVRuntimeParamPaths] = None,
It doesn't really matter, but out of curiosity, why are we adding a second `run_time_param_paths` argument all the way down, instead of extending the existing `run_time_param_values` arg to also carry CSV paths?
Like, it's not like we have 4 different arguments for int params, string params, boolean params, and float params.
One argument seems more correct to me because it makes it structurally impossible for different parameters to have colliding keys.
Good point, this is an artifact of these originally being a dictionary of param names to file IDs, which, since file IDs were strings, required a different variable for disambiguation. Now that they're `Path` objects that isn't needed, but I'm gonna put a TODO to fix that up next week before the code freeze.
I prefer the separate arguments for two reasons:
- It keeps the meaning of a parameter's 'value' consistent throughout the code. A CSV parameter's 'value' is its file with its contents, while we are passing the file IDs/paths in these arguments, not the file itself. So the two arguments keep the two concerns separate.
- Because a parameter's value and a CSV parameter's file path have to be handled differently (you 'set a param value' vs. 'initialize a file using a file path'), the two separate arguments make it very easy to split the handling. If the primitive param values and CSV param paths are passed in as one entity, you end up having to look up the type of the parameter and then handle it accordingly. There are a few places in the code where we'd need to know the param type to do this, so it adds a lot of repetitive 'is this a CSV param?' code.

But if you still think it's better to have one arg for both then I won't oppose it.
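To make the trade-off concrete, here is a minimal sketch of the two shapes being debated; all names and types are simplified stand-ins, not the actual `opentrons` signatures:

```python
from pathlib import Path
from typing import Dict, Union

# Simplified stand-in for the real PrimitiveRunTimeParamValuesType.
PrimitiveValue = Union[int, float, bool, str]

# Option A (this PR): two arguments, two concerns.
def set_parameters_separate(
    run_time_param_values: Dict[str, PrimitiveValue],
    run_time_param_paths: Dict[str, Path],
) -> None:
    for name, value in run_time_param_values.items():
        print(f"set {name} = {value}")            # "set a param value"
    for name, path in run_time_param_paths.items():
        print(f"initialize {name} from {path}")   # "initialize a file using file path"

# Option B (single argument): shared key space, but every consumer
# has to re-discover the parameter type.
def set_parameters_combined(
    run_time_params: Dict[str, Union[PrimitiveValue, Path]],
) -> None:
    for name, value in run_time_params.items():
        if isinstance(value, Path):               # repeated "is this a CSV param?" check
            print(f"initialize {name} from {value}")
        else:
            print(f"set {name} = {value}")
```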
    if rtp_files:
        rtp_paths = {
            name: data_files_directory / file_id / data_files_store.get(file_id).name
            for name, file_id in rtp_files.items()
        }
Should this have validation so invalid file IDs don't raise a 500 error? Or a `# todo`.
Hmmm, interesting point. Raising a 500 error here would be very different behavior compared to other parameter validation we do, but just plain skipping invalid file IDs would currently lead to the same error as not providing any file ID to begin with. We can discuss this further, but I'll make a TODO so this gets addressed in next week's follow-up.
Yeah, I don't think you'd want to skip it. I think you'd want to treat it like when you try to create a run with a `protocolId` that doesn't point to a valid protocol.
I think if the file ID doesn't exist, an error will eventually get raised when we set the CSV parameter during orchestrator load. So it will definitely error out shortly after this even if it's not validated here, and I think that's OK, because a CSV file isn't exactly like a protocol file. The CSV file's validation will happen in the same place where the rest of the RTP parameters' validation occurs.
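For illustration, a hedged sketch of what up-front validation could look like; `FileIdNotFoundError` and the store returning `None` for missing IDs are assumptions, not the actual robot-server API:

```python
from pathlib import Path
from typing import Dict

# Hypothetical error type for illustration only.
class FileIdNotFoundError(ValueError):
    """A runtime-parameter file ID with no corresponding stored data file."""

def resolve_rtp_paths(
    rtp_files: Dict[str, str],
    data_files_store,
    data_files_directory: Path,
) -> Dict[str, Path]:
    rtp_paths: Dict[str, Path] = {}
    for name, file_id in rtp_files.items():
        entry = data_files_store.get(file_id)  # assumed to return None when missing
        if entry is None:
            # Map this to a 4xx response in the route handler instead of a 500.
            raise FileIdNotFoundError(f"Data file {file_id!r} does not exist")
        rtp_paths[name] = data_files_directory / file_id / entry.name
    return rtp_paths
```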
        raise
    finally:
        if parameter_context is not None:
            parameter_context.close_csv_files()
I'm glad we're thinking about closing the files when we're done with them, but is this the correct place to do it? Where are the files opened?
For example, it looks like one route for them to be opened is `ProtocolExecutor.extract_run_parameters()`, when a protocol is loaded, and that does not necessarily seem paired with any `parameter_context.close_csv_files()`.
(The guideline that I'm aiming to follow is that every open has exactly one close, and the thing that does the open is the same as the thing that does the close.)
Yeah, this was probably my biggest ❓ with this PR. I didn't immediately see a good place to tightly couple the two as you suggest, so I put the close in the place where the majority of protocols would enter in the first place, but I agree it's imperfect. I think this is probably a sign that these should be opened in a different location, which would also tie into your next comment about file opening. Again, another TODO for next week so science can be unblocked with a functional version of this.
Makes sense. I would look into doing the `open()` and `close()` in `exec_run()` or one of `exec_run()`'s callers. I haven't looked at them too closely, so big grain of salt.
    file_id = file_path.parent.name
    file_name = file_path.name
    with file_path.open() as csv_file:
        contents = csv_file.read()

    temporary_file = tempfile.NamedTemporaryFile("r+")
    temporary_file.write(contents)
    temporary_file.flush()

    parameter_file = open(temporary_file.name, "r")
    temporary_file.close()
If I'm reading this correctly:
1. We are given a path, the "original file," which we want to treat as read-only.
2. We open the file at that path and copy its contents to a new, temporary file. This makes sense.
3. We close that temporary file and then reopen it, and this reopened file is what we use for the rest of the protocol.

Why are we doing step 3?
Also, I know you're not supporting this now, but heads up for the future: any kind of `open()` at this stage makes an assumption about the file's encoding before the user's Python has a chance to run and tell us what encoding it expects. If you ever support custom encodings, the `open()` will have to move somewhere else, fundamentally. So let's not bake it too deep into the structure.
Step 3 is to enforce read-only permissions on the file, since to write to the temporary file we need to give it read-write privileges. Since it's a temporary file there's no inherent danger in leaving write privileges on, but read-only seemed like the more "correct" thing to enforce.
As for encoding, you can actually change the encoding of an active file even after opening it via `reconfigure`. But, as in my previous reply, this code may end up moving closer to where the user is using it, which would more easily allow custom encodings for future types of files (for CSV files I feel it's pretty safe to assume UTF-8 text).
    @@ -72,6 +73,7 @@
        "ALL",
        "OFF_DECK",
        "RuntimeParameterRequiredError",
        "CSVParameter",
What's this reexport for? Are we expecting users of the Python Protocol API to import this in their scripts?
This was tacked on, but per a discussion with Ed we want this easily exposed, since this interface is a first-class object in our API now.
Oh, I see now, that makes sense. We should aim to define the class inside `opentrons.protocol_api` instead of `opentrons.protocols.parameters.types`, then. Until we do that, mind leaving a note on the class definition so it's obvious that it's user-facing?
Also, reminder to rebase onto …
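Until the move happens, the note could be as simple as this (purely illustrative wording, not the actual docstring):

```python
class CSVParameter:
    """A CSV runtime parameter.

    NOTE: This class is re-exported from ``opentrons.protocol_api`` and is
    user-facing; treat its interface as public API.
    """
```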
Agree with @SyntaxColoring's comments, and also agree with adding a TODO for these cases for now. Besides that, LGTM!
LGTM! Left a couple of comments.
Tested on OT-2 and verified that CSV protocol works end-to-end! 🙌 🙌 🙌
# Overview

This PR follows up on #15855 with various small refactors/improvements.

The biggest change in this PR is refactoring how we open the CSV parameter files. Previously we were opening them when we first initialized the parameter. This proved to be a potential issue because we were only closing the files when we reached the execute step of running the protocol (i.e. in a `finally` clause within `run_protocol`), yet there was a small but real chance of not reaching that step (for instance if the initial home command failed unexpectedly). To solve this, we are now passing the `Path` object to the `CSVParameter` interface and only opening the file when it is first accessed in the protocol, meaning it will _always_ occur within that try/except block.

Another change introduced is properly raising a 400 error for `POST`s to the `/runs`, `/protocols` and `/protocols/{protocolId}/analyses` endpoints when referencing data files (for runtime parameter files) that don't exist.

API version is now also enforced for using `add_csv_file`, which now requires your protocol to be at 2.20. The API version is also now passed down to the `CSVParameter` interface, though it is not currently used to gate anything yet.

## Test Plan and Hands on Testing

Tested end to end on hardware with the same protocol/CSV files as #15855.

## Changelog

- Pass `Path` to the `CSVParameter` interface and open files within that.
- Raise a 400 error when referencing nonexistent data file IDs for `/runs`, `/protocols` and `/protocols/{protocolId}/analyses`.
- Require API version 2.20 for `add_csv_file`.
- Fixes/improvements for testing CSV files/file opening in the Protocol API.

## Review requests

## Risk assessment

Low.
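A hedged sketch of the lazy-open idea described above; the class and attribute names are illustrative, not the actual `CSVParameter` implementation:

```python
from pathlib import Path
from typing import Optional, TextIO

class LazyCSVParameter:
    def __init__(self, csv_path: Path) -> None:
        self._path = csv_path
        self._file: Optional[TextIO] = None  # nothing is opened at init time

    @property
    def file(self) -> TextIO:
        if self._file is None:
            # First access happens during protocol execution, inside the
            # try/finally that guarantees the matching close below runs.
            self._file = self._path.open("r")
        return self._file

    def close(self) -> None:
        if self._file is not None:
            self._file.close()
            self._file = None
```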
…nd less reliance on file reading/creating (#15956)

# Overview

This PR follows up both #15855 and #15907 with further refinements and small changes to defining and setting CSV parameters, along with behind-the-scenes refactors.

The two main user/client-facing changes of this PR are changing the HTTP status code when invalid data file IDs are sent to the `POST` `/runs`, `/protocols` and `/protocols/{protocolId}/analyses` endpoints from 400 to 422, and raising a new `IncompatibleParameterError` when multiple CSV file parameters are added to a single protocol. The change in status code reflects that the error is not because of a malformed request, but because there was something in the request that could not be processed (in this case the data file ID). The new error was added to align with the original PRD and the current UI. While there's no inherent technical limitation on multiple CSV parameters (other than the theoretical limitations on memory, number of file handles, etc.), there are unexpected UI bugs when multiple ones are defined, and this change enforces only one per protocol.

The other major change of this PR is a non-user-facing refactor of how we set the `CSVParameter` interface. Rather than passing the `Path` of the file to the interface and then using a `TemporaryFile` as the basis for all further access of the CSV file and its contents, now the contents of the file (passed as `bytes`) are what everything else is based on, including CSV parsing in `parse_as_csv`. With this change, we only ever make a temporary file handle when the user accesses `.file`. With this change reducing the chance of there being an open file handle, a new `file_opened` property was added to the public interface to reduce needless file opening. This is technically user-facing code meant more for internal usage, but I see no danger in exposing it, though if needed we can tag it as a private, non-guaranteed method.

## Changelog

- Raise `IncompatibleParameterError` when `add_csv_file` in the `ParameterContext` is used more than once.
- Change the status code for invalid data file IDs posted to the runs and protocols endpoints from `400` to `422`.
- Refactor `CSVParameter` to only open a temporary file handle when the user requests one.

## Review requests

Should we label `file_opened` as private/use another way of preventing spurious temporary file creation just to close them?

## Risk assessment

Low.
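A minimal sketch of the bytes-based design described above, with illustrative names rather than the real `CSVParameter` API:

```python
import csv
import io
import tempfile
from typing import List

class BytesBackedCSVParameter:
    def __init__(self, contents: bytes) -> None:
        self._contents = contents  # the bytes are the source of truth
        self._file = None

    @property
    def file_opened(self) -> bool:
        # Lets callers skip needless open-then-close bookkeeping.
        return self._file is not None

    def parse_as_csv(self) -> List[List[str]]:
        # Parsing works directly off the stored bytes; no file handle needed.
        return list(csv.reader(io.StringIO(self._contents.decode("utf-8"))))

    @property
    def file(self):
        # A temporary file is created only when explicitly requested.
        if self._file is None:
            self._file = tempfile.NamedTemporaryFile("r+")
            self._file.write(self._contents.decode("utf-8"))
            self._file.seek(0)
        return self._file
```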
# Overview

Closes AUTH-419.

This PR hooks up the setting of runtime parameter files in the `POST` `/runs`, `/protocols` and `/protocols/{protocolId}/analyses` to the actual `CSVParameter` object used in the protocol context.

This file is sent via a dictionary of parameter name and file ID, so we resolve the file ID into a `Path` by using the `data_files_directory` and `data_files_store`. This is then passed to the `ParameterContext` and used to open up a temporary file that contains the contents of the actual file stored on the robot.

## Test Plan and Hands on Testing

Tested end to end via the following protocol and two CSV files.

## Changelog

- Resolve `runTimeParameterFiles` into a dictionary of `Path`s.
- Pass `run_time_param_paths` all the way to the `ParameterContext`, where they can be opened as temporary file handlers.
- Expose `CSVParameter` from the `protocol_api` namespace.

## Review requests

## Risk assessment

Medium.