refactor(api, robot-server): CSV parameter improvements #15907
Conversation
Cool, thanks!
```python
# Open a new file handler for the temporary file with read-only permissions and close the other
read_only_temp_file = open(temporary_file.name, "r")
temporary_file.close()
```
I'm not sure if this is the cause of your Windows testing woes, but: `NamedTemporaryFile` will, by default, delete `temporary_file` when it's `.close()`d. So we're opening a new handle to the file and then deleting the underlying file. That might work, I dunno, but it seems iffy.
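For reference, a minimal sketch of the behavior being described, using illustrative file contents: with the default `delete=True`, closing the writer would remove the file out from under the second handle on some platforms, while `delete=False` keeps it on disk at the cost of unlinking it manually.

```python
import os
import tempfile

# By default NamedTemporaryFile is created with delete=True, so the
# underlying file vanishes as soon as the handle is closed. With
# delete=False the file survives a close, at the cost of having to
# unlink it manually afterwards.
tmp = tempfile.NamedTemporaryFile(mode="w", suffix=".csv", delete=False)
tmp.write("a,b,c\n1,2,3\n")
tmp.flush()

# Open a second, read-only handle by name, then close the writer.
read_only = open(tmp.name, "r")
tmp.close()
content = read_only.read()
read_only.close()
os.unlink(tmp.name)  # our responsibility, since delete=False
```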
Hypothetically, what would we lose by just opening the original file in read-only mode and directly passing it to the protocol? I guess a user could access `file.name` and manually muck with things that way, and that would be bad. I guess there could also be some robot-server bug that causes the original file to be modified concurrently, and that would also be bad. Maybe those are reasons enough to avoid passing the file directly, I dunno.
Another thing that you might consider here is to use `shutil.copyfile()` instead of writing into a `NamedTemporaryFile`. That will probably be faster because it can happen entirely in the OS, or even entirely in the filesystem, without bytes having to go through Python.
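A quick sketch of that alternative, with hypothetical paths standing in for the stored data file and its working copy:

```python
import os
import shutil
import tempfile

# Hypothetical paths for illustration: src stands in for the stored
# data file, dst for the working copy handed to the protocol.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "params.csv")
dst = os.path.join(workdir, "params_copy.csv")
with open(src, "w") as f:
    f.write("a,b\n1,2\n")

# shutil.copyfile can delegate to platform fast-copy syscalls (e.g.
# sendfile on Linux), so the bytes need not pass through Python buffers.
shutil.copyfile(src, dst)
with open(dst) as f:
    copied = f.read()
```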
The `NamedTemporaryFile` issue was related exclusively to Windows. From the Python docs (https://docs.python.org/3.9/library/tempfile.html#tempfile.NamedTemporaryFile):

> Whether the name can be used to open the file a second time, while the named temporary file is still open, varies across platforms (it can be so used on Unix; it cannot on Windows).

So it seems safe to do and unrelated to the file deletion on close. Since I want to get this in for the next alpha I'm keeping this method, but I'll look into `shutil.copyfile()` in case there are bugs related to this or it proves to be a more stable way.
```python
        for name, file_id in rtp_files.items()
    }
except FileIdNotFoundError as e:
    raise FileIdNotFound(detail=str(e)).as_error(status.HTTP_400_BAD_REQUEST)
```
I don't have a strong opinion either way, but when a run is created with a bad protocol, we return a 404, not a 400. Should we do that here, too?
You mean a 422?
Ya I guess that works too.
Nope, I mean a 404:

opentrons/robot-server/robot_server/runs/router/base_router.py, lines 250 to 251 in e671678:

```python
except ProtocolNotFoundError as e:
    raise ProtocolNotFound(detail=str(e)).as_error(status.HTTP_404_NOT_FOUND) from e
```
Oh sorry, I just realized you were talking about the runs endpoint and not the protocols endpoint. I still think 400/422 is more appropriate, for the reason below.
I think `404` would have also worked, but after discussion with Nick and Shlok we all came to an agreement on `400`, which is not an incorrect response code for the situation.
```python
except FileIdNotFoundError as e:
    raise FileIdNotFound(detail=str(e)).as_error(status.HTTP_400_BAD_REQUEST)
```
Same comment: 404 instead of 400, maybe?
I think either a 400 or 422 is more appropriate here, because a 404 indicates that the URL resource is not found. I wouldn't consider a CSV file the URL's primary resource.
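A toy sketch of the distinction being debated in this thread (names are illustrative, not the robot-server API): 404 says the URL's own resource is missing, while 400/422 say the request referenced something the server cannot process.

```python
from http import HTTPStatus

# Hypothetical helper: map the kind of missing thing to a status code.
def status_for_missing(kind: str) -> HTTPStatus:
    if kind == "url_resource":     # e.g. /protocols/{protocolId} not found
        return HTTPStatus.NOT_FOUND
    if kind == "referenced_file":  # e.g. a CSV data file ID in the body
        return HTTPStatus.BAD_REQUEST  # 400, as the thread settled on
    return HTTPStatus.UNPROCESSABLE_ENTITY  # 422, the other candidate
```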
```python
    and parameter.value is not None
):
    parameter.value.close()
    parameter.value = file_path
```
Hmm.. is it necessary to assign the file path to the parameter value? I think the value of the parameter is represented more closely by either the file handler or the file contents. So here's what my mental map is for CSV params:

- `name`, `id`, and `path` properties carry info about the file
- `value` is the actual file; could be either the handler or the contents

What do you think? Also, we're kinda close to the deadline, so is it an easy thing to change if we want to?
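That mental map could be sketched roughly like this (illustrative names, not the actual opentrons API): `name`/`id`/`path` describe the file, while `value` is the file itself, here as a lazily opened handle.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional, TextIO

# Hypothetical sketch: descriptive properties vs. the file itself.
@dataclass
class CsvParam:
    name: str
    id: str
    path: Path
    _handle: Optional[TextIO] = None

    @property
    def value(self) -> TextIO:
        # The "actual file": open the handle on first access.
        if self._handle is None:
            self._handle = self.path.open("r")
        return self._handle
```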
I made this change to help facilitate opening the file handler in the parameter itself rather than outside of it (to better couple it with the closing of files done in `run_protocol`), but you're right that this exposes the path, however indirectly and privately. Changing it to be the contents is a non-trivial change and I want to get these important changes in for the next release, so that can come during future refactor work.
Oops, didn't see your reply, thanks!

Is there any special significance to `value` here? Like, is this `.value` directly exposed to PAPI users as a public `.value` property or anything?
Talked offline about whether `value` should be the `path` or not, but since it's not a first-class API property, hopefully there's not much harm in exposing the path in this way. We will, however, keep an eye out for possible misuse, and also get feedback from science/FAS and change it if needed.
Everything else looks good!
…nd less reliance on file reading/creating (#15956)

# Overview

This PR follows up both #15855 and #15907 with further refinements and small changes to defining and setting CSV parameters, along with behind-the-scenes refactors.

The two main user/client-facing changes of this PR are changing the HTTP status code from 400 to 422 when invalid data file IDs are sent to the `POST` `/runs`, `/protocols`, and `/protocols/{protocolId}/analyses` endpoints, and raising a new `IncompatibleParameterError` when multiple CSV file parameters are added to a single protocol. The change in status code reflects that the error is not because of a malformed request, but because something in the request could not be processed (in this case, the data file ID). The new error was added to align with the original PRD and the current UI. While there's no inherent technical limitation on multiple CSV parameters (other than theoretical limits on memory, number of file handlers, etc.), there are unexpected UI bugs when multiple ones are defined, and this change enforces only one per protocol.

The other major change of this PR is a non-user-facing refactor of how we set the `CSVParameter` interface. Rather than passing the `Path` of the file to the interface and then using a `TemporaryFile` as the basis for all further access of the CSV file and its contents, now the contents of the file (passed as `bytes`) is what everything else is based off of, including CSV parsing in `parse_as_csv`. With this change, we only ever make a temporary file handler when the user accesses `.file`. Since this reduces the chance of there being an open file handler, a new `file_opened` property was added to the public interface to reduce needless file opening. This is technically user-facing code meant more for internal usage, but I see no danger in exposing it, though if needed we can tag it as a private, non-guaranteed method.

## Changelog

- Raise `IncompatibleParameterError` when `add_csv_file` in the `ParameterContext` is used more than once
- Change status code of invalid data file IDs posted to runs and protocols endpoints from `400` to `422`
- Refactor `CSVParameter` to only open a temporary file handler when the user requests one

## Review requests

Should we label `file_opened` as private/use another way of preventing spurious temporary file creation just to close them?

## Risk assessment

Low.
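The bytes-based refactor described above could be sketched roughly as follows (illustrative names, not the actual opentrons class): the bytes contents are the source of truth, a temporary file is only materialized when `.file` is accessed, and `file_opened` lets callers skip creating a handle just to close it.

```python
import csv
import io
import tempfile

class CsvParamSketch:
    def __init__(self, contents: bytes) -> None:
        self._contents = contents
        self._file = None

    @property
    def file_opened(self) -> bool:
        # Lets callers avoid opening a file handle just to close it.
        return self._file is not None

    @property
    def file(self):
        # Only materialize a temp file on first access.
        if self._file is None:
            self._file = tempfile.NamedTemporaryFile("w+")
            self._file.write(self._contents.decode("utf-8"))
            self._file.seek(0)
        return self._file

    def parse_as_csv(self):
        # Parse straight from the in-memory bytes; no file handle needed.
        return list(csv.reader(io.StringIO(self._contents.decode("utf-8"))))
```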
Overview

This PR follows up on #15855 with various small refactors/improvements.

The biggest change in this PR is refactoring how we open the CSV parameter files. Previously we were opening them when we first initialized the parameter. This proved to be a potential issue because we were only closing the files when we reached the execute step of running the protocol (i.e. in a `finally` clause within `run_protocol`), yet there was a small but real chance of not reaching that step (for instance, if the initial home command failed unexpectedly). To solve this, we now pass the `Path` object to the `CSVParameter` interface and only open the file when it is first accessed in the protocol, meaning it will always occur within that try/except block.

Another change introduced is properly raising a 400 error for `POST`s to the `/runs`, `/protocols`, and `/protocols/{protocolId}/analyses` endpoints when referencing data files (for runtime parameter files) that don't exist.

API version is now also enforced for using `add_csv_file`, which now requires your protocol to be at 2.20. The API version is also now passed down to the `CSVParameter` interface, though it is not currently used to gate anything yet.

Test Plan and Hands on Testing

Tested end to end on hardware with the same protocol/CSV files as #15855.
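The lifecycle change described above can be sketched like this (illustrative names, not opentrons internals): the parameter stores only a `Path`, the file is opened lazily on first access inside the try block, so the `finally` clause can always close whatever was actually opened.

```python
from pathlib import Path

class LazyCsvParam:
    def __init__(self, path: Path) -> None:
        self._path = path
        self._file = None

    @property
    def file(self):
        # Deferred open: nothing is opened until first access.
        if self._file is None:
            self._file = self._path.open("r")
        return self._file

    def close(self) -> None:
        if self._file is not None:
            self._file.close()

def run_protocol(param: LazyCsvParam) -> str:
    try:
        return param.file.read()  # first access opens the file
    finally:
        param.close()             # guaranteed once we reach the try
```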
Changelog

- Pass `Path` to the `CSVParameter` interface and open files within that
- Raise 400 error for nonexistent data file IDs posted to `/runs`, `/protocols`, and `/protocols/{protocolId}/analyses`
- Enforce API version 2.20 for `add_csv_file`
Review requests
Risk assessment
Low.