Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TSPS-123 Refactor to follow DSP's Async API best practices #48

Merged
merged 38 commits into from
Jan 23, 2024

Conversation

mmorgantaylor
Copy link
Collaborator

@mmorgantaylor mmorgantaylor commented Jan 9, 2024

Description

This PR brings our service closer to being in line with DSP's Asynchronous API best practices doc. Remaining work for this goal that is not included in this PR is encapsulated in the following tickets:

  • TSPS-135: Implement GET imputation/result/{id} endpoint
  • TSPS-136: Require the caller to provide the job UUID
  • TSPS-150: Write tests for JobApiUtils

The plan of changes can be found in this doc.

The changes in this PR are as follows:

API endpoint and controller changes:

  • Create the following API endpoints:
    • POST /api/pipelines/v1alpha1/{pipelineId} to create a new pipeline of type {pipelineId} e.g. imputation. The response to this request now contains a JobControl object (specified by the best practices doc) that contains the jobId, rather than just the jobId.
    • GET /api/pipelines/v1alpha1/{pipelineId}/jobs to retrieve all pipeline-specific jobs requested by user.
    • GET /api/job/v1alpha1/jobs to retrieve all jobs (across pipelines) requested by user
    • GET /api/job/v1alpha1/jobs/{jobId} to retrieve a specified job
  • Note that the creation of a new job has moved from JobsApiController to PipelinesApiController, which checks which pipeline is requested (currently the only possible one is imputation) and then calls the appropriate service (ImputationService) to create the job.

Database changes:

  • The TSPS database's jobs table has been renamed to imputation_jobs (and future pipelines will get their own tables), and fields in this database that are captured in Stairway's database are removed (timestamps, status, and pipelineId). Currently, imputation_jobs stores the internal id, the jobId for the flight, the pipeline version, and the userId of the submitting user.
  • Note that the pipeline_inputs table remains as-is and is not imputation-specific. We may make the inputs table pipeline-specific in future but this was not needed in this PR.

Stairway-related changes:

  • Stairway classes refactored e.g. StairwayJobService and StairwayJobBuilder are now JobService and JobBuilder
  • Stairway JobBuilder now requires userId and pipelineId for all flights. It also requires a jobId (flightId) input, though the work to make the user supply this rather than the service is still to be completed in TSPS-136.
  • Stairway JobService now performs job lookups, both for getting all jobs and for getting specific jobs. It always checks that the jobs returned were submitted by the requesting user.
  • Add JobApiUtils class that contains useful methods for converting Stairway job objects (EnumeratedJobs, FlightStates, etc) to API response objects. Note that the mapFlightStatusToApi method groups the finer-grained Stairway statuses into three user-facing statuses for a job: RUNNING, SUCCEEDED, FAILED.

Other:

  • Create PipelinesEnum to encompass validated pipelines. Note that there is a test in PipelinesServiceTest to ensure that every pipeline in PipelinesEnum has a corresponding row in the pipelines repository pipelines table.

Jira Ticket

https://broadworkbench.atlassian.net/browse/TSPS-123

Copy link
Collaborator

@jsotobroad jsotobroad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

were there any tests that you werent able to recapitulate with all of the refactoring you did or any parts of the code you wanted to test that felt overly difficult to test? The coverage looks great but was just curious.

$ref: '#/components/responses/ServerError'
post:
summary: Create new job request
summary: Create new pipeline job
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just to make sure im on the right track, eventually this will require a jobid to be passed in, in which case we will also need to add a 202 reponse code when we do?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's right - TSPS-136 covers requiring the jobid in the payload here. good catch about the 202 - that ought to be part of this PR, will add it.

$ref: '#/components/responses/PermissionDenied'
'404':
404:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldnt need 403 or 404 here if we're just returning the list of jobs

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, thank you!

@@ -58,4 +97,82 @@ static ApiPipeline pipelineToApi(Pipeline pipelineInfo) {
.displayName(pipelineInfo.getDisplayName())
.description(pipelineInfo.getDescription());
}

// Pipelines jobs
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably worth expanding on this comment since its probably the most integral endpoint we have

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added descriptions to each controller method below - is this what you had in mind, or is there still missing context?

ApiGetJobResponse result = jobToApi(job);

logger.info("Retrieving jobId {} for userId {}", jobId, userId);
FlightState flightState = stairwayJobService.retrieveJob(jobId, userId);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how are we making sure that we are correctly handling 403s (dont have permission to see job id being asked for) and 404s (job id doesnt exist)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

discussed in Slack - had been overly cautious with returning 404s instead of 403s, but this isn't necessary. should be returning 403s properly now.

@@ -19,9 +22,4 @@ public enum StairwayJobMapKeys {
public String getKeyName() {
return keyName;
}

public static boolean isRequiredKey(String keyName) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this something that isnt useful to have? I can imagine a place where we want to get a list of required keys. Was it not being used before?

Copy link
Collaborator Author

@mmorgantaylor mmorgantaylor Jan 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ugh, agreed, but the place I wanted to use it (when setting up a new flight) is written in a way that wasn't obvious how to use this. (it was not being used before; it existed in WSM but isn't used in that repo.) but I'm going to take another look, because this is a much clearer way to specify the required keys than what we have.

FlightState result = stairwayComponent.get().getFlightState(jobId.toString());
// Note: after implementing TSPS-134, we can filter by flightId in enumerateJobs and remove
// the following check
if (!userId.equals(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so i wonder if this scenario shoudl return a job not found or a "forbidden" exception. Also at some point we will be doing these checks at the workspace level as opposed to the user level. Is that something we want to do now or do it in another round of refactors

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: checks at workspace level, I'd say make that change in the future if we do it, since that's only applicable if users are submitting data from a workspace, versus uploading the data in the no-BP model. or am i missing something?

- column:
name: time_completed
- column:
name: status
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice! we can clean up the pipelines table here if we want.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good call, done!

@mmorgantaylor mmorgantaylor marked this pull request as ready for review January 23, 2024 14:24
"Missing required field for flight construction: jobId");
}

for (JobMapKeys key : JobMapKeys.values()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be a bit more efficient if we change isRequiredKey to be collection of values. Then we can loop through that collection of values and check against jobParameterMap to see if they exist.

Also usually with validations like this you'd want to gather all failures at the same time as opposed to returning the first failure you see. I think this is pretty much a check to be internally consistent so probably not a big deal but when we start validing user inputs and stuff we'll want to do that instead.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great points, will do


// only used in tests
@VisibleForTesting
public ImputationJob getImputationJob(UUID jobId, String userId) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need this function and the one below it if theyre only used for testing? cant we just call the imputationJobsRepository in the tests directly?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤦 sure can, thanks!

* following related classes:
* @see ImputationJob
*/
@Transactional
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this need to be transactional. I'm assuming this is just doing one write to the stairway database

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm at this time that's true. in future we might be doing more that might merit its being transactional, but happy to take this out for now

Copy link

@mmorgantaylor mmorgantaylor merged commit 1e55128 into main Jan 23, 2024
13 checks passed
@mmorgantaylor mmorgantaylor deleted the TSPS-123_mma_async_api_best_practices branch January 23, 2024 19:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants