This repository has been archived by the owner on Nov 21, 2024. It is now read-only.

User Actions: Restart workflow #53

Merged
merged 1 commit into inspirehep:main from restart-workflow on Aug 1, 2024

Conversation

Contributor

@DonHaul DonHaul commented Jul 10, 2024

#483

User Actions added:

  • restart
  • restart current
  • restart with params

Missing User Action:

  • skip to step (by default next)

/api/workflows/authors/<id>/restart (this might change, but for now it receives):

{
  "restart_current_task": true,  # if not specified, defaults to false
  "params": {dictionary with new params}  # requested in the user action, but it is not yet specified what these params are; if not specified, a full restart is done
}

It can also receive no body at all; in that case a full restart occurs.
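For illustration, calling the endpoint might look like this (a minimal sketch: the base URL is a placeholder and the workflow id is the dummy id used in the test cassettes below, not a real record):

import requests

BASE_URL = "http://localhost:8000"  # placeholder host, not part of the PR
workflow_id = "00000000-0000-0000-0000-000000000001"  # dummy id from the test cassettes

# Restart only the currently failed task of the workflow.
response = requests.post(
    f"{BASE_URL}/api/workflows/authors/{workflow_id}/restart",
    json={"restart_current_task": True},
)
print(response.status_code)

# No body (or an empty one) triggers a full restart.
response = requests.post(f"{BASE_URL}/api/workflows/authors/{workflow_id}/restart")
print(response.status_code)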

@DonHaul DonHaul marked this pull request as draft July 10, 2024 11:19
@DonHaul DonHaul force-pushed the restart-workflow branch 7 times, most recently from 76602b2 to b255a7b on July 17, 2024 08:24
@DonHaul DonHaul marked this pull request as ready for review July 17, 2024 08:30
@DonHaul DonHaul changed the title from "Restart workflow" to "User Actions: Restart workflow" Jul 17, 2024
@DonHaul DonHaul requested a review from drjova July 17, 2024 08:33
Contributor
@drjova drjova left a comment

Thank you, I have added a few comments.

@action(detail=True, methods=["post"])
def restart(self, request, pk=None):

params = request.data.get("params", None)
Contributor:

.get by default returns None, no need to specify it.

data = {"dry_run": False, "dag_run_id": pk, "reset_dag_runs": True}

executed_dags_for_workflow = {}
# find dags that were executed
Contributor:

we don't need the comments :)

Contributor Author:

Removed. Why not?

Comment on lines 114 to 122
for dag_id in AUTHOR_DAGS[workflow.workflow_type]:
response = requests.get(
f"{airflow_utils.AIRFLOW_BASE_URL}/api/v1/dags/{dag_id}/dagRuns/{pk}",
json=data,
headers=airflow_utils.AIRFLOW_HEADERS,
)
if response.status_code == status.HTTP_200_OK:
executed_dags_for_workflow[dag_id] = response.content
Contributor:

Why do we need to iterate through all the dags? For example, this code will run for both the approve and reject DAGs.

Contributor Author:

For that case it will simply check which one of them got executed and restart it (if the response is 200 OK it means it got executed, regardless of whether it finished successfully or not).

In general this code serves to fetch which dags of a given workflow have run.
For example, there can be cases where, for a given workflow, only the author_create_initialization_dag dag ran and neither approve nor reject got triggered; in this case it will only restart this one dag.
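A minimal sketch of that check, using the helper names quoted in this thread (airflow_utils, AUTHOR_DAGS); the import paths are assumptions:

import requests
from rest_framework import status

from backoffice.workflows import airflow_utils  # assumed import path
from backoffice.workflows.constants import AUTHOR_DAGS  # assumed import path


def find_executed_dags_for_workflow(workflow_type, workflow_id):
    """Return the dag runs of this workflow that exist in Airflow.

    A 200 response means the dag run exists (it was executed at some point,
    successfully or not); a 404 just means that dag was never triggered.
    """
    executed_dags_for_workflow = {}
    for dag_id in AUTHOR_DAGS[workflow_type]:
        response = requests.get(
            f"{airflow_utils.AIRFLOW_BASE_URL}/api/v1/dags/{dag_id}/dagRuns/{workflow_id}",
            headers=airflow_utils.AIRFLOW_HEADERS,
        )
        if response.status_code == status.HTTP_200_OK:
            executed_dags_for_workflow[dag_id] = response.content
    return executed_dags_for_workflow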

headers=airflow_utils.AIRFLOW_HEADERS,
)
if response.status_code != 200:
return Response({"error": "Failed to restart task"}, status=status.HTTP_500_INTERNAL_SERVER_ERROR)
Contributor:

I would not return 500 for every error; since we already have API exceptions, I would create one for "failed to restart task", for example, with a different error code. 500 is usually for an unexpected server error, and in these cases we know what's going on.
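One way to follow this suggestion with DRF's exception machinery (the class name is illustrative; 424 is the status that shows up later in this PR):

from rest_framework import status
from rest_framework.exceptions import APIException


class WorkflowRestartError(APIException):
    """Illustrative API exception for a failed restart, instead of a blanket 500."""

    status_code = status.HTTP_424_FAILED_DEPENDENCY
    default_detail = "Failed to restart task."
    default_code = "workflow_restart_failed"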

json=data,
headers=airflow_utils.AIRFLOW_HEADERS,
)
if response.status_code != 200:
Contributor:

we can just raise for status and catch here
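For instance, the status check could become something like this (a sketch; the URL and payload stand for whichever Airflow call is made at this point):

import requests

from backoffice.workflows import airflow_utils  # assumed import path


def post_to_airflow(url, data):
    """POST to Airflow and translate HTTP errors into an API exception."""
    try:
        response = requests.post(url, json=data, headers=airflow_utils.AIRFLOW_HEADERS)
        response.raise_for_status()
    except requests.HTTPError as err:
        # WorkflowRestartError is the illustrative exception sketched above.
        raise WorkflowRestartError() from err
    return response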


return airflow_utils.trigger_airflow_dag(WORKFLOW_DAG[workflow.workflow_type], pk, params)

return Response({"error": "Failed to restart"}, status=status.HTTP_500_INTERNAL_SERVER_ERROR)
Contributor:

We don't need this; we can try/catch the code above.


return Response(response.json(), status=status.HTTP_200_OK)

else:
Contributor:

we don't need else

Comment on lines 313 to 329
def patch_requests():
with patch("requests.post") as mock_post, patch("requests.get") as mock_get, patch(
"requests.delete"
) as mock_delete:

# Configure the mock for requests.post
mock_post.return_value.status_code = 200
mock_post.return_value.json.return_value = {"key": "value"}

# Configure the mock for requests.get
mock_get.return_value.status_code = 200
mock_get.return_value.json.return_value = {"data": "some_data"}

# Configure the mock for requests.delete
mock_delete.return_value.status_code = 204

yield mock_post, mock_get, mock_delete
Contributor:

we can use pytest-vcr for that
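A quick sketch of what that looks like with pytest-vcr (the marker records real HTTP traffic into a cassette the first time and replays it afterwards; the URL is the dummy one from the cassettes in this PR, and auth headers are omitted):

import pytest
import requests


@pytest.mark.vcr()
def test_dag_run_exists():
    # First run records the interaction into a cassette; later runs replay it,
    # so no mock wiring for requests.get/post/delete is needed.
    response = requests.get(
        "http://airflow-webserver:8080/api/v1/dags/"
        "author_create_initialization_dag/dagRuns/00000000-0000-0000-0000-000000000001"
    )
    assert response.status_code == 200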


@DonHaul DonHaul force-pushed the restart-workflow branch 4 times, most recently from a0d762e to 4424c53 on July 26, 2024 09:23
- application/json
User-Agent:
- python-requests/2.31.0
method: POST
Contributor:

https://github.com/inspirehep/inspirehep/blob/master/backend/tests/conftest.py#L11-L36

we need to have a similar config so we don't record internal services, such as opensearch
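A sketch of such a conftest fixture, modelled on the linked inspirehep config (pytest-vcr picks up a vcr_config fixture automatically; the host names are examples, not the exact list used there):

import pytest


@pytest.fixture(scope="module")
def vcr_config():
    return {
        "ignore_hosts": ["opensearch", "localhost", "127.0.0.1"],  # example internal services
        "filter_headers": ["Authorization"],  # keep tokens out of cassettes
        "record_mode": "once",
    }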

Contributor Author
@DonHaul DonHaul Jul 26, 2024

Now that I think about it, backoffice tests are currently only accessing internal services, since the workflows are now also contained in this repo. Removing pytest-vcr for now.

Contributor:

The workflows, despite being in the same repo, are not the same service, and for the moment we keep them separate.

@DonHaul DonHaul force-pushed the restart-workflow branch 4 times, most recently from 799feae to 9ae09c1 on July 26, 2024 12:48
Contributor
@drjova drjova left a comment

Thank you, a few comments.



# author dags for each workflow type
AUTHOR_DAGS = {
Contributor:

Don't we already have this on line 33?

2024-07-26
Contributor:

actually we shouldn't include the logs folder, could you please delete it and add it to .gitignore?

from . import views

urlpatterns = [
path(
Contributor:

why do we have this here? We already have the definitions here

router.register("workflows", WorkflowViewSet, basename="workflows")
router.register(

Contributor Author:

Indeed, it was not doing anything; probably part of an old implementation I had tried.
Removed.

Comment on lines 149 to 150
if response.status_code == status.HTTP_200_OK:
executed_dags_for_workflow[dag_id] = response.content
Contributor:

response.raise_for_status()

Contributor Author:

I believe this use case is a bit different. If a response is different from 200 OK it's not necessarily an error; it simply means the workflow didn't reach that dag execution yet (e.g. if it's in the approval status, the request to check whether the accept dag was executed will fail).

executed_dags_for_workflow[dag_id] = response.content

# assumes current task is one of the failed tasks
if restart_current_task:
Contributor:

Here the workflows are already restarted, right? So there is not much point in restarting one task. Am I missing something? Instead we should either restart all or restart a specific task.

Contributor Author:

Here the workflows have not yet been restarted; the section of code before it is just checking which dags ran for this workflow. Moving this section (lines 141-150) below to make it clearer.
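To make the intended order concrete, a rough outline of the two paths being discussed (helper names are the ones appearing later in this thread, except clear_failed_tasks, which is a hypothetical name; import paths are assumptions):

from backoffice.workflows import airflow_utils  # assumed import path
from backoffice.workflows.constants import WORKFLOW_DAGS  # assumed import path


def restart_workflow(workflow_id, workflow_type, restart_current_task=False, params=None):
    # Step 1: only inspect which dags of this workflow have run; nothing is restarted yet.
    executed_dags = airflow_utils.find_executed_dags(workflow_id, workflow_type)

    if restart_current_task:
        # Step 2a: clear the failed tasks of the dag that is currently stuck.
        failed_dag = airflow_utils.find_failed_dag(workflow_id, workflow_type)
        return airflow_utils.clear_failed_tasks(failed_dag, workflow_id)  # hypothetical helper

    # Step 2b: full restart: delete the existing dag runs, then retrigger the
    # initialization dag, optionally with new params.
    for dag_id in executed_dags:
        airflow_utils.delete_workflow_dag(dag_id, workflow_id)
    return airflow_utils.trigger_airflow_dag(
        WORKFLOW_DAGS[workflow_type].initialize, workflow_id, params
    )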

backoffice/backoffice/workflows/api/views.py (resolved conversation)
@DonHaul DonHaul force-pushed the restart-workflow branch 7 times, most recently from 1d03468 to 7ee5c32 on July 29, 2024 14:37
headers=AIRFLOW_HEADERS,
)
response.raise_for_status()
return HttpResponse()
Contributor:

Why are we using HttpResponse here and JsonResponse in other places?

Contributor Author:

This request returns 204 NO CONTENT and the response was empty, which was failing with JsonResponse as it requires content.
Alternatively I can re-add it as a JsonResponse with some informative JSON object.

Contributor:

Yes, a JSON response with a success message.
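i.e. something along these lines (a sketch; the message text is illustrative):

from django.http import JsonResponse
from rest_framework import status


def restart_success_response():
    # Return an informative JSON body instead of an empty HttpResponse.
    return JsonResponse({"message": "Failed tasks restarted successfully"}, status=status.HTTP_200_OK)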

executed_dags_for_workflow = find_executed_dags(workflow)

for dag_id in executed_dags_for_workflow:
# delete all executions of workflow
Contributor:

we don't need this :)

return JsonResponse(data, status=status.HTTP_424_FAILED_DEPENDENCY)


def find_executed_dags(workflow):
Contributor:

Do we need to pass around the whole object? We can simply pass the id.

Contributor Author:

We also need the workflow_type, to find which specific dags have been executed. Should I just pass the id and workflow_type instead of the full workflow object?

return executed_dags_for_workflow


def find_failed_dag(workflow):
Contributor:

ditto

backoffice/backoffice/workflows/airflow_utils.py (outdated, resolved conversation)
self.dag_id, str(self.workflow.id)
)

def tearDown(self) -> None:
Contributor:

we don't use types :)


@pytest.mark.vcr()
def test_restart_failed_tasks(self):
time.sleep(20) # wait for dag to fail
Contributor:

Do we really need that? It's a recording anyway.

Contributor Author:

It was at least needed to make the recording; I can remove it.
However, if for some reason we need to re-record, these delays need to be added back or the tests will fail.

Contributor:

Let's remove it; we are adding 40 extra seconds to the tests, and in general it's not recommended to block execution with sleep.
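If the cassettes ever need to be re-recorded, polling the dag run state is a common alternative to a fixed sleep (a sketch, assuming the airflow_utils helpers used elsewhere in this PR; the function name is hypothetical):

import time

import requests

from backoffice.workflows import airflow_utils  # assumed import path


def wait_for_dag_run_state(dag_id, dag_run_id, expected_state="failed", timeout=60, interval=2):
    """Poll the Airflow dag run until it reaches `expected_state` or the timeout expires."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        response = requests.get(
            f"{airflow_utils.AIRFLOW_BASE_URL}/api/v1/dags/{dag_id}/dagRuns/{dag_run_id}",
            headers=airflow_utils.AIRFLOW_HEADERS,
        )
        if response.ok and response.json().get("state") == expected_state:
            return True
        time.sleep(interval)
    raise TimeoutError(f"dag run {dag_run_id} did not reach state {expected_state!r}")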


@pytest.mark.vcr()
def test_find_failed_dag(self):
time.sleep(20) # wait for dag to fail
Contributor:

ditto

return HttpResponse(status=status.HTTP_424_FAILED_DEPENDENCY)


def restart_workflow_dags(workflow, params=None):
Contributor:

ditto

@DonHaul DonHaul force-pushed the restart-workflow branch from f7c9b3d to 3a4ce92 on July 30, 2024 09:06
Contributor
@drjova drjova left a comment

Few more comments, thank you 🙏

"Clearing Failed Tasks of DAG %s with data: %s and %s %s",
dag_id,
data,
AIRFLOW_HEADERS,
Contributor:

this will expose the token in the logs and it's not recommended

Contributor Author:

Okay, they were already being exposed in other logs from a previous PR we had merged, so I thought it was OK. Removing the tokens from those logs as well.
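A minimal sketch of the safer log call, simply dropping the headers from the message (the logger setup is illustrative):

import logging

logger = logging.getLogger(__name__)


def log_clear_failed_tasks(dag_id, data):
    # Log the request payload, but never the Airflow headers, which carry the auth token.
    logger.info("Clearing failed tasks of DAG %s with data: %s", dag_id, data)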

logger.info(
"Deketing dag Failed Tasks of DAG %s with no data and %s %s",
dag_id,
AIRFLOW_HEADERS,
Contributor:

ditto

headers=AIRFLOW_HEADERS,
)
response.raise_for_status()
return HttpResponse()
Contributor:

Yes, a JSON response with a success message.


@pytest.mark.vcr()
def test_restart_failed_tasks(self):
time.sleep(20) # wait for dag to fail
Contributor:

Let's remove it; we are adding 40 extra seconds to the tests, and in general it's not recommended to block execution with sleep.

backoffice/backoffice/workflows/constants.py (resolved conversation)
Content-Type:
- application/json
method: GET
uri: http://airflow-webserver:8080/api/v1/dags/author_create_approved_dag/dagRuns/00000000-0000-0000-0000-000000000001
Contributor:

There is something wrong; it actually returns 404. Could we please double-check all the cassettes to make sure we have the correct responses?

Contributor Author:

Cassettes have been reviewed, minor fixes.
Some of them will indeed have responses that contain 404, as it's the only way I found to check whether a given dag was executed or not.

@DonHaul DonHaul force-pushed the restart-workflow branch from 645026c to a243824 on July 30, 2024 14:18
Contributor
@drjova drjova left a comment

@DonHaul few minor changes :)

def find_failed_dag(workflow_id, workflow_type):
"""For a given workflow find failed dags.

:param workflow: workflow to get failed dags
Contributor:

we have to update docstrings :)
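For example, the docstring could be brought in line with the new signature (a sketch; the wording paraphrases the original docstring):

def find_failed_dag(workflow_id, workflow_type):
    """For a given workflow, find its failed dag.

    :param workflow_id: id of the workflow whose dag runs are checked
    :param workflow_type: type of the workflow, used to resolve its dag ids
    """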

def restart_workflow_dags(workflow_id, workflow_type, params=None):
"""Restarts dags of a given workflow.

:param workflow: workflow whoose dags should be restarted
Contributor:

ditto

def find_executed_dags(workflow_id, workflow_type):
"""For a given workflow find dags associated to it.

:param workflow: workflow to look dags for
Contributor:

ditto

Content-Type:
- application/json
method: DELETE
uri: http://airflow-webserver:8080/api/v1/dags/author_create_initialization_dag/dagRuns/00000000-0000-0000-0000-000000000001
Contributor:

why are we trying to delete twice?

Contributor Author:

Yes, so by default, in the tearDown of every test we are deleting the execution.
Because the test_delete_workflow_dag test specifically checks that the deletion is done correctly, the tearDown won't be able to delete anything in that case.

def tearDown(self):
super().tearDown()
airflow_utils.delete_workflow_dag(
WORKFLOW_DAGS[self.workflow.workflow_type].initialize, str(self.workflow.id)
Contributor:

Let's be consistent; in some places we use str() and in others we don't.

status="running",
core=True,
is_update=False,
workflow_type="AUTHOR_CREATE",
Contributor:

Please let's use the constant everywhere, so it's easy to propagate changes and we avoid changing strings one by one.
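A sketch of the idea, assuming the constant lives in backoffice/backoffice/workflows/constants.py (the class and attribute names are assumptions; only the "AUTHOR_CREATE" value comes from the quoted test):

from django.db import models


class WorkflowType(models.TextChoices):
    AUTHOR_CREATE = "AUTHOR_CREATE"  # value taken from the quoted test; other members omitted


# In tests and factories, reference the constant rather than the raw string:
workflow_type = WorkflowType.AUTHOR_CREATE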

@DonHaul DonHaul force-pushed the restart-workflow branch from 11f79fa to 93fc158 on July 31, 2024 15:17
@drjova drjova merged commit 5250f56 into inspirehep:main Aug 1, 2024
6 checks passed
Labels: None yet
Projects: None yet
2 participants