Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

⚠️ chore: remove group from task args and move task user to datashare app #1614

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

ClemDoum
Copy link
Contributor

@ClemDoum ClemDoum commented Nov 26, 2024

⚠️ Requires Redis migrations (I'll do it)

Bug description

This PR aims at addressing some of the serialization issues/descrepencies between datashare.asynctasks and icij-worker.
More precisely, datashare.asynctasks systematically includes the task User and Group inside the Task.args.

However inside icij-worker all task arguments are forwared to task functions, ending up in python TypeError because of extra arguments (namely user and group).

The bug hides the 2 following issues.

datashare-app User concept leakage

datashare.asynctasks APIs use the User type which is a datashare-app concept. While this concept is used in the Datashare app, applications using a TM + workers won't necessarily use a User (which is a DS specific concept). In particular icij-worker as no knowledge of such concept, because most Python TM + worker applications won't use it. Basically, the User concept is leaking from datashare-app to all applications (including non Java application) following the datashare.asynctasks / icij-worker API.

Leaking the internal Group concept to task implementers

datashare.asynctasks serializes the group inside task arguments, which implies that task functions must be able to have a group argument as input. However task functions dont need the group argument to be performed. The Group is a datashare.asynctasks / icij-worker "internal". Task implementers shouldn't need to include the group argument inside their task function to make them work.

Fix

Chosen fixes

I chose to remove the User concept from datashare.asynctask and move it inside datashare-app, where task are created with a user argument, this only task performs for datashare will need to feature this argument. datashare-app uses a DatashareTaskManager extends TaskManager which includes all User related API (which were removed from TaskManager). Additionnaly, the DatashareTask helper class was implemented to create task with a user arg. I choose no to create a DatashareTask class inheriting Task because this breaks the task serialization ("@type": "DatashareTask").

I chose to align datashare.asynctasks behavior on the icij-worker one. Basically, when we create a task for the first time, we can assign a group to this task, hence, the TM has knowledge of the task group. The TM can keep track of task group rather than adding it to the task arguments.

Discarded fixes

A first solution I thought about was to implement Python fix to this issue by systematically removing the user and group arguments from the args received from the broker.
However I quickly discard this options because it basically "reserved" the group and user argument names for task implementers. These names are quite commons, and if we systematically remove them, it means that task implementers can't use them. An alternative solution whould have been to explore the Python task arguments and filter out all misnamed arguments, but I don't really like this approach as it can lead to silent failures.

Changes

⚠️ datashare-tasks

Changed

  • updated the TaskManager API to keep track of the Group. Rather than always calling save to create and then update tasks, we now use saveMetadata to persist a task and it group (and potentially other persistent metadata) for the first time, then we call persisUpdate(Task<V> task) to update the task itself, other persistent metadata remain untouched (basically once a task is saved with a group information, there's no reason to update it). This behavior is similar to the icij-worker behavior, where the group information is not persisted in task themselve, be it's the TM's responsibility to keep track of it.

Removed

  • remove all User related APIs

⚠️ datashare-app

Changed

  • created a DatashareTaskManager and DatashareTask to implement user related task APIs

@ClemDoum ClemDoum force-pushed the chore/remove-user-and-group-from-task-args branch 2 times, most recently from bb4f471 to 7585833 Compare November 27, 2024 12:01
@ClemDoum ClemDoum changed the title chore: remove user and group from task args chore: remove group from task args and move task user to datashare app Nov 27, 2024
@ClemDoum ClemDoum force-pushed the chore/remove-user-and-group-from-task-args branch from 7585833 to e5a3b89 Compare November 27, 2024 12:08
@ClemDoum ClemDoum marked this pull request as ready for review November 29, 2024 12:09
@ClemDoum ClemDoum changed the title chore: remove group from task args and move task user to datashare app ⚠️ chore: remove group from task args and move task user to datashare app Nov 29, 2024
@ClemDoum ClemDoum self-assigned this Nov 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant