⚠️ chore: remove group from task args and move task user to datashare app #1614
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Bug description
This PR aims at addressing some of the serialization issues/descrepencies between
datashare.asynctasks
andicij-worker
.More precisely,
datashare.asynctasks
systematically includes the taskUser
andGroup
inside theTask.args
.However inside
icij-worker
all task arguments are forwared to task functions, ending up in pythonTypeError
because of extra arguments (namelyuser
andgroup
).The bug hides the 2 following issues.
datashare-app
User
concept leakagedatashare.asynctasks
APIs use theUser
type which is adatashare-app
concept. While this concept is used in the Datashare app, applications using a TM + workers won't necessarily use aUser
(which is a DS specific concept). In particularicij-worker
as no knowledge of such concept, because most Python TM + worker applications won't use it. Basically, theUser
concept is leaking fromdatashare-app
to all applications (including non Java application) following thedatashare.asynctasks
/icij-worker
API.Leaking the internal
Group
concept to task implementersdatashare.asynctasks
serializes the group inside task arguments, which implies that task functions must be able to have agroup
argument as input. However task functions dont need thegroup
argument to be performed. The Group is adatashare.asynctasks
/icij-worker
"internal". Task implementers shouldn't need to include thegroup
argument inside their task function to make them work.Fix
Chosen fixes
I chose to remove the
User
concept fromdatashare.asynctask
and move it insidedatashare-app
, where task are created with auser
argument, this only task performs for datashare will need to feature this argument.datashare-app
uses aDatashareTaskManager
extendsTaskManager
which includes all User related API (which were removed fromTaskManager
). Additionnaly, theDatashareTask
helper class was implemented to create task with auser
arg. I choose no to create aDatashareTask
class inheritingTask
because this breaks the task serialization ("@type": "DatashareTask"
).I chose to align
datashare.asynctasks
behavior on theicij-worker
one. Basically, when we create a task for the first time, we can assign a group to this task, hence, the TM has knowledge of the task group. The TM can keep track of task group rather than adding it to the task arguments.Discarded fixes
A first solution I thought about was to implement Python fix to this issue by systematically removing the
user
andgroup
arguments from the args received from the broker.However I quickly discard this options because it basically "reserved" the
group
anduser
argument names for task implementers. These names are quite commons, and if we systematically remove them, it means that task implementers can't use them. An alternative solution whould have been to explore the Python task arguments and filter out all misnamed arguments, but I don't really like this approach as it can lead to silent failures.Changes
datashare-tasks
Changed
TaskManager
API to keep track of theGroup
. Rather than always callingsave
to create and then update tasks, we now usesaveMetadata
to persist a task and it group (and potentially other persistent metadata) for the first time, then we callpersisUpdate(Task<V> task)
to update the task itself, other persistent metadata remain untouched (basically once a task is saved with a group information, there's no reason to update it). This behavior is similar to theicij-worker
behavior, where the group information is not persisted in task themselve, be it's the TM's responsibility to keep track of it.Removed
User
related APIsdatashare-app
Changed
DatashareTaskManager
andDatashareTask
to implement user related task APIs