While considering issue #443, I identified that job cancellations, although a corner case in normal operations with well-intentioned users, are both a potential Denial of Service (DoS) attack vector and an actual, non-trivial source of wasted GPU cycles. This issue is distinct from the bug identified in #443, which pertains specifically to the submission of completed jobs by workers. To address these broader concerns, I propose the following improvements to the handling of canceled jobs within the worker job dispatch system.
Proposed Changes:
Job Cancellation Handling:
Introduce a new field jobs_cancelled in the job pop responses. This field will list job ids that were assigned to the worker but have since been canceled by the requesting user.
New Worker Notification Endpoint:
Create a new POST endpoint for worker notifications:
The endpoint will always respond with the jobs_cancelled field, providing a list of canceled job ids.
It will not assign new jobs to the worker in this response.
The worker can send a payload containing the jobs_cancelled field to acknowledge that they have stopped working on the canceled job(s).
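The notification flow described above could look roughly like the following. This is only a sketch under stated assumptions: `WorkerState` and `handle_worker_notification` are hypothetical names, the state is held in memory rather than in the real database, and only the `jobs_cancelled` field name comes from this proposal.

```python
from dataclasses import dataclass, field


@dataclass
class WorkerState:
    """Hypothetical in-memory view of one worker's jobs (not the real data model)."""
    assigned: set[str] = field(default_factory=set)
    # Jobs cancelled by the requesting user that the worker has not yet acknowledged.
    cancelled: set[str] = field(default_factory=set)


def handle_worker_notification(state: WorkerState, payload: dict) -> dict:
    """Handle one notification POST from a worker.

    Records any acknowledgements sent in the payload, then replies with the
    cancellations that are still pending. It never assigns new jobs.
    """
    acknowledged = set(payload.get("jobs_cancelled", []))
    # Only accept acknowledgements for jobs that really were cancelled for this worker.
    valid = acknowledged & state.cancelled
    state.cancelled -= valid
    state.assigned -= valid
    return {"jobs_cancelled": sorted(state.cancelled)}
```

A worker that polls with an empty payload simply learns which of its jobs were canceled; a follow-up call that echoes a job id back in `jobs_cancelled` acknowledges it and removes it from later responses.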
Prorated Kudos for Canceled Jobs:
Implement a prorated kudos system where the amount of kudos awarded decreases based on how much time has elapsed before the worker acknowledges the job cancellation. This incentivizes workers to abandon canceled jobs quickly, thereby saving GPU cycles.
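One possible shape for the proration, as a sketch only: the linear decay, the 30-second window, and the function name `prorated_kudos` are all assumptions made for illustration, since the proposal does not fix a formula.

```python
def prorated_kudos(base_kudos: float, seconds_to_ack: float,
                   window_seconds: float = 30.0) -> float:
    """Return the kudos awarded for a canceled job.

    Assumption: the award decays linearly from base_kudos (immediate
    acknowledgement) down to zero once window_seconds have elapsed.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    elapsed = max(0.0, seconds_to_ack)  # treat clock skew as "immediate"
    remaining_fraction = max(0.0, 1.0 - elapsed / window_seconds)
    return base_kudos * remaining_fraction
```

With these assumed defaults, acknowledging halfway through the window earns half of the base amount, and acknowledging after the window earns nothing. A different curve (for example a step function or exponential decay) would fit the same interface.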
Abuse Prevention Measures:
Recognize the potential for abuse and introduce mechanisms to mitigate it:
Flagging High Cancellation Pairs: Monitor and flag user/worker pairs that have a high frequency of job cancellations for review.
Statistical Anomalies: Identify and flag workers with abnormal or statistically unlikely cancellation rates.
Targeted Cancellations: Pay extra attention to cancellations of jobs that were specifically targeted at a worker using the workers field.
Untrusted Workers: A high volume of cancellations for jobs assigned to workers who are not yet trusted should trigger additional scrutiny.
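As a rough illustration of the first measure, flagging user/worker pairs could start from a simple rate check like the one below. The function name, the event tuple layout, and both thresholds are hypothetical; a real implementation would likely use time windows and a proper statistical test rather than fixed cutoffs.

```python
from collections import Counter
from typing import Iterable


def flag_high_cancellation_pairs(
    events: Iterable[tuple[str, str, bool]],
    min_jobs: int = 20,
    max_rate: float = 0.5,
) -> list[tuple[str, str]]:
    """Return (user, worker) pairs whose cancellation rate looks suspicious.

    Each event is (user_id, worker_id, was_cancelled). A pair is flagged only
    if it has at least min_jobs jobs and more than max_rate of them were cancelled.
    """
    totals: Counter = Counter()
    cancels: Counter = Counter()
    for user, worker, was_cancelled in events:
        totals[(user, worker)] += 1
        if was_cancelled:
            cancels[(user, worker)] += 1
    return sorted(
        pair for pair, n in totals.items()
        if n >= min_jobs and cancels[pair] / n > max_rate
    )
```

The minimum job count keeps a pair with only a handful of jobs from being flagged on noise alone; flagged pairs would go to human review rather than trigger automatic penalties.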