Include queue name in unique task ID generation #181

Open: wants to merge 3 commits into the task-unique-key base branch.

thomasst (Member) commented Mar 21, 2021

This avoids a "task not found" scenario where we queue a unique task into several queues.

For example, suppose we queue a unique task with ID X into queues A and B, and X finishes on queue A before queue B processes it. The ID X is then still queued in queue B, but B's worker cannot find the task because the task key has already been deleted.

We check whether the task is in a different state of the same queue (so you can, e.g., requeue or schedule a unique task into the same queue without losing it), but it would not be reasonable to check all other queues. The check is in tasktiger/tasktiger/task.py, lines 262 to 278 in f48fe03:

```python
if self.unique:
    # Only delete if it's not in any other queue
    check_states = {ACTIVE, QUEUED, ERROR, SCHEDULED}
    check_states.remove(from_state)
    # TODO: Do the following two in one call.
    scripts.delete_if_not_in_zsets(
        _key('task', self.id, 'executions'),
        self.id,
        [_key(state, queue) for state in check_states],
        client=pipeline,
    )
    scripts.delete_if_not_in_zsets(
        _key('task', self.id),
        self.id,
        [_key(state, queue) for state in check_states],
        client=pipeline,
    )
```
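To make the failure mode concrete, here is a minimal in-memory simulation. It is a sketch under stated assumptions, not TaskTiger code: `FakeStore` is a hypothetical stand-in for Redis (one set per `(state, queue)` pair plus a dict of task payloads), `gen_unique_id` is a hypothetical hash of the call with an optional queue name, `mymodule.task` is a made-up function name, and `run_to_completion` mimics the quoted check by consulting only the other states of the same queue.

```python
import hashlib
import json

ACTIVE, QUEUED, ERROR, SCHEDULED = "active", "queued", "error", "scheduled"


def gen_unique_id(func, args, kwargs, queue=None):
    # Hypothetical helper: SHA-256 of the JSON-serialized call, optionally
    # including the queue name (the change proposed in this PR).
    data = {"func": func, "args": args, "kwargs": kwargs}
    if queue is not None:
        data["queue"] = queue
    return hashlib.sha256(json.dumps(data, sort_keys=True).encode("utf8")).hexdigest()


class FakeStore:
    """Hypothetical in-memory stand-in for Redis."""

    def __init__(self):
        self.zsets = {}  # (state, queue) -> set of task IDs
        self.tasks = {}  # task ID -> payload (the "task key")

    def members(self, state, queue):
        return self.zsets.setdefault((state, queue), set())

    def enqueue(self, task_id, queue, payload):
        self.tasks[task_id] = payload
        self.members(QUEUED, queue).add(task_id)


def run_to_completion(store, task_id, queue):
    # The worker on `queue` moves the task QUEUED -> ACTIVE and executes it.
    store.members(QUEUED, queue).discard(task_id)
    store.members(ACTIVE, queue).add(task_id)
    # On completion it leaves ACTIVE. As in the quoted check, the task key is
    # deleted unless the ID is in another state *of the same queue*; other
    # queues are never consulted.
    store.members(ACTIVE, queue).discard(task_id)
    others = {QUEUED, ERROR, SCHEDULED}
    if not any(task_id in store.members(s, queue) for s in others):
        store.tasks.pop(task_id, None)


def simulate(include_queue):
    store = FakeStore()
    ids = {}
    for q in ("A", "B"):
        ids[q] = gen_unique_id("mymodule.task", [1], {}, queue=q if include_queue else None)
        store.enqueue(ids[q], q, {"func": "mymodule.task", "args": [1]})
    run_to_completion(store, ids["A"], "A")
    # Is the task still queued on B, and can B's worker still load it?
    return ids["B"] in store.members(QUEUED, "B"), ids["B"] in store.tasks


print(simulate(include_queue=False))  # (True, False): queued on B, but "task not found"
print(simulate(include_queue=True))   # (True, True): B's copy is unaffected
```

With a queue-agnostic ID, both queues share one task key, so completion on A deletes the payload that B still needs; hashing the queue name in gives each queue its own key.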

I believe this shouldn't be a completely incompatible change to roll out. Uniqueness isn't the same as locking: you can still queue a unique task while another instance of it is being executed, and its main purpose is to reduce load. TaskTiger doesn't guarantee that unique tasks are never executed concurrently; however, its current implementation does avoid that and instead schedules the unique task for later execution if one is already running:

```python
# Move an item to the active queue, if available.
# We need to be careful when moving unique tasks: We currently don't
# support concurrent processing of multiple unique tasks. If the task
# is already in the ACTIVE queue, we need to execute the queued task
# later, i.e. move it to the SCHEDULED queue (prefer the earliest
# time if it's already scheduled). We want to make sure that the last
# queued instance of the task always gets executed no earlier than it
# was queued.
```

Still, we'd have to be careful when deploying this change.
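The rule described in that comment can be sketched in plain Python as follows. This is an illustration only, not the worker's actual implementation: the dicts are hypothetical stand-ins for one queue's ACTIVE and SCHEDULED sets (task ID to timestamp), and `retry_delay` is an assumed parameter for how far in the future "later" is.

```python
def move_to_active(task_ids, active, scheduled, now, retry_delay):
    """Move queued unique tasks to ACTIVE, deferring any that are already running.

    `active` and `scheduled` map task ID -> timestamp for a single queue.
    Returns the IDs that were actually moved to ACTIVE.
    """
    later = now + retry_delay
    moved = []
    for task_id in task_ids:
        if task_id in active:
            # Already running: don't execute concurrently. Schedule it for
            # later, keeping the earlier time if it's already scheduled.
            scheduled[task_id] = min(scheduled.get(task_id, later), later)
        else:
            active[task_id] = now
            moved.append(task_id)
    return moved


active = {"x": 0.0}
scheduled = {}
print(move_to_active(["x", "y"], active, scheduled, now=10.0, retry_delay=5.0))  # ['y']
print(scheduled)  # {'x': 15.0}
```

Because `later` is always after `now`, a deferred instance never runs earlier than it was queued, which is the guarantee the comment asks for.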

thomasst self-assigned this Mar 21, 2021
thomasst force-pushed the task-unique-by-queue branch from 1cff32c to c7d5897 on March 21, 2021 21:02
thomasst changed the title "Include queue name in task ID generation" to "Include queue name in unique task ID generation" Mar 21, 2021

Review comment on the diff lines:

```python
    'kwargs': kwargs,
}
if queue is not None:
    data['queue'] = queue
```

Contributor: Any reason not to serialize an explicit null?

Member Author: We currently use the same function to compute the lock ID (in which case we pass None for the queue). Serializing an explicit null would change that hash and break existing locks. I can add a comment for this.
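The compatibility argument can be checked directly. Assuming a JSON-plus-SHA-256 ID scheme modeled on the diff above (the functions `digest`, `old_id` and `new_id` are hypothetical names, not TaskTiger's), adding `"queue": null` changes the digest, while omitting the key when the queue is `None` reproduces the pre-change ID exactly.

```python
import hashlib
import json


def digest(data):
    return hashlib.sha256(json.dumps(data, sort_keys=True).encode("utf8")).hexdigest()


def old_id(func, args, kwargs):
    # Pre-change scheme: no queue in the hashed payload.
    return digest({"func": func, "args": args, "kwargs": kwargs})


def new_id(func, args, kwargs, queue=None, explicit_null=False):
    data = {"func": func, "args": args, "kwargs": kwargs}
    if queue is not None or explicit_null:
        data["queue"] = queue  # with explicit_null=True this serializes "queue": null
    return digest(data)


lock_args = ("mymodule.task", [1], {})
print(new_id(*lock_args) == old_id(*lock_args))                      # True: lock IDs unchanged
print(new_id(*lock_args, explicit_null=True) == old_id(*lock_args))  # False: existing locks would break
```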

jkemp101 (Member) left a comment:

It looks like the docs for the unique option might already suggest this behavior. Should we clarify that the lock applies across all queues? https://github.com/closeio/tasktiger#task-options

thomasst (Member Author):

Holding off on this PR for now since it can cause issues with existing deployments: if users rely on canceling scheduled tasks by ID, this will break.
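One way this could play out during a rollout, sketched under an assumption about how such deployments work: a unique task scheduled before the deploy is stored under its old, queue-agnostic ID, while code running after the deploy derives the ID with the queue included, so canceling by the recomputed ID finds nothing. The `scheduled` dict, `unique_id` and `cancel` helpers, and the `mymodule.task`/`default` names are hypothetical stand-ins, not TaskTiger's API.

```python
import hashlib
import json


def unique_id(func, args, kwargs, queue=None):
    # Hypothetical ID scheme; the queue is hashed in only when given.
    data = {"func": func, "args": args, "kwargs": kwargs}
    if queue is not None:
        data["queue"] = queue
    return hashlib.sha256(json.dumps(data, sort_keys=True).encode("utf8")).hexdigest()


def cancel(scheduled, task_id):
    # Returns True if a scheduled task with this ID was found and removed.
    return scheduled.pop(task_id, None) is not None


# Scheduled before the deploy, under the old (queue-agnostic) ID ...
scheduled = {unique_id("mymodule.task", [1], {}): 1700000000.0}
# ... but after the deploy the caller derives the ID with the queue included.
print(cancel(scheduled, unique_id("mymodule.task", [1], {}, queue="default")))  # False
```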

thomasst mentioned this pull request Mar 24, 2021
3 participants