Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TaskStatusDB.check_out_task: Use subquery for taskid instead of separate select #30

Merged
merged 3 commits into from
Aug 28, 2023

Conversation

dwhswenson
Copy link
Member

A few consequences here:

  • We need to use RETURNING to get taskid: apparently with a subquery, UPDATE does not return its rows. This may limit our DB backends. (But I think we were already using RETURNING elsewhere, so that limitation was already in place. This just entrenches it further.)
  • We lean heavily into trusting the DB stuff to go well. Previously, there were 3 result classes for checking out a task: (1) a taskid, (2) an indicator that no tasks are available, (3) an indicator that something went wrong in the task selection process. Now results (2) and (3) are grouped together, and can't be distinguished.
  • We can no longer log the pre-checkout status of the task (we still log the post-checkout at debug level)

SQLA engine logging from checking out a task:

>>> t1 = db.check_out_task()
INFO:sqlalchemy.engine.Engine:BEGIN (implicit)
INFO:sqlalchemy.engine.Engine:UPDATE tasks SET status=?, last_modified=?, tries=(tasks.tries + ?) WHERE tasks.taskid = (SELECT tasks.taskid 
FROM tasks 
WHERE tasks.status = ?
 LIMIT ? OFFSET ?) AND tasks.status = ? RETURNING taskid
INFO:sqlalchemy.engine.Engine:[generated in 0.00048s] (2, '2023-08-28 12:58:01.823308', 1, 1, 1, 0, 1)
INFO:sqlalchemy.engine.Engine:COMMIT

Looks like we redundantly ask for the given old status, but that probably doesn't hurt anything. (Also, seeing the non-informative logging, I'm leaning more toward the change proposed in #24.

A few consequences here:

* We need to use RETURNING to get taskid: apparently with a subquery,
  UPDATE does not return its rows. This may limit our DB backends. (But
  I think we were already using RETURNING elsewhere, so that limitation
  was already in place. This just entrenches it further.)
* We lean heavily into trusting the DB stuff to go well. Previously,
  there were 3 result classes for checking out a task: (1) a taskid, (2)
  an indicator that no tasks are available, (3) an indicator that
  something went wrong in the task selection process. Now results (2)
  and (3) are grouped together, and can't be distinguished.
@codecov
Copy link

codecov bot commented Aug 28, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: -0.04% ⚠️

Comparison is base (4be25ce) 91.47% compared to head (c0722a5) 91.44%.
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #30      +/-   ##
==========================================
- Coverage   91.47%   91.44%   -0.04%     
==========================================
  Files           7        7              
  Lines         528      526       -2     
==========================================
- Hits          483      481       -2     
  Misses         45       45              
Files Changed Coverage Δ
exorcist/taskdb.py 97.48% <100.00%> (-0.04%) ⬇️

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

probably no longer needed
Copy link
Contributor

@richardjgowers richardjgowers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm. I'm a little confused that we can't(?) cleanly separate a 0 length return (none available) from an error?

@dwhswenson
Copy link
Member Author

I'm a little confused that we can't(?) cleanly separate a 0 length return (none available) from an error?

To clarify: now we need to see an actual error to know it was an error. Logically, we should be okay, but if there's some sort of silent error (like, if some sort of database corruption meant that there was no taskid returned, even though there should have been a task that was available), then it will return 0 rows. That's the same thing that happens if, in the normal course of operation, there are no tasks available. (For example, because you're waiting on other tasks to unblock.)

The result we get if an update affects no rows is the same, regardless of why it affected no rows (normal operations or something unplanned for in the DB that isn't caught as an actual SQL error). So we can't distinguish these things.

Another way of saying it: the retry logic I was implementing in #29 is completely useless now. There is no way to raise the exception that indicates that a retry is a reasonable choice.

@dwhswenson dwhswenson merged commit c391a8e into main Aug 28, 2023
5 checks passed
@dwhswenson dwhswenson deleted the new_update branch August 28, 2023 19:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants