[16.0] [ENH] queue_job: identity_key enhancements #546
Conversation
richard-willdooit
commented
Jun 16, 2023
- In production, a job which is waiting dependencies or which has started, but not completed, should not be repeated if the identity_key matches.
- In tests, the mock queue handler is now enhanced to allow better mimicking of the identity_key blocks from production.
- In tests, the mock queue handler now clears the enqueued jobs after performing them, to better reproduce what a production environment would do.
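The first change can be sketched in plain Python. This is an illustrative model of the de-duplication check, not the actual Odoo ORM call; the state constant values are assumed to mirror queue_job's state names.

```python
# Sketch of the widened identity_key de-duplication check (plain Python,
# not the actual queue_job code). State values assumed from the addon.
WAIT_DEPENDENCIES = "wait_dependencies"
PENDING = "pending"
ENQUEUED = "enqueued"
STARTED = "started"

# Before this PR, only these states blocked a duplicate:
BLOCKING_STATES_BEFORE = {PENDING, ENQUEUED}
# After this PR, jobs waiting on dependencies or already running block too:
BLOCKING_STATES_AFTER = {WAIT_DEPENDENCIES, PENDING, ENQUEUED, STARTED}

def is_duplicate(existing_jobs, identity_key, blocking_states):
    """Return True if a job with the same identity_key is in a blocking state."""
    return any(
        job["identity_key"] == identity_key and job["state"] in blocking_states
        for job in existing_jobs
    )
```

With the widened set, a second job with the same identity_key is skipped even while the first one is still running.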
Hi @guewen,
Thanks a lot!
Dependencies and trap_jobs were the last (very) big things I worked on in this addon, and I'm thrilled to see they are used and being improved!
@@ -300,7 +300,7 @@ def job_record_with_same_identity_key(self):
     .search(
         [
             ("identity_key", "=", self.identity_key),
-            ("state", "in", [PENDING, ENQUEUED]),
+            ("state", "in", [WAIT_DEPENDENCIES, PENDING, ENQUEUED, STARTED]),
Out of curiosity, did you actually use identity keys on a graph of jobs?
When I was working on the dependencies feature, I had a lot of thoughts and no clear outcome of how it should be handled. A job that would be skipped because it already exists in another graph could make the whole (new) graph incoherent, that's why I ended up checking if all the identity keys match.
About the addition of STARTED, I'm uncertain. Depending on the use case, we should or should not include it.
As part of my current work, I use Sidekiq daily. They have a similar feature, but you can set a per-job parameter unique_until with the options start (which would mean up to ENQUEUED here) or success (which would be up to STARTED here).
Think about this use case: a job refreshes a cache. Data have changed, we create a pending job. Data change again, no new job because the job is still pending. Job starts. Data change while the job is running. In this very case, we'd like to enqueue a new job otherwise the cache will be outdated.
I reckon that both cases are valid, but I fear adding this state to the domain may, silently and in subtle ways, break existing behaviors.
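The cache-refresh race above can be shown with a toy simulation (plain Python, with assumed semantics: a new job is skipped whenever any job exists in a blocking state):

```python
# Toy simulation of the cache-refresh race: data changes while a job is
# already STARTED. Whether a new job is created depends on whether
# "started" is included in the blocking states.
def enqueue_refresh(jobs, blocking_states):
    """Enqueue a cache-refresh job unless a blocking one already exists."""
    if any(job["state"] in blocking_states for job in jobs):
        return False  # de-duplicated: no new job created
    jobs.append({"state": "pending"})
    return True

# With "started" in the domain, the new job is skipped (stale-cache risk):
skipped = not enqueue_refresh(
    [{"state": "started"}], {"pending", "enqueued", "started"}
)
# Without "started", a new pending job is created and the cache catches up:
created = enqueue_refresh([{"state": "started"}], {"pending", "enqueued"})
```

This is why the two unique_until semantics diverge: "until started" re-enqueues work that arrives mid-run, while "until done" gives mutual exclusion at the cost of possibly missing a late change.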
No, I did not use it with a graph.
To give it some context: a flag switch on res_company for a custom scenario, where a lot of data was involved, triggered an extremely long process involving write-outs and read-ins, so we felt a background job would help the UX. So it is always one job.
The "flag" to indicate the state will not be available until after the job is complete, so we were using the identity_key to ensure it does not get double-run.
If we do not include "started", then a second click could launch a job while the first one is already executing... and if the jobs ran in parallel, we'd be in strife. So we were hoping the identity_key would provide that mutual exclusion.
So yes, maybe both cases are valid... I was going to add a new option for "mutual exclusivity", or "until finished", but did not want to change the architecture so much. We really need started to be included in our checks, because enqueued is likely to last only a few seconds, while running is likely to last a very, very long time...
In case you want to explore this direction, I do not think it is a large change: we don't need a new argument, even in with_delay, as all jobs for the same method will have the same option. It could be a new field on queue.job.function with the two options; the job instance can then access it through self.job_options and adapt the domain accordingly.
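That suggestion could be sketched as follows. Everything here is hypothetical: the option name unique_until, its values "start"/"done", and the helper names are assumptions for illustration, not the actual queue_job API.

```python
# Hypothetical sketch: derive the blocking-state list for the duplicate
# check from a per-function "unique_until" option (names assumed, not
# the real queue_job API).
WAIT_DEPENDENCIES, PENDING, ENQUEUED, STARTED = (
    "wait_dependencies", "pending", "enqueued", "started",
)

def blocking_states(unique_until="start"):
    """Map a unique_until option to the job states that block duplicates."""
    if unique_until == "done":
        # Block until the job has fully completed (mutual exclusion).
        return [WAIT_DEPENDENCIES, PENDING, ENQUEUED, STARTED]
    # Default: block only until the job starts (Sidekiq's "start" semantics).
    return [WAIT_DEPENDENCIES, PENDING, ENQUEUED]

def identity_domain(identity_key, unique_until="start"):
    """Build an Odoo-style search domain for the duplicate check."""
    return [
        ("identity_key", "=", identity_key),
        ("state", "in", blocking_states(unique_until)),
    ]
```

Stored on queue.job.function, the option would apply uniformly to all jobs of a given method, which is why no new with_delay argument would be needed.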
Worth noting: the identity key cannot be a 100% safe way to prevent duplicate jobs: if two transactions create the same job concurrently before committing, there will be a duplicate. To prevent this, we would need a Postgres unique constraint, and I preferred not to add one, to avoid having transactions rolled back because of it.
So it's better if jobs are idempotent anyway and verify during execution whether they still have something to do (which can be coupled with a lock or an advisory lock).
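The idempotent-job pattern can be sketched like this: under a lock, the job re-checks whether its work is still needed before doing it. In Odoo the lock would typically be a Postgres advisory lock; a threading.Lock stands in here so the sketch is self-contained, and the function and field names are illustrative assumptions.

```python
import threading

# Sketch of an idempotent job: re-check, under a lock, whether the work
# is still needed. threading.Lock stands in for a Postgres advisory lock
# (e.g. pg_advisory_xact_lock) so the example runs standalone.
_lock = threading.Lock()

def refresh_cache_job(cache, source):
    """Idempotent job: only refresh the cache if it is actually stale."""
    with _lock:  # stand-in for an advisory lock in a real deployment
        if cache.get("version") == source["version"]:
            return "skipped"  # duplicate job: nothing left to do
        cache["version"] = source["version"]
        cache["data"] = source["data"]
        return "refreshed"
```

With this shape, a concurrently-created duplicate job is harmless: the second execution sees the work is already done and exits without side effects.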
Yes, I see where you are coming from.
Another option I was thinking about was, for certain jobs, not allowing them to be requeued from the queue. In our scenario, we kicked off the job from a button. But if the job fails, and the data then changes in such a way that the button is no longer applicable, the job has lost its applicability too and should not be requeued. I have done the right thing and made sure our method is autonomous in its own right and checks that it is still applicable, but I did wonder if there was a place for special jobs which should be tried once, and once only, and could not be requeued even if they failed...
Ultimately, the main changes I wanted were to introduce tests in our module that replicated the behaviour of our process not allowing double queuing.
To give even more context on how we have used it: if I call write on 2 res_company records with 4 values, then 3 values are written to both records, and 2 jobs are enqueued to update the other value on the 2 records. It works well, but tests were quite important to show that the behaviour was as expected, and I then decided to extend them to count the jobs queued, to show that double queuing would not happen for the same record (which it did not in production, but the mock test object incorrectly indicated it had).
If you would like me to remove the states from the PR, and just push the test changes, I can do so - or you can modify my PR as you see fit. We would likely then work from my fork for our environment. Let me know.
I think you may remove the
@richard-willdooit thanks for this nice work :)
@richard-willdooit one more thing: when you get back to this, please remove the Odoo version from the commit. It's not needed and it makes little sense when we forward-port/back-port changes. Thanks!
@richard-willdooit gentle ping: any plan to move this forward?