-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Impending DST cutover caused jobs to run continuously #74
Comments
@mackensen , similar thing happened here, took down our sites. Running the command you listed now - thanks for sharing that. Everything looking good for you since? |
@discoinfiltrator more or less (so far); noticed a related issue where jobs persisted for deleted sites and I'm clearing those out manually. |
Fascinating! Cavalcade should be using UTC timestamps throughout the entire system, as does WordPress, so unsure why this is occurring. What is your timezone in PHP's settings (i.e. |
It's set to |
Didn't quite work out for me. There were too many duplicate jobs since the time switchover, and:
from In the end, I backed up + truncated |
The problem is that Cavalcade-Runner sets a completed job's next run to nextrun + interval, even if that timestamp is now in the past. The job immediately gets picked up again and run. The DST changeover exposed this behaviour, but it'd also happen if the runner crashed and got restarted after a few hours. I'll submit a pull request on Cavalcade-Runner that seems to fix the problem. |
This is intentional, so that if the Runner crashes (e.g.), the same number of jobs are run as it "catches up". So long as everything is in UTC, there should be no issue with this, as the clock (generally) monotonically increases. |
I've done a bit more investigation and it seems that under heavy load I started up a new WP instance, created a simple hook I called "cavalcade_load" that just slept for a few seconds, and ran six copies of
Stock WordPress refuses to populate the cron table with more than one job per second, no matter how many Next step: install Cavalcade and Cavalcade-Runner, and spawn six copies of
Note there are eight rows where the start time is the same as the start time in another row. This should be impossible - even if we didn't have WP's implementation indirectly promising that only one hook per second can be created, the process that created id 1801 should still have seen the existing hook from id 1800. You can increase the number of collisions by running more simultaneous Here's another interesting excerpt from the table, later on:
The start time should be increasing monotonically the same way that id is (all of these jobs are running on the same server, so they all share the same system clock), but we've got some that start at 6 seconds after in the middle of a set that start 5 seconds after. This might be an issue with the |
Yeah, this might not be directly responsible for the DST bug but it could be a factor. In
I would expect that cavalcade be consistent with default WP behaviour. |
@discoinfiltrator There was a longstanding core bug in that code that will be fixed in WP 5.2, Cavalcade will inherit the bug fix. |
@rmccue , this happened again last night have I have good explanation for the cause and a fix. In the
Note that the resulting time string is the exact same as the input string. This causes the infinite loop where jobs scheduled between 1:00am and 2:00am with an interval of less than 3600s will be rescheduled for the exact time that they initially had. We can fix this by making sure that this calculation is done with UTC in mind.
This will correctly result in a |
Aaand we just got bitten by this a few days ago. Found 16k jobs and counting, with the top dozen recurring jobs not updating the date. |
This is because of how php DateTime works and Cavalcade for converting the dates is using mysql2date which is using DateTime. When quering the job by Cavalcade it is converted by Try this in wp instance that is using Los Angeles time zone
results in:
So when it is queried back in |
Last evening, at exactly 9 PM EDT (2 AM GMT), hourly jobs started running continuously--instead of being scheduled for the next hour. This caused a major load event and we had to stop Cavalcade everywhere. Sample log showing the issue:
This morning, it seems like we can get cavalcade happy by running
wp site list --field=url | xargs -n 1 -I % wp cron event run --due-now --url=%
to bring all jobs into the present. The WordPress site in question uses UTC internally; the host servers (both application and database) are on EDT/EST.The text was updated successfully, but these errors were encountered: