Include Jitter with Timer Config and Execution #835

Open
tsilvers-ms opened this issue Mar 10, 2023 · 0 comments

Widely adopted Azure Functions (for example, StartStop VM V2) that run on a timer are being deployed to customer subscriptions with the default timer values. Once enough customers have adopted a function, every instance fires at the same moment set by the default schedule, and the combined traffic is enough to overwhelm the services that the function calls.

We need a way, via timer configuration, to specify a jitter value that spreads out the execution times of functions sharing the same timer settings, preventing large call spikes to downstream services.

Repro steps

  1. Have a function app scheduled to execute at the same time across a large number of customer subscriptions, for example 0 0 0 * * *

  2. The function executes simultaneously across all customer subscriptions, at midnight in the case of 0 0 0 * * *

Expected behavior

With a jitter value added to the timer config, executions would be spread semi-randomly across the jitter window for all subscriptions and functions. Calls to downstream services would be spread out accordingly, so those services could better handle the load as adoption increases.
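
As a rough illustration of the requested behavior, the sketch below shifts a scheduled run time by a uniformly random offset inside a jitter window. It is not an existing Azure Functions setting: `jittered_run_time`, `jitter_window`, and `rng` are hypothetical names chosen for this example, and a uniform distribution is only one possible choice.

```python
import random
from datetime import datetime, timedelta


def jittered_run_time(scheduled, jitter_window, rng=None):
    """Return `scheduled` shifted by a random offset within `jitter_window`.

    Hypothetical helper for illustration only; the timer trigger does not
    expose an option like this today.
    """
    if rng is None:
        rng = random.Random()
    # Pick an offset uniformly between 0 and the full window length.
    offset_seconds = rng.uniform(0, jitter_window.total_seconds())
    return scheduled + timedelta(seconds=offset_seconds)


if __name__ == "__main__":
    midnight = datetime(2023, 3, 10, 0, 0, 0)
    # Each deployment would land somewhere in the 30 minutes after midnight.
    print(jittered_run_time(midnight, timedelta(minutes=30)))
```

With a 30-minute window, deployments that share the 0 0 0 * * * schedule would start at different points between 00:00 and 00:30 instead of all at 00:00.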

Actual behavior

Execution happens simultaneously across all customer subscriptions (at midnight in the case of 0 0 0 * * *), causing brownouts or blackouts of downstream services.

Known workarounds

  • Take the machine name, which should be unique at execution time
  • Hash the machine name to get an int value
  • Set the execution minute to the hash % 1440 for a daily schedule, or the hash % 360 for a 6-hour schedule
  • With the function triggering every minute, take the current minute of the day (0-1439, reduced % 360 for the 6-hour schedule) and compare it with the execution minute. If they are equal, execute; otherwise no-op.

This algorithm should give consistent results, because a function's machine name should stay the same from one per-minute run to the next and rarely change. It also spreads the load throughout the day. To help further, a small random jitter can be added via a Sleep to spread the load within the minute as well.
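
The workaround above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code used by any shipped function: `execution_minute`, `should_run`, and `sub_minute_jitter_seconds` are hypothetical names, and SHA-256 is swapped in for the unspecified hash because Python's built-in `hash()` is randomized per process for strings and would not give a stable minute.

```python
import hashlib
import random
from datetime import datetime

MINUTES_PER_DAY = 1440  # use 360 for a 6-hour schedule


def execution_minute(machine_name, period_minutes=MINUTES_PER_DAY):
    """Map a machine name to a stable minute slot in [0, period_minutes)."""
    digest = hashlib.sha256(machine_name.encode("utf-8")).digest()
    # The first 8 bytes of the digest are enough to derive an int value.
    return int.from_bytes(digest[:8], "big") % period_minutes


def should_run(machine_name, now, period_minutes=MINUTES_PER_DAY):
    """Return True only in the minute assigned to this machine; else no-op."""
    minute_of_day = now.hour * 60 + now.minute  # 0..1439
    slot = execution_minute(machine_name, period_minutes)
    return minute_of_day % period_minutes == slot


def sub_minute_jitter_seconds(rng=None, max_seconds=60.0):
    """Random delay to sleep before doing the work, spreading load in the minute."""
    if rng is None:
        rng = random.Random()
    return rng.uniform(0, max_seconds)


if __name__ == "__main__":
    # Assumption: the machine name is available, e.g. via socket.gethostname().
    name = "example-machine-name"
    print(execution_minute(name), should_run(name, datetime.now()))
```

The function itself would be triggered every minute and would call `should_run` first, sleeping for `sub_minute_jitter_seconds()` before doing real work only when it returns True.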

Related information

N/A

@ghost ghost added the Needs: Triage 🔍 label Mar 10, 2023