Letting each condition control its frequency #150
With throttling applied at the subprocess level, every condition was checked once per minute. Some conditions, however, need to run their checks at a different pace; the TaskMonitor condition is one example. This change applies throttling to each individual condition instead of applying it generically at the subprocess level. The TaskMonitor condition will now be executed once every 10 seconds instead of once per minute. This allows worker idleness to be evaluated faster, which helps scale down the worker fleet much more effectively.
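As a rough illustration of per-condition throttling, here is a minimal sketch. The names `ThrottledCondition`, `maybe_check`, and `FakeClock` are hypothetical and are not taken from this repository; the only details drawn from the description are the 10-second and 60-second intervals.

```python
import time
from typing import Callable, Optional


class FakeClock:
    """Hypothetical manual clock so the sketch can be exercised without sleeping."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


class ThrottledCondition:
    """Hypothetical condition that owns its check interval.

    The caller's loop may tick as often as it likes; the condition itself
    decides whether enough time has passed to run its check again.
    """

    def __init__(
        self,
        name: str,
        interval_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.name = name
        self.interval_seconds = interval_seconds
        self._clock = clock
        self._last_check: Optional[float] = None
        self.check_count = 0

    def maybe_check(self) -> bool:
        """Run the check only if this condition's own interval has elapsed."""
        now = self._clock()
        if (
            self._last_check is not None
            and now - self._last_check < self.interval_seconds
        ):
            return False  # throttled: too soon since the last check
        self._last_check = now
        self._check()
        return True

    def _check(self) -> None:
        # Placeholder for the real work (for example, evaluating worker idleness).
        self.check_count += 1


if __name__ == "__main__":
    clock = FakeClock()
    task_monitor = ThrottledCondition("task_monitor", 10, clock)
    other = ThrottledCondition("other", 60, clock)
    for _ in range(60):  # simulate one minute of one-second loop ticks
        task_monitor.maybe_check()
        other.maybe_check()
        clock.advance(1)
    print(task_monitor.check_count, other.check_count)
```

Keeping the interval on the condition means the outer loop does not need to know which conditions are latency-sensitive.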
The task monitor needs to process the following 4 MWAA signals for the graceful update project:

1. Termination signal: Graceful termination of the workers when the environment is going through a graceful update.
2. Resume signal: Reverting the state of graceful termination and resuming work when the environment is going through a rollback after attempting a graceful update.
3. Kill signal: Shutting down the worker without waiting for the current Airflow tasks to finish when the environment is going through a forced update.
4. Activation signal: Starting consumption of work from the queue after termination protection has been enabled on the corresponding Fargate task.

The processing is gated behind certain environment variables, which are either absent or set to false for an environment that does not have graceful updates enabled. The CR brings the changes that have been merged for 2.9.2 (aws#137 and aws#150) to the newer version.
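The gating described above could look roughly like the sketch below. Everything here is an assumption made for illustration: the flag name `HYPOTHETICAL_GRACEFUL_UPDATES_ENABLED`, the `Signal` enum, and the action strings are invented and do not reflect the actual environment variables or code in this repository.

```python
import os
from enum import Enum
from typing import Mapping, Optional


class Signal(Enum):
    """The four signals named in the description (enum itself is hypothetical)."""

    TERMINATION = "termination"
    RESUME = "resume"
    KILL = "kill"
    ACTIVATION = "activation"


# Hypothetical mapping from each signal to a description of the worker's reaction.
ACTIONS = {
    Signal.TERMINATION: "stop consuming work and exit once running tasks finish",
    Signal.RESUME: "undo graceful termination and resume consuming work",
    Signal.KILL: "shut down without waiting for running tasks",
    Signal.ACTIVATION: "start consuming work from the queue",
}

# Hypothetical flag name; the real variable names are not given in the description.
FLAG = "HYPOTHETICAL_GRACEFUL_UPDATES_ENABLED"


def signals_enabled(env: Mapping[str, str]) -> bool:
    """Treat an absent flag the same as an explicit 'false'."""
    return env.get(FLAG, "false").strip().lower() == "true"


def handle_signal(
    signal: Signal, env: Mapping[str, str] = os.environ
) -> Optional[str]:
    """Return the action for a signal, or None when processing is gated off."""
    if not signals_enabled(env):
        return None
    return ACTIONS[signal]
```

With the flag absent or false, `handle_signal` is a no-op, so environments without graceful updates behave as before.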
Pausing task consumption in the close method helps reduce the probability of a worker picking up work during the shutdown procedure. In addition, the AIRFLOW__CELERY__WORKER_AUTOSCALE value was set to "80,80" when the 2xlarge environment class was introduced. This change was left out when the improved autoscaling changes were ported to 2.9.
These changes also need to be ported to 2.10.1.
I believe I have covered these in #151. Is there anything you saw that is missing in that PR?
I didn't check the other PRs to see if the changes were ported there. Nothing seems to be missing. LGTM.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.