Context
The Airflow job bqetl_merino_newtab_extract_to_gcs is scheduled to run every 20 minutes. Runs normally take less than 5 minutes. This job aggregates engagement for recommendations on New Tab, so that Merino can show highly engaging items to more people. We would like this engagement data to have a low (~30 minute) delay so that New Tab can show the most engaging stories.
Issue
Over the last two days, duration has spiked at:
In both runs, all three tasks individually had a short duration, and the long run duration was caused by a delay in queueing up tasks after the first one ended.
!image-20240913-175622.png|width=100%,alt="image-20240913-175622.png"!
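To make the "short tasks, long run" pattern concrete, here is a minimal sketch that computes the idle gap between one task ending and the next one starting. The task names and timestamps are made up for illustration (they are not taken from the affected runs), and the sketch assumes the three tasks run strictly one after another.

```python
from datetime import datetime, timedelta


def scheduling_gaps(task_instances):
    """Return (task_id, gap) pairs, where gap is the time between the
    previous task's end and this task's start.

    Assumes the tasks form a simple linear chain.
    """
    ordered = sorted(task_instances, key=lambda ti: ti["start"])
    return [
        (cur["task_id"], cur["start"] - prev["end"])
        for prev, cur in zip(ordered, ordered[1:])
    ]


# Hypothetical run: task names and times are illustrative only.
run = [
    {"task_id": "task_a",
     "start": datetime(2024, 9, 12, 3, 40, 0),
     "end": datetime(2024, 9, 12, 3, 40, 50)},
    {"task_id": "task_b",
     "start": datetime(2024, 9, 12, 3, 58, 10),
     "end": datetime(2024, 9, 12, 3, 59, 0)},
    {"task_id": "task_c",
     "start": datetime(2024, 9, 12, 4, 14, 30),
     "end": datetime(2024, 9, 12, 4, 15, 10)},
]

gaps = scheduling_gaps(run)
busy = sum((ti["end"] - ti["start"] for ti in run), timedelta())
total = max(ti["end"] for ti in run) - min(ti["start"] for ti in run)

for task_id, gap in gaps:
    print(f"{task_id} waited {gap} after the previous task ended")
print(f"time spent in tasks: {busy}, run duration: {total}")
```

If the gaps account for nearly all of the run duration while the time spent in tasks stays small, the delay is in queueing or scheduling rather than in the tasks themselves.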
Initial investigation
Cluster activity on Sept 12th from 3:30am - 4:30am UTC shows only 22 task instances, of which 12 were skipped. In contrast, the subsequent 60 minutes had 179 task instances, of which 12 were skipped.
This suggests a performance issue that affected more jobs than just the one above. Airflow's scheduler may have had performance problems, or may have been hitting a limit?
!image-20240913-182332.png|width=100%,alt="image-20240913-182332.png"!
!image-20240913-182359.png|width=100%,alt="image-20240913-182359.png"!
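For comparing cluster activity across other time windows, a rough sketch of the same kind of count is below. It assumes task instance records have already been exported as (start time, state) pairs by some means; the sample records are invented and far smaller than the real numbers.

```python
from collections import Counter
from datetime import datetime, timedelta


def count_by_window(task_instances, window_start, window_end):
    """Count task instances that started in [window_start, window_end),
    in total and per state."""
    counts = Counter()
    for start, state in task_instances:
        if window_start <= start < window_end:
            counts["total"] += 1
            counts[state] += 1
    return counts


# Hypothetical records: (start time in UTC, final state).
records = [
    (datetime(2024, 9, 12, 3, 35), "success"),
    (datetime(2024, 9, 12, 3, 50), "skipped"),
    (datetime(2024, 9, 12, 4, 30), "success"),
    (datetime(2024, 9, 12, 4, 45), "skipped"),
    (datetime(2024, 9, 12, 5, 10), "success"),
]

first_start = datetime(2024, 9, 12, 3, 30)
hour = timedelta(hours=1)

slow_hour = count_by_window(records, first_start, first_start + hour)
next_hour = count_by_window(records, first_start + hour, first_start + 2 * hour)

print("3:30-4:30 UTC:", dict(slow_hour))
print("4:30-5:30 UTC:", dict(next_hour))
```

A large drop in the total for one window while the skipped count stays flat, as in the numbers above, would point at a cluster-wide slowdown rather than at a single DAG.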
┆Issue is synchronized with this Jira Bug
┆Attachments: image-20240913-175622.png | image-20240913-182332.png | image-20240913-182359.png