I want to monitor Airflow for a problem with DAG execution.
For example:
alert: this DAG has had no successful run in the last 1 day (grouped by dag_id)
alert: the DAG's completion status is anything other than success (grouped by dag_id)
I'm trying to write alerts based on the metrics of this exporter, specifically airflow_dag_status, but I haven't been able to figure out how to do it yet.
In our projects, we do not re-run a failed task or a failed DAG, because it is enough for us that the problem goes away the next time the DAG runs.
Therefore, in our projects the airflow_dag_status metric will never be decremented, for example for the failed status.
I understand how to write such alerts for a metric that returns either 0 (the task or DAG was not in this status) or 1 (the task or DAG was in this status). What I don't understand is how to use a metric that, for a specific task or DAG, counts the total number of times it has been in a given status over its whole history.
I think that in the case of Airflow, the most important thing is not how many times a particular DAG has been in a particular state, but what state that DAG is in now.
Could you give recommendations on how to write alerts from the examples above, based on the metrics of this exporter (if at all possible)?
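To illustrate the semantics I am after, here is a minimal Python sketch. It assumes the metric really is an ever-growing count per (dag_id, status), and it derives both alerts from how much that count grew during the last day. As far as I understand, this is roughly what the Prometheus `increase()` function computes over a range. The function names, the alert names, and the sample data are all hypothetical and are not part of the exporter.

```python
from typing import Dict, List, Tuple

# (unix_timestamp_seconds, cumulative_count) samples, oldest first.
Samples = List[Tuple[float, float]]
# Hypothetical layout: one series per (dag_id, status) pair.
Series = Dict[Tuple[str, str], Samples]

DAY = 24 * 60 * 60


def increase(samples: Samples, now: float, window: float = DAY) -> float:
    """Growth of a cumulative count over the window ending at `now`."""
    in_window = [value for ts, value in samples if now - window <= ts <= now]
    if len(in_window) < 2:
        # Not enough samples in the window to measure any growth.
        return 0.0
    # Clamp at zero so a drop in the count is not reported as negative growth.
    return max(in_window[-1] - in_window[0], 0.0)


def evaluate_alerts(series: Series, now: float) -> Dict[str, List[str]]:
    """Return the alerts that would fire, grouped by dag_id."""
    alerts: Dict[str, List[str]] = {}
    for dag_id in sorted({dag_id for dag_id, _ in series}):
        fired = []
        # Alert 1: the success count did not grow during the last day.
        if increase(series.get((dag_id, "success"), []), now) == 0:
            fired.append("no_success_in_last_day")
        # Alert 2: the failed count grew during the last day.
        if increase(series.get((dag_id, "failed"), []), now) > 0:
            fired.append("new_failure")
        if fired:
            alerts[dag_id] = fired
    return alerts


# Made-up sample data: two scrapes per series, 20 hours and 1 minute ago.
now = 200_000.0
early, late = now - 20 * 3600, now - 60
example_series: Series = {
    ("etl_ok", "success"): [(early, 3.0), (late, 4.0)],
    ("etl_ok", "failed"): [(early, 1.0), (late, 1.0)],
    ("etl_bad", "success"): [(early, 5.0), (late, 5.0)],
    ("etl_bad", "failed"): [(early, 2.0), (late, 3.0)],
}
example_alerts = evaluate_alerts(example_series, now)
print(example_alerts)
```

With this approach an old failure stops alerting once it falls out of the one-day window, even though the total count never goes back down. It still does not tell me the DAG's current state directly, which is what I would really like to alert on.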