[Feature] Refactor and add support for schedule conditions in DAG configuration: (#320)

### Description

This feature introduces an enhancement to DAG scheduling in Airflow, enabling support for dynamic schedules based on dataset conditions. By combining dataset filters with logical conditions, users can define more flexible and precise scheduling rules for their workflows.

**Key Features**:

- Condition-Based Scheduling: Allows defining schedules using logical conditions between datasets (e.g., `('dataset_1' & 'dataset_2') | 'dataset_3'`), so that workflows trigger dynamically based on dataset availability.
- Dynamic Dataset Processing: Introduced the `process_file_with_datasets` function to evaluate and process dataset URIs from external files, supporting both simple and condition-based schedules.
- Improved Dataset Evaluation: Developed the `evaluate_condition_with_datasets` function to transform dataset URIs into valid variable names and evaluate logical conditions securely.

**Workflow Example**:

Given the following condition, with dataset names resolved from an external file:

```yaml
example_custom_config_condition_dataset_consumer_dag:
  description: "Example DAG consumer custom config condition datasets"
  schedule:
    file: $CONFIG_ROOT_DIR/datasets/example_config_datasets.yml
    datasets: "((dataset_custom_1 & dataset_custom_2) | dataset_custom_3)"
  tasks:
    task_1:
      operator: airflow.operators.bash_operator.BashOperator
      bash_command: "echo 'consumer datasets'"
```

The same condition written with dataset URIs directly in the expression:

```yaml
example_without_custom_config_condition_dataset_consumer_dag:
  description: "Example DAG consumer custom config condition datasets"
  schedule:
    datasets: "((s3://bucket-cjmm/raw/dataset_custom_1 & s3://bucket-cjmm/raw/dataset_custom_2) | s3://bucket-cjmm/raw/dataset_custom_3)"
  tasks:
    task_1:
      operator: airflow.operators.bash_operator.BashOperator
      bash_command: "echo 'consumer datasets'"
```

Or expressed with the `!or` and `!and` YAML tags:

```yaml
example_without_custom_config_condition_dataset_consumer_dag:
  description: "Example DAG consumer custom config condition datasets"
  schedule:
    datasets: !or
      - !and
        - "s3://bucket-cjmm/raw/dataset_custom_1"
        - "s3://bucket-cjmm/raw/dataset_custom_2"
      - "s3://bucket-cjmm/raw/dataset_custom_3"
  tasks:
    task_1:
      operator: airflow.operators.bash_operator.BashOperator
      bash_command: "echo 'consumer datasets'"
```

The system evaluates the datasets, ensuring valid references, and schedules the DAG dynamically when the condition resolves to True.

**Example Use Case**:

Consider a data pipeline that processes files only when multiple interdependent datasets are updated. With this feature, users can create DAG schedules that trigger only when the required combination of datasets is available, which helps with resource allocation and execution timing.

Images:

![Captura de tela 2024-12-16 181059](https://github.com/user-attachments/assets/e591538f-3f39-44a4-9503-dac45b972e64)
![Captura de tela 2024-12-16 181103](https://github.com/user-attachments/assets/11a2cdca-5cae-4075-bc22-5b257b5d6b00)
![Captura de tela 2024-12-16 181131](https://github.com/user-attachments/assets/9b40f176-91d5-455c-9812-ee4c0ca50912)

---------

Co-authored-by: ErickSeo <[email protected]>
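The commit message says that `evaluate_condition_with_datasets` turns dataset URIs into valid variable names before the condition is evaluated, but the body of that function is not shown on this page. The following is only a minimal sketch of how such a rewrite could work; the helper names `uri_to_identifier` and `rewrite_condition` are hypothetical and are not taken from the commit.

```python
import ast
import re


def uri_to_identifier(uri: str) -> str:
    """Hypothetical helper: map a dataset URI to a valid Python identifier."""
    # Replace every character that is not a letter, digit, or underscore.
    name = re.sub(r"\W", "_", uri)
    # Identifiers cannot be empty or start with a digit.
    if not name or name[0].isdigit():
        name = "_" + name
    return name


def rewrite_condition(condition: str, uris: list[str]) -> tuple[str, dict[str, str]]:
    """Hypothetical helper: swap each URI in the condition for its identifier.

    Returns the rewritten condition and a mapping of identifier -> original URI.
    """
    mapping: dict[str, str] = {}
    # Longest URIs first, so a URI that is a prefix of another is not replaced early.
    for uri in sorted(uris, key=len, reverse=True):
        name = uri_to_identifier(uri)
        mapping[name] = uri
        condition = condition.replace(uri, name)
    return condition, mapping


uris = [
    "s3://bucket-cjmm/raw/dataset_custom_1",
    "s3://bucket-cjmm/raw/dataset_custom_2",
    "s3://bucket-cjmm/raw/dataset_custom_3",
]
condition = (
    "((s3://bucket-cjmm/raw/dataset_custom_1 & s3://bucket-cjmm/raw/dataset_custom_2)"
    " | s3://bucket-cjmm/raw/dataset_custom_3)"
)

rewritten, mapping = rewrite_condition(condition, uris)
# After the rewrite the condition is a plain Python expression and can be parsed.
tree = ast.parse(rewritten, mode="eval")
```

The raw condition is not valid Python because of characters such as `:`, `/`, and `-`, which is why a rewrite step of this kind is needed before the expression can be parsed at all.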
Showing 7 changed files with 394 additions and 53 deletions.
@@ -0,0 +1,34 @@

    import ast


    class SafeEvalVisitor(ast.NodeVisitor):
        def __init__(self, dataset_map):
            self.dataset_map = dataset_map

        def evaluate(self, tree):
            return self.visit(tree)

        def visit_Expression(self, node):
            return self.visit(node.body)

        def visit_BinOp(self, node):
            left = self.visit(node.left)
            right = self.visit(node.right)

            if isinstance(node.op, ast.BitAnd):
                return left & right
            elif isinstance(node.op, ast.BitOr):
                return left | right
            else:
                raise ValueError(f"Unsupported binary operation: {type(node.op).__name__}")

        def visit_Name(self, node):
            if node.id in self.dataset_map:
                return self.dataset_map[node.id]
            raise NameError(f"Undefined variable: {node.id}")

        def visit_Constant(self, node):
            return node.value

        def generic_visit(self, node):
            raise ValueError(f"Unsupported syntax: {type(node).__name__}")
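A short usage sketch may help show what `SafeEvalVisitor` accepts and rejects. The class is copied from the diff so the example runs on its own. Everything else is an assumption made for illustration: the `Cond` stand-in (the real `dataset_map` values are presumably Airflow dataset objects that support `&` and `|`) and the `evaluate_condition` wrapper are hypothetical and do not appear in the commit.

```python
import ast


class SafeEvalVisitor(ast.NodeVisitor):
    """Copied from the diff: evaluates only names, constants, `&` and `|`."""

    def __init__(self, dataset_map):
        self.dataset_map = dataset_map

    def evaluate(self, tree):
        return self.visit(tree)

    def visit_Expression(self, node):
        return self.visit(node.body)

    def visit_BinOp(self, node):
        left = self.visit(node.left)
        right = self.visit(node.right)
        if isinstance(node.op, ast.BitAnd):
            return left & right
        elif isinstance(node.op, ast.BitOr):
            return left | right
        raise ValueError(f"Unsupported binary operation: {type(node.op).__name__}")

    def visit_Name(self, node):
        if node.id in self.dataset_map:
            return self.dataset_map[node.id]
        raise NameError(f"Undefined variable: {node.id}")

    def visit_Constant(self, node):
        return node.value

    def generic_visit(self, node):
        # Any node type without a dedicated visit_* method ends up here.
        raise ValueError(f"Unsupported syntax: {type(node).__name__}")


class Cond:
    """Hypothetical stand-in for a dataset: records how `&` and `|` combine."""

    def __init__(self, label):
        self.label = label

    def __and__(self, other):
        return Cond(f"all({self.label}, {other.label})")

    def __or__(self, other):
        return Cond(f"any({self.label}, {other.label})")


def evaluate_condition(condition, dataset_map):
    """Hypothetical wrapper: parse the condition as an expression and walk it."""
    tree = ast.parse(condition, mode="eval")
    return SafeEvalVisitor(dataset_map).evaluate(tree)


names = ("dataset_custom_1", "dataset_custom_2", "dataset_custom_3")
dataset_map = {name: Cond(name) for name in names}

result = evaluate_condition(
    "((dataset_custom_1 & dataset_custom_2) | dataset_custom_3)", dataset_map
)
print(result.label)
```

Because `generic_visit` raises instead of descending into child nodes, any construct outside the allowed set (function calls, attribute access, comparisons, and so on) is rejected with a `ValueError` rather than executed, and a name missing from `dataset_map` raises `NameError`.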