Goal
Retrieve execution metadata of DAGs generated by lineapy in Airflow. We will use the metadata to support lineage queries (and potentially to control the runtime behavior of Airflow, which appears to be possible: https://www.astronomer.io/guides/airflow-queries).
Note: this is not a UX design for lineage query use cases.
Desiderata
Not having to inject lineapy APIs into the generated DAGs.
Collect comprehensive metadata.
Proposed solution
lineapy retrieves execution records and status for DAGs through Airflow's DB (schemata) and either
1. adds relevant records to lineapy's DB, which requires augmenting our DB schema, or
2. uses Airflow DB query results in memory only to support lineage use cases in lineapy.
Assuming we will always have access to the Airflow DB, Option 2 avoids adding complexity to our DB schema and is easier to adapt to changes in Airflow. However, this is a pretty major assumption that could severely hamper lineapy's usefulness if violated. Thus, I recommend Option 1.
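Both options share the same read step against Airflow's DB. A minimal sketch of that step follows, under stated assumptions: an in-memory SQLite database stands in for Airflow's metadata DB, the `dag_run` table and its `dag_id`, `run_id`, `state`, `start_date`, and `end_date` columns follow Airflow's metadata schema as I understand it (this may differ between Airflow versions), and the function names and sample rows are hypothetical.

```python
import sqlite3


def make_stand_in_airflow_db():
    """Build an in-memory stand-in for Airflow's metadata DB (assumption:
    only the dag_run columns used below are modeled)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dag_run ("
        "dag_id TEXT, run_id TEXT, state TEXT, start_date TEXT, end_date TEXT)"
    )
    # Made-up sample rows, for illustration only.
    conn.executemany(
        "INSERT INTO dag_run VALUES (?, ?, ?, ?, ?)",
        [
            ("lineapy_generated_dag", "run_1", "success",
             "2022-01-01T00:00:00", "2022-01-01T00:05:00"),
            ("lineapy_generated_dag", "run_2", "failed",
             "2022-01-02T00:00:00", "2022-01-02T00:01:00"),
            ("unrelated_dag", "run_1", "success",
             "2022-01-01T00:00:00", "2022-01-01T00:02:00"),
        ],
    )
    conn.commit()
    return conn


def fetch_dag_runs(conn, dag_id):
    """Return the run records for one DAG as a list of dicts, oldest first."""
    cur = conn.execute(
        "SELECT run_id, state, start_date, end_date "
        "FROM dag_run WHERE dag_id = ? ORDER BY start_date",
        (dag_id,),
    )
    columns = [col[0] for col in cur.description]
    return [dict(zip(columns, row)) for row in cur.fetchall()]
```

Under Option 2, the list returned by `fetch_dag_runs` would be used directly and discarded. Under Option 1, the same rows would be written to lineapy's DB so that lineage queries keep working when the Airflow DB is unreachable.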
Side note: in a previous discussion, we floated the idea of injecting lineapy APIs into the generated DAGs. The upside is that it gives us a lot of flexibility in terms of what we could log. The downside is that it requires users to install lineapy into their production Airflow environments, which violates the first desideratum.
TODO:
design the basic table schemata for tracking executions in lineapy's DB
write Airflow DB connectors to poll Airflow's DB for updates on DAGs generated by lineapy
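As a starting point for both TODO items, here is a rough sketch of a lineapy-side table and a polling step that copies run records into it. Everything here is an assumption for discussion: the `airflow_dag_run` table and `sync_dag_runs` function are hypothetical names, SQLite stands in for both databases, and the source query assumes Airflow's `dag_run` table exposes `dag_id`, `run_id`, `state`, `start_date`, and `end_date`.

```python
import sqlite3

# Hypothetical lineapy-side table. The composite key makes repeated polls
# idempotent.
LINEAPY_SCHEMA = """
CREATE TABLE IF NOT EXISTS airflow_dag_run (
    dag_id     TEXT NOT NULL,
    run_id     TEXT NOT NULL,
    state      TEXT,
    start_date TEXT,
    end_date   TEXT,
    PRIMARY KEY (dag_id, run_id)
)
"""


def sync_dag_runs(airflow_conn, lineapy_conn, dag_ids):
    """Copy run records for the given DAG ids from the Airflow DB into
    lineapy's DB. Returns the number of records read from Airflow."""
    lineapy_conn.execute(LINEAPY_SCHEMA)
    dag_ids = list(dag_ids)
    if not dag_ids:
        return 0
    placeholders = ",".join("?" for _ in dag_ids)
    rows = airflow_conn.execute(
        "SELECT dag_id, run_id, state, start_date, end_date "
        f"FROM dag_run WHERE dag_id IN ({placeholders})",
        dag_ids,
    ).fetchall()
    # INSERT OR REPLACE (SQLite syntax) overwrites an earlier copy of a run,
    # so a state change such as running -> success is picked up on re-poll.
    lineapy_conn.executemany(
        "INSERT OR REPLACE INTO airflow_dag_run VALUES (?, ?, ?, ?, ?)",
        rows,
    )
    lineapy_conn.commit()
    return len(rows)
```

Calling `sync_dag_runs` on a schedule with the ids of DAGs generated by lineapy would keep the local copy current. Restricting the query by `dag_id` keeps unrelated DAGs out of lineapy's DB.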