Add openlineage adapter #1123

Merged
merged 7 commits into from
Sep 10, 2024

Conversation

skrawcz
Collaborator

@skrawcz skrawcz commented Sep 5, 2024

Note: I also updated the materializer docs, because `@dataloader()` and `@datasaver()` were missing
from the materialization concepts page.

This adapter emits OpenLineage events.

# create the openlineage client
from openlineage.client import OpenLineageClient

# option 1: write events to a file
from openlineage.client.transport.file import FileConfig, FileTransport
file_config = FileConfig(
    log_file_path="/path/to/your/file",
    append=False,
)
client = OpenLineageClient(transport=FileTransport(file_config))

# option 2: write to HTTP instead, e.g. marquez (this replaces the file-based client above)
client = OpenLineageClient(url="http://localhost:5000")

# create the adapter (lives in the h_openlineage plugin module added by this PR)
from hamilton.plugins.h_openlineage import OpenLineageAdapter
adapter = OpenLineageAdapter(client, "my_namespace", "my_job_name")

# add to Hamilton
from hamilton import driver
# import your pipeline code
dr = driver.Builder().with_modules(YOUR_MODULES).with_adapters(adapter).build()
# execute as normal -- and openlineage events will be emitted
dr.execute(...)
Note: for data lineage to be emitted, you must use the "materializer" abstraction to provide
metadata. See https://hamilton.dagworks.io/en/latest/concepts/materialization/.
You can do this with the `@datasaver()` and `@dataloader()` decorators, with the
`@load_from` or `@save_to` decorators, by passing data savers and data loaders
to `.with_materializers()` on the Driver Builder, or by calling `.materialize()`
on the driver object.
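
One way to check what was emitted when using a file transport is to read the events back and summarize them. The sketch below is not part of this PR. It assumes the events end up as one JSON object per line, and it relies only on the standard OpenLineage event fields `eventType`, `inputs`, and `outputs`. The `summarize_events` helper, the sample events, and the dataset names are all hypothetical and exist only for illustration.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def summarize_events(path):
    """Count event types and collect dataset names from a JSON-lines file.

    Hypothetical helper: assumes one OpenLineage event (a JSON object) per line.
    """
    event_types = Counter()
    inputs, outputs = set(), set()
    for line in Path(path).read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        event = json.loads(line)
        event_types[event.get("eventType", "UNKNOWN")] += 1
        # inputs/outputs may be absent or null on some events
        inputs.update(d["name"] for d in event.get("inputs") or [])
        outputs.update(d["name"] for d in event.get("outputs") or [])
    return event_types, sorted(inputs), sorted(outputs)


# Made-up sample events, standing in for what an adapter run might write.
sample_events = [
    {"eventType": "START", "job": {"namespace": "my_namespace", "name": "my_job_name"}},
    {
        "eventType": "COMPLETE",
        "job": {"namespace": "my_namespace", "name": "my_job_name"},
        "inputs": [{"namespace": "file", "name": "raw.csv"}],
        "outputs": [{"namespace": "file", "name": "features.parquet"}],
    },
]

with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "events.jsonl"
    log_path.write_text("\n".join(json.dumps(e) for e in sample_events) + "\n")
    event_types, inputs, outputs = summarize_events(log_path)

print(dict(event_types), inputs, outputs)
```

If the inputs and outputs come back empty for a real run, that usually points back to the note above: no loader or saver metadata was provided via the materializer abstraction.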

TODOs:

  • move adapter to h_openlineage.py
  • write example
  • write docs for adapter
  • write some tests

Changes

  • adds h_openlineage
  • adds example
  • updates docs

How I tested this

  • runs locally
  • shows up in marquez

Notes

Checklist

  • PR has an informative and human-readable title (this will be pulled into the release notes)
  • Changes are limited to a single goal (no scope creep)
  • Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future TODOs are captured in comments
  • Project documentation has been updated if adding/changing functionality.

@skrawcz skrawcz marked this pull request as ready for review September 6, 2024 17:42
skrawcz and others added 7 commits September 9, 2024 22:06
Seems to work with marquez!
This gets the V1 version going.

Main assumptions:
 - we utilize the utils functions for metadata

This is the basis for someone quickly and easily
implementing OL.

TODO:  version the utils metadata schema
TODO: add some unit tests.
@elijahbenizzy elijahbenizzy force-pushed the add_openlineage_adapter branch from da6138c to 6b056ca Compare September 10, 2024 05:06
Collaborator

@elijahbenizzy elijahbenizzy left a comment

Looks good! Downloaded, played around with it, and fixed a small bug.

I think we want to version the Hamilton facets and think about more nuanced facet handling, but for now this is a good bridge.

@elijahbenizzy elijahbenizzy merged commit 123c28b into main Sep 10, 2024
14 of 24 checks passed
@elijahbenizzy elijahbenizzy deleted the add_openlineage_adapter branch September 10, 2024 05:09