feat: adding metadata from dataset schemas #1008

Open
wants to merge 3 commits into
base: dev


DSuveges
Contributor

✨ Context

Column-level metadata can be defined in the dataset schema JSON files. This information can then be baked into the exported data, so the column-level metadata is available whenever the dataset is opened. It can also be used directly to generate data documentation via ML Croissant.

Contents

  1. The variant index dataset is annotated with column-level metadata.
  2. The schemas package is updated to extract metadata from the expected schema and add it to the observed schema.
  3. The Dataset.validate method optionally propagates the metadata to the dataset.

Once the dataset is saved, the metadata is stored in the Parquet files.
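
The propagation step described above can be sketched in plain Python on the dictionary form of a Spark schema JSON file (`type`, `fields`, `name`, `nullable`, `metadata`). This is only an illustration and not the code in this PR: the function name `propagate_metadata`, the example columns and their descriptions are all assumptions made for the sketch.

```python
import copy
import json

# Hypothetical expected schema in the Spark schema JSON layout.
# Column names and descriptions are illustrative only.
EXPECTED_SCHEMA_JSON = """
{
  "type": "struct",
  "fields": [
    {"name": "variantId", "type": "string", "nullable": false,
     "metadata": {"description": "Unique variant identifier"}},
    {"name": "position", "type": "integer", "nullable": false,
     "metadata": {"description": "Position on the chromosome"}}
  ]
}
"""


def propagate_metadata(observed: dict, expected: dict) -> dict:
    """Return a copy of `observed` with field metadata taken from `expected`.

    Fields are matched by name. Fields missing from the expected schema are
    left untouched. Nested struct fields are handled recursively.
    """
    expected_fields = {f["name"]: f for f in expected.get("fields", [])}
    result = copy.deepcopy(observed)
    for field in result.get("fields", []):
        match = expected_fields.get(field["name"])
        if match is None:
            continue
        if match.get("metadata"):
            # Expected metadata wins over any existing key with the same name.
            field["metadata"] = {**field.get("metadata", {}), **match["metadata"]}
        both_structs = (
            isinstance(field["type"], dict)
            and isinstance(match["type"], dict)
            and field["type"].get("type") == "struct"
            and match["type"].get("type") == "struct"
        )
        if both_structs:
            field["type"] = propagate_metadata(field["type"], match["type"])
    return result


expected = json.loads(EXPECTED_SCHEMA_JSON)

# Observed schema as it might look after reading data without annotations.
observed = {
    "type": "struct",
    "fields": [
        {"name": "variantId", "type": "string", "nullable": True, "metadata": {}},
        {"name": "position", "type": "integer", "nullable": True, "metadata": {}},
        {"name": "extraColumn", "type": "string", "nullable": True, "metadata": {}},
    ],
}

annotated = propagate_metadata(observed, expected)
print(json.dumps(annotated, indent=2))
```

The sketch returns a new schema rather than mutating the observed one, so the optional propagation in a validation step can be switched on or off without side effects.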
