Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
feat: llm led data dictionary generation for dbt models (#2515)
1 parent 52147fe
commit 5dfdd17
Showing 14 changed files with 683 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
{ | ||
"dataset_name": "Arbitrum Data", | ||
"description": "The Arbitrum Data dataset provides blockchain data for Arbitrum's Layer 2 scaling solution for Ethereum. It includes blocks, transactions, and traces for Arbitrum's main network, Arbitrum One. The dataset is updated weekly and allows for querying metrics such as gas usage, transaction counts, and block-level details on the Arbitrum One network.", | ||
"use_cases": [ | ||
"Analyzing gas usage per transaction on the Arbitrum One network", | ||
"Querying block-level and transaction-level data for performance insights", | ||
"Monitoring traces of transaction events and network activity on Arbitrum One" | ||
], | ||
"network": "Arbitrum One", | ||
"license": "OSO terms of service", | ||
"update_frequency": "Weekly", | ||
"reference_documentation": "https://models.opensource.observer/" | ||
} |
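The dataset files in this commit all carry `dataset_name`, `description`, and `use_cases`, while the other keys vary from file to file. A minimal sketch of a check for those three keys is given here; the `REQUIRED_KEYS` tuple and the `missing_keys` helper are hypothetical names introduced for illustration and are not part of the commit.

```python
import json

# Assumption for this sketch: the three keys below are treated as required
# because every file shown in this diff has them. The repo may define no such rule.
REQUIRED_KEYS = ("dataset_name", "description", "use_cases")


def missing_keys(raw_json: str) -> list[str]:
    """Return the required keys that are absent from one data-dictionary entry."""
    entry = json.loads(raw_json)
    return [key for key in REQUIRED_KEYS if key not in entry]


# A deliberately incomplete entry, used only to exercise the helper.
example = '{"dataset_name": "Arbitrum Data", "use_cases": []}'
print(missing_keys(example))  # prints ['description']
```

A check like this could run over each generated file before it is committed, which may be useful when the descriptions are produced by an LLM.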
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
{ | ||
"dataset_name": "Farcaster Data", | ||
"description": "The Farcaster Data dataset is a decentralized social network built on Ethereum. It mirrors the dataset offered by Indexing for use in the OSO data pipeline and includes key data such as casts (posts), links, reactions, verifications, and user profiles. This dataset allows for various social network analyses and interactions within the Farcaster ecosystem and is updated weekly.", | ||
"use_cases": [ | ||
"Identifying users with the most lifetime reactions", | ||
"Analyzing user interactions through posts (casts), reactions, and links", | ||
"Verifying Ethereum addresses associated with Farcaster users", | ||
"Profiling social behaviors and trends within the decentralized Farcaster network" | ||
], | ||
"network": "Ethereum (Farcaster)", | ||
"license": "OSO terms of service", | ||
"update_frequency": "Weekly", | ||
"reference_documentation": "https://models.opensource.observer/" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
{ | ||
"dataset_name": "Filecoin Data", | ||
"description": "The Filecoin Data dataset is a decentralized storage network that stores vital information and mirrors the dataset offered by Lily for use in the OSO data pipeline. The dataset includes details such as storage deals, miners, FVM (Filecoin Virtual Machine) transactions, and other network activities. It provides a comprehensive overview of the Filecoin ecosystem and is updated weekly.", | ||
"use_cases": [ | ||
"Querying storage deals and miner activities on the Filecoin network", | ||
"Analyzing FVM transactions and network messages", | ||
"Monitoring decentralized storage network metrics and performance", | ||
"Tracking messages sent within the Filecoin network over time" | ||
], | ||
"network": "Filecoin", | ||
"license": "OSO terms of service", | ||
"update_frequency": "Weekly", | ||
"reference_documentation": "https://models.opensource.observer/" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
{ | ||
"dataset_name": "Gitcoin and Passport Data", | ||
"description": "The Gitcoin and Passport Data dataset includes all project, round, and donation data from Gitcoin's grantmaking ecosystem within Ethereum, sourced from regendata.xyz. It is updated daily and contains information on projects, donations, and identity verification scores through the Gitcoin Passport protocol. The Passport dataset allows for evaluating web3 user reputations based on verified address scores.", | ||
"use_cases": [ | ||
"Tracking donations and contributions to open-source projects on Gitcoin", | ||
"Analyzing grant round data and project mapping between OSO and Gitcoin", | ||
"Evaluating web3 user reputations through Gitcoin Passport scores", | ||
"Exploring grantmaking and funding trends within the Ethereum ecosystem" | ||
], | ||
"networks": [ | ||
"Ethereum (Gitcoin, Passport)" | ||
], | ||
"license": "OSO terms of service", | ||
"update_frequency": "Daily", | ||
"reference_documentation": "https://models.opensource.observer/" | ||
} |
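The files in this commit spell the network field two ways: some use a `network` string, others a `networks` list, and some omit it. A consumer that wants a uniform view could normalize both spellings into a list. The sketch below assumes each entry has already been parsed into a `dict`; the `networks_of` helper is a hypothetical name, not something defined in the commit.

```python
def networks_of(entry: dict) -> list[str]:
    """Return the entry's networks as a list, whichever key spelling it uses."""
    if "networks" in entry:
        # Already a list in the source file; copy it so callers cannot mutate the entry.
        return list(entry["networks"])
    if "network" in entry:
        # Single string form; wrap it so callers always receive a list.
        return [entry["network"]]
    # Entries such as "GitHub Data" carry neither key.
    return []


print(networks_of({"network": "Filecoin"}))
print(networks_of({"networks": ["Ethereum (Gitcoin, Passport)"]}))
```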
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
{ | ||
"dataset_name": "GitHub Data", | ||
"description": "The GitHub Data dataset provides an hourly updated archive of historical GitHub events. Sourced from the GH Archive project, this dataset includes information on various events like issues opened, closed, or reopened, as well as other GitHub activities. It enables detailed analysis of repository actions and user activities on GitHub.", | ||
"use_cases": [ | ||
"Tracking GitHub issues and repository activities", | ||
"Analyzing historical GitHub event data", | ||
"Monitoring project and user interactions within repositories", | ||
"Querying GitHub events such as issues, pull requests, and commits" | ||
], | ||
"license": "MIT (Code), GitHub terms of service (Data)", | ||
"update_frequency": "Hourly", | ||
"reference_documentation": "https://www.gharchive.org/" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
{ | ||
"dataset_name": "Lens Data", | ||
"description": "The Lens Data dataset provides information from Lens Protocol, an open social network built on the Polygon network. It mirrors the dataset offered by Lens for use in the OSO data pipeline. The dataset includes key data on user interactions and activities within the decentralized social network, updated weekly.", | ||
"use_cases": [ | ||
"Analyzing social interactions and posts on the Lens Protocol", | ||
"Tracking user activity and engagement within the Polygon-based social network", | ||
"Profiling decentralized social network behaviors and trends", | ||
"Monitoring the growth and dynamics of the Lens ecosystem" | ||
], | ||
"network": "Polygon (Lens Protocol)", | ||
"license": "OSO terms of service", | ||
"update_frequency": "Weekly", | ||
"reference_documentation": "https://models.opensource.observer/" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
{ | ||
"dataset_name": "Open Collective Data", | ||
"description": "The Open Collective Data dataset contains all financial transactions (expenses and deposits) made on the Open Collective platform, which enables transparent finances and governance for open-source projects. This dataset provides detailed transaction data since the platform's inception, allowing for the analysis of donations, expenses, and financial flows within open-source collectives.", | ||
"use_cases": [ | ||
"Tracking financial contributions to open-source projects on Open Collective", | ||
"Analyzing donation patterns, expenses, and governance-related financial flows", | ||
"Aggregating transaction data for specific projects such as pandas", | ||
"Ensuring financial transparency and governance insights for open-source projects" | ||
], | ||
"dataset_sections": [ | ||
"Expenses", | ||
"Deposits" | ||
], | ||
"license": "OSO terms of service", | ||
"update_frequency": "Weekly", | ||
"reference_documentation": "https://models.opensource.observer/" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,23 @@ | ||
{ | ||
"dataset_name": "OpenRank Data", | ||
"description": "The OpenRank Data dataset provides reputation scores calculated using verifiable compute for graph-based, iterative algorithms such as EigenTrust, Collaborative Filtering, Hubs and Authorities, and Latent Semantic Analysis. It scores Farcaster users in two ways: 'globaltrust', which calculates global reputation seeded by the trust of Optimism badgeholders, and 'localtrust', which calculates relative reputation scores of users. The dataset is updated daily and supports reputation analysis for decentralized networks.", | ||
"use_cases": [ | ||
"Calculating global reputation scores of users in decentralized networks", | ||
"Analyzing relative reputation scores between users (localtrust)", | ||
"Exploring trust and reputation propagation through iterative algorithms", | ||
"Evaluating user trust in decentralized social networks such as Farcaster" | ||
], | ||
"algorithms": [ | ||
"EigenTrust", | ||
"Collaborative Filtering", | ||
"Hubs and Authorities", | ||
"Latent Semantic Analysis" | ||
], | ||
"networks": [ | ||
"Farcaster", | ||
"Optimism (badgeholder seed)" | ||
], | ||
"license": "OSO terms of service", | ||
"update_frequency": "Daily", | ||
"reference_documentation": "https://models.opensource.observer/" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
{ | ||
"dataset_name": "OSO Playground", | ||
"description": "The OSO Playground dataset is a testing and development environment that mirrors all production models. It contains a subset of projects and events, providing a space for experimentation and development without affecting production data. The dataset is updated daily and is designed to allow users to test queries and models before implementing them in production.", | ||
"use_cases": [ | ||
"Testing and development of queries and models", | ||
"Experimenting with production models in a safe environment", | ||
"Validating metrics and event aggregation without impacting production" | ||
], | ||
"license": "CC BY-SA 4.0", | ||
"update_frequency": "Daily", | ||
"reference_documentation": "https://models.opensource.observer/" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
{ | ||
"dataset_name": "OSO Production Data Pipeline", | ||
"description": "The OSO Production Data Pipeline is a daily-updated pipeline that consists of queryable and downloadable stages of data. The pipeline is built using dbt-based models and is split into staging, intermediate, and mart models. The final mart models serve data from an API, providing project-level information such as OSS directory projects and code metrics. Staging and intermediate models include a universal event table containing event data (e.g., git commits, contract invocations) which are aggregated into metrics for downstream use.", | ||
"use_cases": [ | ||
"Retrieving OSS directory project lists", | ||
"Querying code metrics by project", | ||
"Aggregating event metrics such as daily contract invocations", | ||
"Analyzing data at different stages of the pipeline (staging, intermediate, mart models)" | ||
], | ||
"license": "CC BY-SA 4.0", | ||
"update_frequency": "Daily", | ||
"reference_documentation": "https://models.opensource.observer/" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
{ | ||
"dataset_name": "oso_projects", | ||
"description": "The oso_projects dataset contains information about various projects tracked within the OSO data pipeline. It includes fields such as project_id, project_source, project_namespace, project_name, display_name, and description. This dataset is used to catalog and organize project-level metadata for open-source projects.", | ||
"use_cases": [ | ||
"Cataloging and identifying open-source projects within the OSO pipeline", | ||
"Retrieving metadata about projects, such as name, source, and description", | ||
"Analyzing project-level data for reporting or metrics aggregation" | ||
] | ||
} | ||
|
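The `oso_projects` description lists six fields by name. As a rough illustration, they could be modeled as a record type like the one below. The `OsoProject` class name and the choice of `str` for every field are assumptions made for this sketch; the diff names the fields but does not state their types.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class OsoProject:
    """Hypothetical record mirroring the fields named in the oso_projects description."""

    # All types are assumed to be strings; the source does not specify them.
    project_id: str
    project_source: str
    project_namespace: str
    project_name: str
    display_name: str
    description: str


# List the field names in declaration order.
print([f.name for f in fields(OsoProject)])
```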
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
{ | ||
"dataset_name": "Superchain Data", | ||
"description": "The Superchain Data dataset provides public blockchain data, including blocks, transactions, and traces, for several networks. This dataset is updated daily and covers networks such as Optimism, Base, Frax, Metal, Mode, PGN, and Zora. It enables querying and analysis of blockchain events, such as contract creation and transactions, and is backed by OSO's partners at Goldsky.", | ||
"use_cases": [ | ||
"Querying deployed contracts on various blockchain networks", | ||
"Analyzing blockchain transaction data and block details", | ||
"Monitoring contract creation and tracing transaction events across supported networks" | ||
], | ||
"networks": [ | ||
"Optimism mainnet", | ||
"Base", | ||
"Frax", | ||
"Metal", | ||
"Mode", | ||
"PGN", | ||
"Zora" | ||
], | ||
"license": "Apache-2.0 (Code), OSO terms of service (Data)", | ||
"update_frequency": "Daily", | ||
"reference_documentation": "https://models.opensource.observer/" | ||
} |
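Most of the files above declare an `update_frequency` of `Hourly`, `Daily`, or `Weekly`, and `oso_projects` declares none. One way to summarize the set is to group dataset names by that value. The sketch below assumes the entries have already been parsed into dictionaries; `group_by_frequency` and the `"unspecified"` fallback label are hypothetical names chosen for illustration.

```python
from collections import defaultdict


def group_by_frequency(entries: list[dict]) -> dict[str, list[str]]:
    """Group dataset names by their declared update_frequency."""
    groups: dict[str, list[str]] = defaultdict(list)
    for entry in entries:
        # Entries without the key (e.g. oso_projects) fall under an assumed label.
        frequency = entry.get("update_frequency", "unspecified")
        groups[frequency].append(entry["dataset_name"])
    return dict(groups)


# A small sample using values taken from the files in this diff.
sample = [
    {"dataset_name": "Superchain Data", "update_frequency": "Daily"},
    {"dataset_name": "GitHub Data", "update_frequency": "Hourly"},
    {"dataset_name": "OpenRank Data", "update_frequency": "Daily"},
    {"dataset_name": "oso_projects"},
]
print(group_by_frequency(sample))
```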