Parse and translate dbt metric definitions #38

Open
olivierdupuis opened this issue Nov 14, 2022 · 0 comments

As dbt rolls out its semantic layer, Droughty needs to parse not only the entities defined in dbt, but also the metrics. The main advantage is that Droughty users could control all semantic functions from within one tool. We can currently expose entities and relationships to Looker, Cube and dbml. It would be great if we could expose metrics as well.

There are probably several ways of accomplishing this. A few options would be:

  • When dbt runs on dbt Cloud, read the metric definitions directly from its Metadata API.
  • Materialize artifacts to the data warehouse directly (using existing dbt packages) and read the metric definitions from there.
  • Parse the yaml files of the dbt project directly and select the metric definitions from there.

There is, however, another approach that might be much simpler and that Cube itself currently uses. Its implementation can be found in the loadMetricCubesFromDbtProject method of their dbt extension package.

They currently only document how to read dbt metrics when using dbt Cloud, but digging into the above repo shows that they have a few other options in place that they do not seem to have documented (or perhaps tested) yet.

Regardless, for projects hosted outside of dbt Cloud, they parse dbt metric definitions by reading the manifest.json file generated after a dbt run job. The bit where they do the parsing starts on line 147:

Object.keys(manifest.metrics).forEach(metric => {
  const regex = /^ref\('(\S+)'\)$/;
  const metricDef = manifest.metrics[metric];
  const match = metricDef.model.match(regex);
  if (!match) {
    throw new UserError(`Expected reference to the model in format ref('model_name') but found '${metricDef.model}'`);
  }
  // eslint-disable-next-line prefer-destructuring
  const modelName = match[1];
  metricDef.model = modelName.indexOf('.') !== -1 ? modelName : `model.${metricDef.package_name}.${modelName}`;
});
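
Since Droughty is written in Python, here is a rough sketch of what the same normalization could look like on our side. It only mirrors the logic of the Cube snippet above; the function name resolve_model_id is hypothetical and nothing like it exists in Droughty today.

```python
import re

# Same pattern as the Cube snippet: the whole string must be ref('<name>').
REF_PATTERN = re.compile(r"^ref\('(\S+)'\)$")


def resolve_model_id(metric_def: dict) -> str:
    """Return the manifest node id of the model a metric refers to.

    Hypothetical helper: expects a metric entry shaped like the ones
    found under the "metrics" key of manifest.json.
    """
    model = metric_def["model"]
    match = REF_PATTERN.match(model)
    if match is None:
        raise ValueError(
            "Expected reference to the model in format ref('model_name') "
            f"but found '{model}'"
        )
    model_name = match.group(1)
    # Names that already contain a dot are kept as-is, as in the Cube code.
    if "." in model_name:
        return model_name
    return f"model.{metric_def['package_name']}.{model_name}"
```

Unlike the Cube version, this returns the resolved id instead of overwriting metric_def["model"], which keeps the parsed manifest untouched.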

If I read the manifest.json file of the discursus project, where I have defined a metric, we get an example of what is being parsed:

"metrics": {
        "metric.discursus_dw.event_count": {
            "fqn": [
                "discursus_dw",
                "warehouse",
                "core",
                "event_count"
            ],
            "unique_id": "metric.discursus_dw.event_count",
            "package_name": "discursus_dw",
            "root_path": "/Users/olivierdupuis/Git/discursus/platform/discursus_data_platform/dw",
            "path": "warehouse/core/core_metrics.yml",
            "original_file_path": "models/warehouse/core/core_metrics.yml",
            "name": "event_count",
            "description": "The count of events",
            "label": "Events count",
            "calculation_method": "count",
            "timestamp": "event_date",
            "expression": "event_pk",
            "filters": [],
            "time_grains": [
                "day",
                "week",
                "month",
                "quarter",
                "year"
            ],
            "dimensions": [
                "action_geo_country_name"
            ],
            "window": null,
            "model": "ref('events_fct')",
            "model_unique_id": null,
            "resource_type": "metric",
            "meta": {},
            "tags": [],
            "config": {
                "enabled": true
            },
            "unrendered_config": {},
            "sources": [],
            "depends_on": {
                "macros": [],
                "nodes": [
                    "model.discursus_dw.events_fct"
                ]
            },
            "refs": [
                [
                    "events_fct"
                ]
            ],
            "metrics": [],
            "created_at": 1668096680.095598
        }
    },
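
Based on the excerpt above, a minimal sketch of reading the metrics out of manifest.json could look like the following. The function names and the set of fields kept are my own assumptions, picked from the keys visible in this one example; other dbt versions may shape the manifest differently.

```python
import json
from pathlib import Path


def load_manifest(manifest_path: str) -> dict:
    """Read a manifest.json file from disk and return it as a dict."""
    return json.loads(Path(manifest_path).read_text(encoding="utf-8"))


def extract_metrics(manifest: dict) -> list:
    """Collect the fields of each metric that a translation step might need.

    Hypothetical helper: the keys read here are the ones seen in the
    discursus example; missing optional keys fall back to empty values.
    """
    extracted = []
    for unique_id, metric in manifest.get("metrics", {}).items():
        extracted.append(
            {
                "unique_id": unique_id,
                "name": metric["name"],
                "label": metric.get("label"),
                "description": metric.get("description"),
                "calculation_method": metric.get("calculation_method"),
                "expression": metric.get("expression"),
                "timestamp": metric.get("timestamp"),
                "time_grains": metric.get("time_grains", []),
                "dimensions": metric.get("dimensions", []),
                "filters": metric.get("filters", []),
                # Raw ref('...') string plus the node ids dbt already resolved.
                "model_ref": metric.get("model"),
                "depends_on": metric.get("depends_on", {}).get("nodes", []),
            }
        )
    return extracted
```

Note that in this example depends_on.nodes already holds the resolved id model.discursus_dw.events_fct, so it may be possible to skip the regex on the ref string altogether.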

Of course, that only covers parsing the metric definitions, not how they can then be translated to Looker, Cube or dbml definitions. But a first step could be to ship this as an experimental feature that converts dbt metrics to Cube. That might appeal to dbt and Cube users who are looking for a way to integrate both tools without having to use dbt Cloud.
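
To make the translation step a bit more concrete, here is a very rough sketch of turning one parsed metric into something shaped like a Cube measure. Everything here is an assumption for discussion: the mapping table, the function name and the output keys (name, title, description, type, sql) reflect my understanding of Cube measures and would need to be checked against Cube's docs before we rely on them.

```python
# Assumed mapping from dbt calculation_method values to Cube measure types.
# Methods without an obvious counterpart (e.g. derived) are left out on purpose.
CALCULATION_TO_CUBE_TYPE = {
    "count": "count",
    "count_distinct": "count_distinct",
    "sum": "sum",
    "average": "avg",
    "min": "min",
    "max": "max",
}


def metric_to_cube_measure(metric: dict) -> dict:
    """Translate one dbt metric entry into a Cube-style measure dict.

    Hypothetical helper: the returned structure is a plain dict that a
    later step would still have to render into an actual Cube schema file.
    """
    method = metric["calculation_method"]
    if method not in CALCULATION_TO_CUBE_TYPE:
        raise NotImplementedError(
            f"No Cube measure type mapped for calculation_method '{method}'"
        )
    return {
        "name": metric["name"],
        # Fall back to the metric name when no label is defined.
        "title": metric.get("label") or metric["name"],
        "description": metric.get("description", ""),
        "type": CALCULATION_TO_CUBE_TYPE[method],
        "sql": metric["expression"],
    }
```

Time grains, dimensions and filters are ignored in this sketch; how those should map to Cube is a separate question.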
