Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate model properties with types #39

Open
Esya opened this issue Nov 9, 2023 · 3 comments
Open

Generate model properties with types #39

Esya opened this issue Nov 9, 2023 · 3 comments

Comments

@Esya
Copy link

Esya commented Nov 9, 2023

On top of generating the list of all columns, it would be great if dbt-invoke could also pull the data_type and write it to the properties, so that one might use model contracts

@NUSTemple
Copy link

NUSTemple commented Apr 13, 2024

I plan to take this issue.

Currently the column information is from dbt-invoke/dbt_invoke/internal/_utils.py with below MACRO _log_columns_list

MACROS = {
    '_log_columns_list': (
        "\n{# This macro is intended for use by dbt-invoke #}"
        "\n{% macro _log_columns_list(sql=none, resource_name=none) %}"
        "\n    {% if sql is none %}"
        "\n        {% set sql = 'select * from ' ~ ref(resource_name) %}"
        "\n    {% endif %}"
        "\n    {% if execute %}"
        "\n        {{ log(get_columns_in_query(sql), info=True) }}"
        "\n    {% endif %}"
        "\n{% endmacro %}\n"
    )
} 

Based on information https://discourse.getdbt.com/t/is-there-a-way-to-get-the-column-data-types-from-sql-in-v1-4-4/7834, there is a new macro get_column_schema_from_query which is able to get column data_type

Then this information is passed to dbt-invoke/dbt_invoke/properties.py function _get_property_column as input to populate the column properties. Need to add data_type when populating the column_dict.

  def _get_property_column(column_name):
      """
      Create a dictionary representing column properties
  
      :param column_name: Name of column
      :return: A dictionary representing column properties
      """
      column_dict = {'name': column_name, 'description': ""}
      return column_dict

Current proposal is:

  1. Add a flag (--add_data_type) to indicate where data_type should be included in properties file.
  2. If add_data_type is true (default is false for compatibility), use macro get_column_schema_from_query to get column info with data_type
  3. populate this info _get_property_column with data_type

@robastel
Copy link
Member

@NUSTemple thanks for your evaluation of the issue! I appreciate your willingness to take it on.

Here are some of my initial thoughts:

    1. Add a flag (--add_data_type) to indicate where data_type should be included in properties file.
    • What exactly do you mean by where? I'm asking because my understanding is that when a contract is enforced, data_type must be defined for every column in the model.
  • Do you plan to also include the enforced: true setting under the model's contract configuration? (https://docs.getdbt.com/docs/collaborate/govern/model-contracts#how-to-define-a-contract)

  • Another consideration is how to make sure dbt-invoke avoids applying contract logic to ephemeral models and other resource types that do not support contracts.

@mahazza
Copy link

mahazza commented Apr 26, 2024

This would be useful for us too.

I'd expect a flag called --add-data-type to simply add the data_type in without setting up model contracts. If the goal is to set up model contracts then something like --model-contracts = True|False

I would use it to

  • Select the consumption aka gold tables that I want to enforce
  • Enable to enforce flag manually if required

So since i select the consumption tables anyway, ephemeral limitations doesn't bother me

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants