Add dashboard-as-code functionality #201
Resolves #127 Resolves #137 Stacked on top of #154 <img width="1713" alt="Screenshot 2024-06-12 at 17 13 17" src="https://github.com/databrickslabs/lsql/assets/5946784/f5cc3738-55d0-4e59-9183-c6f2b8a9c2b6">
Resolves #135

``` bash
~/github/databrickslabs/lsql feat/add-cli-command ⇡1 *1
❯ databricks labs lsql create-dashboard --folder tests/integration/dashboards/one_counter/    х INT 15:33:57
15:34:01 INFO [databricks.sdk] Using Databricks CLI authentication
15:34:01 INFO [__main__] Creating dashboard ...
15:34:01 WARN [d.l.lsql.dashboards] Parsing tests/integration/dashboards/one_counter/000_counter.md: Invalid expression / Unexpected token. Line 1, Col: 1.
# Counter Below you see an example counter widget. Counter widgets in dashboards are used to display
15:34:02 INFO [__main__] Created dashboard: https://adb-REDACTED.2.azuredatabricks.net/sql/dashboardsv3/REDACTED.
REDACTED
```

![Screenshot 2024-06-14 at 15 36 27](https://github.com/databrickslabs/lsql/assets/5946784/c000c97f-891e-40ac-9e0f-cc4feb241d8f)

Created issue for the WARN #159
Resolves #159

![Screenshot 2024-06-17 at 14 28 54](https://github.com/databrickslabs/lsql/assets/5946784/be74d43a-6388-4303-a6bb-6f5b907e0018)

- [x] Add unit tests
- [x] Add docs
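The WARN in the log above came from a markdown file being handed to the SQL parser. The PR page does not show how #159 was fixed; one plausible approach is to classify dashboard files by suffix before parsing. A minimal sketch, assuming a hypothetical `tile_kind` helper that is not part of `lsql`:

```python
from pathlib import Path


def tile_kind(path: Path) -> str:
    """Classify a dashboard file by suffix (hypothetical helper, not lsql API)."""
    suffix = path.suffix.lower()
    if suffix == ".sql":
        return "query"  # parse with the SQL parser
    if suffix == ".md":
        return "markdown"  # render as a text widget, never parse as SQL
    return "ignored"  # e.g. dashboard.yml is metadata, not a tile


kinds = {
    name: tile_kind(Path(name))
    for name in ("000_counter.md", "010_counter.sql", "dashboard.yml")
}
```

With this split, only files classified as `"query"` would reach the SQL parser.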
The TableV1Spec is required to support http links
No more titles visible <img width="1421" alt="Screenshot 2024-06-26 at 12 20 52" src="https://github.com/databrickslabs/lsql/assets/5946784/814bb960-b73c-4377-bc10-36729b890228">
Add required `invisibleColumns` field to `TableV1Spec.as_dict`
    @@ -1690,8 +1691,6 @@ def as_dict(self) -> Json:
                body["encodings"] = self.encodings.as_dict()
            if self.frame:
                body["frame"] = self.frame.as_dict()
            if self.invisible_columns:
                body["invisibleColumns"] = [v.as_dict() for v in self.invisible_columns]
this file is not allowed to be changed, as it'll be overwritten by a codegen.
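One way to respect the codegen constraint while still always emitting `invisibleColumns` is to patch the serialized dictionary outside the generated module. A minimal sketch, assuming a hypothetical `with_required_invisible_columns` helper and a plain dict standing in for the output of `TableV1Spec.as_dict`; neither the helper nor the dict contents come from this PR:

```python
from typing import Any

Json = dict[str, Any]


def with_required_invisible_columns(body: Json) -> Json:
    """Return a copy of a serialized table spec that always has `invisibleColumns`.

    Hypothetical helper: it post-processes the dict instead of editing generated code.
    """
    patched = dict(body)  # shallow copy, the input is left untouched
    patched.setdefault("invisibleColumns", [])
    return patched


# Stand-in for what a generated `as_dict` might return when no columns are hidden.
serialized: Json = {"version": 1, "encodings": {"columns": []}}
patched = with_required_invisible_columns(serialized)
```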
    order=other.order or self.order,
    width=other.width or self.width,
    height=other.height or self.height,
    _id=other.id or self.id,
    - _id=other.id or self.id,
    + id=other.id or self.id,
why underscore?
Pylint: W0622: Redefining built-in 'id' (redefined-builtin)
I have updated it, with a pylint disable comment. I prefer `id` over `_id` even though it shadows a built-in.
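The outcome of this thread can be sketched as follows. The `TileMetadata` class and its `merge` method are hypothetical stand-ins, not the code from the PR. The sketch also deviates from the diff in one place: it compares `order` against `None`, because `x or y` treats `0` as "not set".

```python
from typing import Optional


class TileMetadata:
    """Hypothetical stand-in for the metadata class discussed in the review."""

    def __init__(
        self,
        order: Optional[int] = None,
        width: int = 0,
        height: int = 0,
        id: str = "",  # pylint: disable=redefined-builtin
    ) -> None:
        self.order = order
        self.width = width
        self.height = height
        self.id = id

    def merge(self, other: "TileMetadata") -> "TileMetadata":
        """Prefer values set on `other`, fall back to the values on `self`."""
        return TileMetadata(
            # An explicit order of 0 must survive, so test against None here.
            order=other.order if other.order is not None else self.order,
            # For sizes and ids, 0 and "" mean "not set", so `or` is enough.
            width=other.width or self.width,
            height=other.height or self.height,
            id=other.id or self.id,
        )


base = TileMetadata(order=1, width=3, height=2, id="counter")
override = TileMetadata(order=0, width=6)
merged = base.merge(override)
```

Keeping the public name `id` means callers write `TileMetadata(id="counter")`, at the cost of one targeted lint suppression.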
        return fallback_metadata


    class Tile:
Make it a dataclass. Apparently inheritance works just fine:

    from dataclasses import dataclass


    @dataclass
    class A:
        a: int
        b: str


    @dataclass
    class B(A):
        c: float


    def test_dataclass_inheritance():
        b = B(1, "2", 3.0)
        assert b.a == 1
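One caveat when applying this to a class hierarchy: dataclass fields are ordered base class first, so once a base field has a default, every field added by a subclass needs one too, otherwise `@dataclass` raises `TypeError` when the class is created. A short sketch with hypothetical class names:

```python
from dataclasses import dataclass


@dataclass
class SketchTile:
    path: str
    width: int = 0  # a default in the base class ...


@dataclass
class SketchQueryTile(SketchTile):
    query: str = ""  # ... forces defaults on every field added by subclasses


def subclass_without_default_fails() -> bool:
    """Return True if a non-default field after a default one is rejected."""
    try:

        @dataclass
        class Broken(SketchTile):
            query: str  # non-default field after `width: int = 0`

    except TypeError:
        return True
    return False


tile = SketchQueryTile("counter.sql", 3, "SELECT 1")
rejected = subclass_without_default_fails()
```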
See #208
    default_width, default_height = self._default_size()
    width = self._tile_metadata.width or default_width
    height = self._tile_metadata.height or default_height
    self.position = Position(0, 0, width, height)
    - default_width, default_height = self._default_size()
    - width = self._tile_metadata.width or default_width
    - height = self._tile_metadata.height or default_height
    - self.position = Position(0, 0, width, height)
    + def _default_position(self):
    +     default_width, default_height = self.size()
    +     width = self._tile_metadata.width or default_width
    +     height = self._tile_metadata.height or default_height
    +     return Position(0, 0, width, height)
TableTile could increase width/height based on the number of columns projected, so the design has to accommodate that up front.
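The shape of this suggestion, an overridable size hook feeding a position helper, could look roughly like the sketch below. The constructor arguments and the base default of `1, 1` are assumptions for illustration; only `Position(0, 0, width, height)` and the `6, 6` table default appear in the diff.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Position:
    x: int
    y: int
    width: int
    height: int


class Tile:
    """Hypothetical base tile; only the sizing logic is sketched."""

    def __init__(self, width: Optional[int] = None, height: Optional[int] = None) -> None:
        self._width = width
        self._height = height

    def size(self) -> tuple[int, int]:
        """Default (width, height); subclasses override this hook."""
        return 1, 1

    def default_position(self) -> Position:
        default_width, default_height = self.size()
        # Explicit metadata wins; otherwise fall back to the subclass default.
        width = self._width or default_width
        height = self._height or default_height
        return Position(0, 0, width, height)


class TableTile(Tile):
    def size(self) -> tuple[int, int]:
        return 6, 6


explicit = TableTile(width=3).default_position()
fallback = TableTile().default_position()
```

Because `size()` is a method rather than a constant, a table tile can later compute its size from the query without changing the callers.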
See #208
            field = Field(name=named_select, expression=f"`{named_select}`")
            fields.append(field)
        return fields


    def _get_datasets(tiles: list[Tile]) -> list[Dataset]:
this method belongs to DashboardMetadata
This is actually a bit more involved than just moving the method:

The `DashboardMetadata` keeps track of `TilesMetadata` in a `dict[str, TilesMetadata]` with the `TileMetadata.id` as key, thus deduplicating `TilesMetadata` in case of `id` collisions, which is not the case when keeping the `TilesMetadata` in a list. See unit test `test_dashboards_creates_dashboard_with_id_collisions` and this comment.

We need to decide how to handle `id` collisions before this method can be moved.
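The difference between the two containers can be shown in a few lines. The `TileMetadata` class below is a reduced, hypothetical stand-in, and `find_collisions` is an illustrative helper rather than something from the PR:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TileMetadata:
    """Hypothetical, reduced to the two attributes the example needs."""

    id: str
    path: str


tiles = [
    TileMetadata("counter", "000_counter.sql"),
    TileMetadata("counter", "010_counter.sql"),  # same id, different file
    TileMetadata("table", "020_table.sql"),
]

# A list keeps both colliding entries ...
kept_in_list = [tile.path for tile in tiles]

# ... while a dict keyed by id silently keeps only the last entry per id.
by_id = {tile.id: tile for tile in tiles}


def find_collisions(candidates: list[TileMetadata]) -> dict[str, list[str]]:
    """Group paths by id and return only the ids that occur more than once."""
    paths_by_id: dict[str, list[str]] = {}
    for tile in candidates:
        paths_by_id.setdefault(tile.id, []).append(tile.path)
    return {key: paths for key, paths in paths_by_id.items() if len(paths) > 1}


collisions = find_collisions(tiles)
```

Detecting collisions explicitly, as `find_collisions` does, makes it possible to warn or fail instead of dropping a tile without notice.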
See #208
    class TableTile(QueryTile):
        def _default_size(self) -> tuple[int, int]:
            return 6, 6
we may want to make widget wider if there are lots of projected columns.
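A possible sizing policy for this idea is sketched below. The function name, the "two columns per width unit" ratio and the minimum width are assumptions made for illustration; only the maximum row width of 6 is taken from the project's release notes.

```python
MAXIMUM_DASHBOARD_WIDTH = 6  # row width quoted in the project's release notes


def table_width(column_count: int, columns_per_unit: int = 2, minimum: int = 2) -> int:
    """Hypothetical policy: one width unit per `columns_per_unit` projected columns."""
    if column_count < 0:
        raise ValueError("column_count must be non-negative")
    needed = -(-column_count // columns_per_unit)  # ceiling division
    # Clamp between the minimum width and the full dashboard row.
    return max(minimum, min(needed, MAXIMUM_DASHBOARD_WIDTH))


widths = [table_width(count) for count in (0, 1, 5, 40)]
```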
See #208
This PR breaks backwards compatibility for databrickslabs/ucx downstream. See build logs for more details. Running from downstreams #256
✅ 35/35 passed, 2 skipped, 9m4s total. Running from acceptance #277
Co-authored-by: Serge Smertin <[email protected]>
71cac5d to 03eeedb
lgtm
* [Implicit detection](#implicit-detection)
* [Widget types](#widget-types)
* [Widget ordering](#widget-ordering)
* [Tile ordering](#tile-ordering)
pending more changes in the other PR
See #208
* Added Command Execution backend which uses Command Execution API on a cluster ([#95](#95)). In this release, the databricks labs lSQL library has been updated with a new Command Execution backend that utilizes the Command Execution API. A new `CommandExecutionBackend` class has been implemented, which initializes a `CommandExecutor` instance taking a cluster ID, workspace client, and language as parameters. The `execute` method runs SQL commands on the specified cluster, and the `fetch` method returns the query result as an iterator of Row objects. The existing `StatementExecutionBackend` class has been updated to inherit from a new abstract base class called `ExecutionBackend`, which includes a `save_table` method for saving data to tables and is meant to be a common base class for both Statement and Command Execution backends. The `StatementExecutionBackend` class has also been updated to use the new `ExecutionBackend` abstract class and its constructor now accepts a `max_records_per_batch` parameter. The `execute` and `fetch` methods have been updated to use the new `_only_n_bytes` method for logging truncated SQL statements. Additionally, the `CommandExecutionBackend` class has several methods, `execute`, `fetch`, and `save_table` to execute commands on a cluster and save the results to tables in the databricks workspace. This new backend is intended to be used for executing commands on a cluster and saving the results in a databricks workspace.
* Added basic integration with Lakeview Dashboards ([#66](#66)). In this release, we've added basic integration with Lakeview Dashboards to the project, enhancing its capabilities. This includes updating the `databricks-labs-blueprint` dependency to version 0.4.2 with the `[yaml]` extra, allowing for additional functionality related to handling YAML files.
A new file, `dashboards.py`, has been introduced, providing a class for interacting with Databricks dashboards, along with methods for retrieving and saving dashboard configurations. Additionally, a new `__init__.py` file under the `src/databricks/labs/lsql/lakeview` directory imports all classes and functions from the `model.py` module, providing a foundation for further development and customization. The release also introduces a new file, `model.py`, containing code generated from OpenAPI specs by the Databricks SDK Generator, and a template file, `model.py.tmpl`, used for handling JSON data during integration with Lakeview Dashboards. A new file, `polymorphism.py`, provides utilities for checking if a value can be assigned to a specific type, supporting correct data typing and formatting with Lakeview Dashboards. Furthermore, a `.gitignore` file has been added to the `tests/integration` directory as part of the initial steps in adding integration testing to ensure compatibility with the Lakeview Dashboards platform. Lastly, the `test_dashboards.py` file in the `tests/integration` directory contains a function, `test_load_dashboard(ws)`, which uses the `Dashboards` class to save a dashboard from a source to a destination path, facilitating testing during the integration process.
* Added dashboard-as-code functionality ([#201](#201)). This commit introduces dashboard-as-code functionality for the UCX project, enabling the creation and management of dashboards using code. The feature resolves multiple issues and includes a new `create-dashboard` command for creating unpublished dashboards. The functionality is available in the `lsql` lab and allows for specifying the order and width of widgets, overriding default widget identifiers, and supporting various SQL and markdown header arguments. The `dashboard.yml` file is used to define top-level metadata for the dashboard.
This commit also includes extensive documentation and examples for using the dashboard as a library and configuring different options.
* Automate opening integration test dashboard in debug mode ([#167](#167)). A new feature has been added to automatically open the integration test dashboard in debug mode, making it easier for software engineers to debug and troubleshoot. This has been achieved by importing the `webbrowser` and `is_in_debug` modules from "databricks.labs.blueprint.entrypoint", and adding a check in the `create` function to determine if the code is running in debug mode. If it is, a dashboard URL is constructed from the workspace configuration and dashboard ID, and then opened in a web browser using "webbrowser.open". This allows for a more streamlined debugging process for the integration test dashboard. No other parts of the code have been affected by this change.
* Automatically tile widgets ([#109](#109)). In this release, we've introduced an automatic widget tiling feature for the dashboard creation process in our open-source library. The `Dashboards` class now includes a new class variable, `_maximum_dashboard_width`, set to 6, representing the maximum width allowed for each row of widgets in the dashboard. The `create_dashboard` method has been updated to accept a new `self` parameter, turning it into an instance method. A new `_get_position` method has been introduced to calculate and return the next available position for placing a widget, and a `_get_width_and_height` method has been added to return the width and height for a widget specification, initially handling `CounterSpec` instances. Additionally, we've added new unit tests to improve testing coverage, ensuring that widgets are created, positioned, and sized correctly. These tests also cover the correct positioning of widgets based on their order and available space, as well as the expected width and height for each widget.
* Bump actions/checkout from 4.1.3 to 4.1.6 ([#102](#102)).
In the latest release, the 'actions/checkout' GitHub Action has been updated from version 4.1.3 to 4.1.6, which includes checking the platform to set the archive extension appropriately. This release also bumps the version of github/codeql-action from 2 to 3, actions/setup-node from 1 to 4, and actions/upload-artifact from 2 to 4. Additionally, the minor-actions-dependencies group was updated with two new versions. Disabling extensions.worktreeConfig when disabling sparse-checkout was introduced in version 4.1.4. The release notes and changelog for this update can be found in the provided link. This commit was made by dependabot[bot] with contributions from cory-miller and jww3.
* Bump actions/checkout from 4.1.6 to 4.1.7 ([#151](#151)). In the latest release, the 'actions/checkout' GitHub action has been updated from version 4.1.6 to 4.1.7 in the project's push workflow, which checks out the repository at the start of the workflow. This change brings potential bug fixes, performance improvements, or new features compared to the previous version. The update only affects the version number in the YAML configuration for the 'actions/checkout' step in the release.yml file, with no new methods or alterations to existing functionality. This update aims to ensure a smooth and enhanced user experience for those utilizing the project's push workflows by taking advantage of the possible improvements or bug fixes in the new version of 'actions/checkout'.
* Create a dashboard with a counter from a single query ([#107](#107)). In this release, we have introduced several enhancements to our dashboard-as-code approach, including the creation of a `Dashboards` class that provides methods for getting, saving, and deploying dashboards. A new method, `create_dashboard`, has been added to create a dashboard with a single page containing a counter widget. The counter widget is associated with a query that counts the number of rows in a specified dataset.
The `deploy_dashboard` method has also been added to deploy the dashboard to the workspace. Additionally, we have implemented a new feature for creating dashboards with a counter from a single query, including modifications to the `test_dashboards.py` file and the addition of four new tests. These changes improve the robustness of the dashboard creation process and provide a more automated way to view important metrics.
* Create text widget from markdown file ([#142](#142)). A new feature has been implemented in the library that allows for the creation of a text widget from a markdown file, enhancing customization and readability for users. This development resolves issue [#1](#1)
* Design document for dashboards-as-code ([#105](#105)). "The latest release introduces 'Dashboards as Code,' a method for defining and managing dashboards through configuration files, enabling version control and controlled changes. The building blocks include `.sql`, `.md`, and `dashboard.yml` files, with `.sql` defining queries and determining tile order, and `dashboard.yml` specifying top-level metadata and tile overrides. Metadata can be inferred or explicitly defined in the query or files. The tile order can be determined by SQL file order, `tiles` order in `dashboard.yml`, or SQL file metadata. This project can also be used as a library for embedding dashboard generation in your code. Configuration precedence follows command-line flags, SQL file headers, `dashboard.yml`, and SQL query content. The command-line interface is utilized for dashboard generation from configuration files."
* Ensure propagation of `lsql` version into `User-Agent` header when it is used as library ([#206](#206)). In this release, the `pyproject.toml` file has been updated to ensure that the correct version of the `lsql` library is propagated into the `User-Agent` header when used as a library, improving attribution.
The `databricks-sdk` version has been updated from `0.22.0` to `0.29.0`, and the `__init__.py` file of the `lsql` library has been modified to add the `with_user_agent_extra` function from the `databricks.sdk.core` package for correct attribution. The `backends.py` file has also been updated with improved type handling in the `_row_to_sql` and `save_table` functions for accurate SQL insertion and handling of user-defined classes. Additionally, a test has been added to ensure that the `lsql` version is correctly propagated in the `User-Agent` header when used as a library. These changes offer improved functionality and accurate type handling, making it easier for developers to identify the library version when used in other projects.
* Fixed counter encodings ([#143](#143)). In this release, we have improved the encoding of counters in the lsql dashboard by modifying the `create_dashboard` function in the `dashboards.py` file. Previously, the counter field encoding was hardcoded as "count," but has been changed to dynamically determine the first field name of the given fields, ensuring that counters are expected to have only one field. Additionally, a new integration test has been added to the `tests/integration/test_dashboards.py` file to ensure that the dashboard deployment functionality correctly handles SQL queries that do not perform a count. A new test for the `Dashboards` class has also been added to check that counter field encoding names are created as expected. The `WorkspaceClient` is mocked and not called in this test. These changes enhance the accuracy of counter encoding and improve the overall functionality and reliability of the lsql dashboard.
* Fixed non-existing reference and typo in the documentation ([#104](#104)). In this release, we've made improvements to the documentation of our open-source library, specifically addressing issue [#104](#104).
The changes include fixing a non-existent reference and a typo in the `Library size comparison` section of the "comparison.md" document. This section provides guidance for selecting a library based on factors like library size, unified authentication, and compatibility with various Databricks warehouses and SQL Python APIs. The updates clarify the required dependency size for simple applications and scripts, and offer more detailed information about each library option. We've also added a new subsection titled `Detailed comparison` to provide a more comprehensive overview of each library's features. These changes are intended to help software engineers better understand which library is best suited for their specific needs, particularly for applications that require data transfer of large amounts of data serialized in Apache Arrow format and low result fetching latency, where we recommend using the Databricks SQL Connector for Python for efficient data transfer and low latency.
* Fixed parsing message ([#146](#146)). In this release, the warning message logged during the creation of a dashboard when a ParseError occurs has been updated to provide clearer and more detailed information about the parsing error. The new error message now includes the specific query being parsed and the exact parsing error, enabling developers to quickly identify the cause of parsing issues. This change ensures that engineers can efficiently diagnose and address parsing errors, improving the overall development and debugging experience with a more informative log format: "Parsing {query}: {error}".
* Improve dashboard as code ([#108](#108)). The `Dashboards` class in the 'dashboards.py' file has been updated to improve functionality and usability, with changes such as the addition of a type variable `T` for type checking and more descriptive names for methods.
The `save_to_folder` method now accepts a `Dashboard` object and returns a `Dashboard` object, and a new static method `create_dashboard` has been added. Additionally, two new methods `_with_better_names` and `_replace_names` have been added for improved readability. The `get_dashboard` method now returns a `Dashboard` object instead of a dictionary. The `save_to_folder` method now also formats SQL code before saving it to file. These changes aim to enhance the functionality and readability of the codebase and provide more user-friendly methods for interacting with the `Dashboards` class. In addition to the changes in the `Dashboards` class, there have been updates in the organization of the project structure. The 'queries/counter.sql' file has been moved to 'dashboards/one_counter/counter.sql' in the 'tests/integration' directory. This modification enhances the organization of the project. Furthermore, several tests for the `Dashboards` class have been introduced in the 'databricks.labs.lsql.dashboards' module, demonstrating various functionalities of the class and ensuring that it functions as intended. The tests cover saving SQL and YML files to a specified folder, creating a dataset and a counter widget for each query, deploying dashboards with a given display name or dashboard ID, and testing the behavior of the `save_to_folder` and `deploy_dashboard` methods. Lastly, the commit removes the `test_load_dashboard` function and updates the `test_dashboard_creates_one_dataset_per_query` and `test_dashboard_creates_one_counter_widget_per_query` functions to use the updated `Dashboard` class. A new `replace_recursively` function is introduced to replace specific fields in a dataclass recursively. A new test function `test_dashboards_deploys_exported_dashboard_definition` has been added, which reads a dashboard definition from a JSON file, deploys it, and checks if it's successfully deployed using the `Dashboards` class. 
A new test function `test_dashboard_deploys_dashboard_the_same_as_created_dashboard` has also been added, which compares the original and deployed dashboards to ensure they are identical. Overall, these changes aim to improve the functionality and readability of the codebase and provide more user-friendly methods for interacting with the `Dashboards` class, as well as enhance the organization of the project structure and add new tests for the `Dashboards` class to ensure it functions as intended.
* Infer fields from a query ([#111](#111)). The `Dashboards` class in the `dashboards.py` file has been updated with the addition of a new method, `_get_fields`, which accepts a SQL query as input and returns a list of `Field` objects using the `sqlglot` library to parse the query and extract the necessary information. The `create_dashboard` method has been modified to call this new function when creating `Query` objects for each dataset. If a `ParseError` occurs, a warning is logged and iteration continues. This allows for the automatic population of fields when creating a new dashboard, eliminating the need for manual specification. Additionally, new tests have been added for invalid queries and for checking if the fields in a query have the expected names. These tests include `test_dashboards_skips_invalid_query` and `test_dashboards_gets_fields_with_expected_names`, which utilize the caplog fixture and create temporary query files to verify functionality. Existing functionality related to creating dashboards remains unchanged.
* Make constant all caps ([#140](#140)). In this release, the project's 'dashboards.py' file has been updated to improve code readability and maintainability. A constant variable `_maximum_dashboard_width` has been changed to all caps, becoming '_MAXIMUM_DASHBOARD_WIDTH'. This modification affects the `Dashboards` class and its methods, particularly `_get_fields` and '_get_position'.
The `_get_position` method has been revised to use the new all caps constant variable. This change ensures better visibility of constants within the code, addressing issue [#140](#140). It's important to note that this modification only impacts the 'dashboards.py' file and does not affect any other functionalities.
* Read display name from `dashboard.yml` ([#144](#144)). In this release, we have introduced a new `DashboardMetadata` dataclass that reads the display name of a dashboard from a `dashboard.yml` file located in the dashboard's directory. If the `dashboard.yml` file is absent, the folder name will be used as the display name. This change improves the readability and maintainability of the dashboard configuration by explicitly defining the display name and reducing the need to specify widget information in multiple places. We have also added a new fixture called `make_dashboard` for creating and cleaning up lakeview dashboards in the test suite. The fixture handles creation and deletion of the dashboard and provides an option to set a custom display name. Additionally, we have added and modified several unit tests to ensure the proper handling of the `DashboardMetadata` class and the dashboard creation process, including tests for missing, present, or incorrect `display_name` keys in the YAML file. The `dashboards.deploy_dashboard()` function has been updated to handle cases where only `dashboard_id` is provided.
* Set widget id in query header ([#154](#154)). In this release, we've made significant improvements to widget metadata handling in our open-source library. We've introduced a new `WidgetMetadata` class that replaces the previous `WidgetMetadata` dataclass, now featuring a `path` attribute, `spec_type` property, and optional parameters for `order`, `width`, `height`, and `_id`.
The `_get_widgets` method has been updated to accept an Iterable of `WidgetMetadata` objects, and both `_get_layouts` and `_get_widgets` methods now sort widgets using the order field. A new class method, `WidgetMetadata.from_path`, handles parsing widget metadata from a file path, replacing the removed `_get_width_and_height` method. Additionally, the `WidgetMetadata` class is now used in the `deploy_dashboard` method, and the test suite for the `dashboards` module has been enhanced with updated `test_widget_metadata_replaces_width_and_height` and `test_widget_metadata_replaces_attribute` functions, as well as new tests for specific scenarios. Issue [#154](#154) has been addressed by setting the widget id in the query header, and the aforementioned changes improve flexibility and ease of use for dashboard development.
* Use order key in query header if defined ([#149](#149)). In this release, we've introduced a new feature to use an order key in the query header if defined, enhancing the flexibility and control over the dashboard creation process. The `WidgetMetadata` dataclass now includes an optional `order` parameter of type `int`, and the `_get_arguments_parser()` method accepts the `--order` flag with type `int`. The `replace_from_arguments()` method has been updated to support the new `order` parameter, with a default value of `self.order`. The `create_dashboard()` method now implements a new `_get_datasets()` method to retrieve datasets from the dashboard folder and introduces a `_get_widgets()` method, which accepts a list of files, iterates over them, and yields tuples containing widgets and their corresponding metadata, including the order. These improvements enable the use of an order key in query headers, ensuring the correct order of widgets in the dashboard creation process. Additionally, a new test case has been added to verify the correct behavior of the dashboard deployment with a specified order key in the query header.
This feature resolves issue [#148](#148).
* Use widget width and height defined in query header ([#147](#147)). In this release, the handling of metadata in SQL files has been updated to utilize the header of the file, instead of the first line, for improved readability and flexibility. This change includes a new WidgetMetadata class for defining the width and height of a widget in a dashboard, as well as new methods for parsing the widget metadata from a provided path. The release also includes updates to the documentation to cover the supported widget arguments `-w or --width` and '-h or --height', and resolves issue [#114](#114) by adding a test for deploying a dashboard with a big widget using a new function `test_dashboard_deploys_dashboard_with_big_widget`. Additionally, new test cases have been added for creating dashboards with custom-sized widgets based on query header width and height values, improving functionality and error handling.

Dependency updates:

* Bump actions/checkout from 4.1.3 to 4.1.6 ([#102](#102)).
* Bump actions/checkout from 4.1.6 to 4.1.7 ([#151](#151)).
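The automatic tiling described in the notes above (rows at most 6 units wide, each widget placed at the next available position) can be illustrated with a simple left-to-right packing routine. This is a sketch under assumptions: the `tile_positions` function and its wrapping rule are illustrative and are not the project's `_get_position` implementation.

```python
from dataclasses import dataclass

MAXIMUM_DASHBOARD_WIDTH = 6  # value quoted in the notes above


@dataclass(frozen=True)
class Position:
    x: int
    y: int
    width: int
    height: int


def tile_positions(sizes: list[tuple[int, int]]) -> list[Position]:
    """Place (width, height) tiles left to right, wrapping when a row is full."""
    positions: list[Position] = []
    x = y = row_height = 0
    for width, height in sizes:
        if width > MAXIMUM_DASHBOARD_WIDTH:
            raise ValueError(f"tile wider than dashboard: {width}")
        if x + width > MAXIMUM_DASHBOARD_WIDTH:
            # Row is full: start a new row below the tallest tile of this one.
            x, y, row_height = 0, y + row_height, 0
        positions.append(Position(x, y, width, height))
        x += width
        row_height = max(row_height, height)
    return positions


layout = tile_positions([(3, 2), (3, 4), (2, 1), (6, 6)])
```

The first two tiles share a row, the third wraps to `y = 4` (below the taller of the two), and the full-width tile takes a row of its own.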
* Added Command Execution backend which uses Command Execution API on a cluster ([#95](#95)). In this release, the databricks labs lSQL library has been updated with a new Command Execution backend that utilizes the Command Execution API. A new `CommandExecutionBackend` class has been implemented, which initializes a `CommandExecutor` instance taking a cluster ID, workspace client, and language as parameters. The `execute` method runs SQL commands on the specified cluster, and the `fetch` method returns the query result as an iterator of Row objects. The existing `StatementExecutionBackend` class has been updated to inherit from a new abstract base class called `ExecutionBackend`, which includes a `save_table` method for saving data to tables and is meant to be a common base class for both Statement and Command Execution backends. The `StatementExecutionBackend` class has also been updated to use the new `ExecutionBackend` abstract class and its constructor now accepts a `max_records_per_batch` parameter. The `execute` and `fetch` methods have been updated to use the new `_only_n_bytes` method for logging truncated SQL statements. Additionally, the `CommandExecutionBackend` class has several methods, `execute`, `fetch`, and `save_table` to execute commands on a cluster and save the results to tables in the databricks workspace. This new backend is intended to be used for executing commands on a cluster and saving the results in a databricks workspace. * Added basic integration with Lakeview Dashboards ([#66](#66)). In this release, we've added basic integration with Lakeview Dashboards to the project, enhancing its capabilities. This includes updating the `databricks-labs-blueprint` dependency to version 0.4.2 with the `[yaml]` extra, allowing for additional functionality related to handling YAML files. 
A new file, `dashboards.py`, has been introduced, providing a class for interacting with Databricks dashboards, along with methods for retrieving and saving dashboard configurations. Additionally, a new `__init__.py` file under the `src/databricks/labs/lsql/lakeview` directory imports all classes and functions from the `model.py` module, providing a foundation for further development and customization. The release also introduces a new file, `model.py`, containing code generated from OpenAPI specs by the Databricks SDK Generator, and a template file, `model.py.tmpl`, used for handling JSON data during integration with Lakeview Dashboards. A new file, `polymorphism.py`, provides utilities for checking if a value can be assigned to a specific type, supporting correct data typing and formatting with Lakeview Dashboards. Furthermore, a `.gitignore` file has been added to the `tests/integration` directory as part of the initial steps in adding integration testing to ensure compatibility with the Lakeview Dashboards platform. Lastly, the `test_dashboards.py` file in the `tests/integration` directory contains a function, `test_load_dashboard(ws)`, which uses the `Dashboards` class to save a dashboard from a source to a destination path, facilitating testing during the integration process. * Added dashboard-as-code functionality ([#201](#201)). This commit introduces dashboard-as-code functionality for the UCX project, enabling the creation and management of dashboards using code. The feature resolves multiple issues and includes a new `create-dashboard` command for creating unpublished dashboards. The functionality is available in the `lsql` lab and allows for specifying the order and width of widgets, overriding default widget identifiers, and supporting various SQL and markdown header arguments. The `dashboard.yml` file is used to define top-level metadata for the dashboard. 
This commit also includes extensive documentation and examples for using the dashboard as a library and configuring different options.
* Automate opening integration test dashboard in debug mode ([#167](#167)). A new feature has been added to automatically open the integration test dashboard in debug mode, making it easier for software engineers to debug and troubleshoot. This has been achieved by importing the `webbrowser` and `is_in_debug` modules from "databricks.labs.blueprint.entrypoint", and adding a check in the `create` function to determine if the code is running in debug mode. If it is, a dashboard URL is constructed from the workspace configuration and dashboard ID, and then opened in a web browser using "webbrowser.open". This allows for a more streamlined debugging process for the integration test dashboard. No other parts of the code have been affected by this change.
* Automatically tile widgets ([#109](#109)). In this release, we've introduced an automatic widget tiling feature for the dashboard creation process in our open-source library. The `Dashboards` class now includes a new class variable, `_maximum_dashboard_width`, set to 6, representing the maximum width allowed for each row of widgets in the dashboard. The `create_dashboard` method has been updated to accept a new `self` parameter, turning it into an instance method. A new `_get_position` method has been introduced to calculate and return the next available position for placing a widget, and a `_get_width_and_height` method has been added to return the width and height for a widget specification, initially handling `CounterSpec` instances. Additionally, we've added new unit tests to improve testing coverage, ensuring that widgets are created, positioned, and sized correctly. These tests also cover the correct positioning of widgets based on their order and available space, as well as the expected width and height for each widget.
* Bump actions/checkout from 4.1.3 to 4.1.6 ([#102](#102)).
In the latest release, the 'actions/checkout' GitHub Action has been updated from version 4.1.3 to 4.1.6, which includes checking the platform to set the archive extension appropriately. This release also bumps the version of github/codeql-action from 2 to 3, actions/setup-node from 1 to 4, and actions/upload-artifact from 2 to 4. Additionally, the minor-actions-dependencies group was updated with two new versions. Disabling extensions.worktreeConfig when disabling sparse-checkout was introduced in version 4.1.4. The release notes and changelog for this update can be found in the provided link. This commit was made by dependabot[bot] with contributions from cory-miller and jww3.
* Bump actions/checkout from 4.1.6 to 4.1.7 ([#151](#151)). In the latest release, the 'actions/checkout' GitHub action has been updated from version 4.1.6 to 4.1.7 in the project's push workflow, which checks out the repository at the start of the workflow. This change brings potential bug fixes, performance improvements, or new features compared to the previous version. The update only affects the version number in the YAML configuration for the 'actions/checkout' step in the release.yml file, with no new methods or alterations to existing functionality. This update aims to ensure a smooth and enhanced user experience for those utilizing the project's push workflows by taking advantage of the possible improvements or bug fixes in the new version of 'actions/checkout'.
* Create a dashboard with a counter from a single query ([#107](#107)). In this release, we have introduced several enhancements to our dashboard-as-code approach, including the creation of a `Dashboards` class that provides methods for getting, saving, and deploying dashboards. A new method, `create_dashboard`, has been added to create a dashboard with a single page containing a counter widget. The counter widget is associated with a query that counts the number of rows in a specified dataset.
The `deploy_dashboard` method has also been added to deploy the dashboard to the workspace. Additionally, we have implemented a new feature for creating dashboards with a counter from a single query, including modifications to the `test_dashboards.py` file and the addition of four new tests. These changes improve the robustness of the dashboard creation process and provide a more automated way to view important metrics.
* Create text widget from markdown file ([#142](#142)). A new feature has been implemented in the library that allows for the creation of a text widget from a markdown file, enhancing customization and readability for users. This development resolves issue [#1](#1).
* Design document for dashboards-as-code ([#105](#105)). The latest release introduces 'Dashboards as Code,' a method for defining and managing dashboards through configuration files, enabling version control and controlled changes. The building blocks include `.sql`, `.md`, and `dashboard.yml` files, with `.sql` defining queries and determining tile order, and `dashboard.yml` specifying top-level metadata and tile overrides. Metadata can be inferred or explicitly defined in the query or files. The tile order can be determined by SQL file order, `tiles` order in `dashboard.yml`, or SQL file metadata. This project can also be used as a library for embedding dashboard generation in your code. Configuration precedence follows command-line flags, SQL file headers, `dashboard.yml`, and SQL query content. The command-line interface is utilized for dashboard generation from configuration files.
* Ensure propagation of `lsql` version into `User-Agent` header when it is used as library ([#206](#206)). In this release, the `pyproject.toml` file has been updated to ensure that the correct version of the `lsql` library is propagated into the `User-Agent` header when used as a library, improving attribution.
The `databricks-sdk` version has been updated from `0.22.0` to `0.29.0`, and the `__init__.py` file of the `lsql` library has been modified to add the `with_user_agent_extra` function from the `databricks.sdk.core` package for correct attribution. The `backends.py` file has also been updated with improved type handling in the `_row_to_sql` and `save_table` functions for accurate SQL insertion and handling of user-defined classes. Additionally, a test has been added to ensure that the `lsql` version is correctly propagated in the `User-Agent` header when used as a library. These changes offer improved functionality and accurate type handling, making it easier for developers to identify the library version when used in other projects.
* Fixed counter encodings ([#143](#143)). In this release, we have improved the encoding of counters in the lsql dashboard by modifying the `create_dashboard` function in the `dashboards.py` file. Previously, the counter field encoding was hardcoded as "count," but has been changed to dynamically determine the first field name of the given fields, ensuring that counters are expected to have only one field. Additionally, a new integration test has been added to the `tests/integration/test_dashboards.py` file to ensure that the dashboard deployment functionality correctly handles SQL queries that do not perform a count. A new test for the `Dashboards` class has also been added to check that counter field encoding names are created as expected. The `WorkspaceClient` is mocked and not called in this test. These changes enhance the accuracy of counter encoding and improve the overall functionality and reliability of the lsql dashboard.
* Fixed non-existing reference and typo in the documentation ([#104](#104)). In this release, we've made improvements to the documentation of our open-source library, specifically addressing issue [#104](#104).
The changes include fixing a non-existent reference and a typo in the `Library size comparison` section of the "comparison.md" document. This section provides guidance for selecting a library based on factors like library size, unified authentication, and compatibility with various Databricks warehouses and SQL Python APIs. The updates clarify the required dependency size for simple applications and scripts, and offer more detailed information about each library option. We've also added a new subsection titled `Detailed comparison` to provide a more comprehensive overview of each library's features. These changes are intended to help software engineers better understand which library is best suited for their specific needs, particularly for applications that require data transfer of large amounts of data serialized in Apache Arrow format and low result fetching latency, where we recommend using the Databricks SQL Connector for Python for efficient data transfer and low latency.
* Fixed parsing message ([#146](#146)). In this release, the warning message logged during the creation of a dashboard when a ParseError occurs has been updated to provide clearer and more detailed information about the parsing error. The new error message now includes the specific query being parsed and the exact parsing error, enabling developers to quickly identify the cause of parsing issues. This change ensures that engineers can efficiently diagnose and address parsing errors, improving the overall development and debugging experience with a more informative log format: "Parsing {query}: {error}".
* Improve dashboard as code ([#108](#108)). The `Dashboards` class in the 'dashboards.py' file has been updated to improve functionality and usability, with changes such as the addition of a type variable `T` for type checking and more descriptive names for methods.
The `save_to_folder` method now accepts a `Dashboard` object and returns a `Dashboard` object, and a new static method `create_dashboard` has been added. Additionally, two new methods `_with_better_names` and `_replace_names` have been added for improved readability. The `get_dashboard` method now returns a `Dashboard` object instead of a dictionary. The `save_to_folder` method now also formats SQL code before saving it to file. These changes aim to enhance the functionality and readability of the codebase and provide more user-friendly methods for interacting with the `Dashboards` class. In addition to the changes in the `Dashboards` class, there have been updates in the organization of the project structure. The 'queries/counter.sql' file has been moved to 'dashboards/one_counter/counter.sql' in the 'tests/integration' directory. This modification enhances the organization of the project. Furthermore, several tests for the `Dashboards` class have been introduced in the 'databricks.labs.lsql.dashboards' module, demonstrating various functionalities of the class and ensuring that it functions as intended. The tests cover saving SQL and YML files to a specified folder, creating a dataset and a counter widget for each query, deploying dashboards with a given display name or dashboard ID, and testing the behavior of the `save_to_folder` and `deploy_dashboard` methods. Lastly, the commit removes the `test_load_dashboard` function and updates the `test_dashboard_creates_one_dataset_per_query` and `test_dashboard_creates_one_counter_widget_per_query` functions to use the updated `Dashboard` class. A new `replace_recursively` function is introduced to replace specific fields in a dataclass recursively. A new test function `test_dashboards_deploys_exported_dashboard_definition` has been added, which reads a dashboard definition from a JSON file, deploys it, and checks if it's successfully deployed using the `Dashboards` class. 
A new test function `test_dashboard_deploys_dashboard_the_same_as_created_dashboard` has also been added, which compares the original and deployed dashboards to ensure they are identical. Overall, these changes aim to improve the functionality and readability of the codebase and provide more user-friendly methods for interacting with the `Dashboards` class, as well as enhance the organization of the project structure and add new tests for the `Dashboards` class to ensure it functions as intended.
* Infer fields from a query ([#111](#111)). The `Dashboards` class in the `dashboards.py` file has been updated with the addition of a new method, `_get_fields`, which accepts a SQL query as input and returns a list of `Field` objects using the `sqlglot` library to parse the query and extract the necessary information. The `create_dashboard` method has been modified to call this new function when creating `Query` objects for each dataset. If a `ParseError` occurs, a warning is logged and iteration continues. This allows for the automatic population of fields when creating a new dashboard, eliminating the need for manual specification. Additionally, new tests have been added for invalid queries and for checking if the fields in a query have the expected names. These tests include `test_dashboards_skips_invalid_query` and `test_dashboards_gets_fields_with_expected_names`, which utilize the caplog fixture and create temporary query files to verify functionality. Existing functionality related to creating dashboards remains unchanged.
* Make constant all caps ([#140](#140)). In this release, the project's 'dashboards.py' file has been updated to improve code readability and maintainability. A constant variable `_maximum_dashboard_width` has been changed to all caps, becoming '_MAXIMUM_DASHBOARD_WIDTH'. This modification affects the `Dashboards` class and its methods, particularly `_get_fields` and '_get_position'.
The `_get_position` method has been revised to use the new all caps constant variable. This change ensures better visibility of constants within the code, addressing issue [#140](#140). It's important to note that this modification only impacts the 'dashboards.py' file and does not affect any other functionalities.
* Read display name from `dashboard.yml` ([#144](#144)). In this release, we have introduced a new `DashboardMetadata` dataclass that reads the display name of a dashboard from a `dashboard.yml` file located in the dashboard's directory. If the `dashboard.yml` file is absent, the folder name will be used as the display name. This change improves the readability and maintainability of the dashboard configuration by explicitly defining the display name and reducing the need to specify widget information in multiple places. We have also added a new fixture called `make_dashboard` for creating and cleaning up lakeview dashboards in the test suite. The fixture handles creation and deletion of the dashboard and provides an option to set a custom display name. Additionally, we have added and modified several unit tests to ensure the proper handling of the `DashboardMetadata` class and the dashboard creation process, including tests for missing, present, or incorrect `display_name` keys in the YAML file. The `dashboards.deploy_dashboard()` function has been updated to handle cases where only `dashboard_id` is provided.
* Set widget id in query header ([#154](#154)). In this release, we've made significant improvements to widget metadata handling in our open-source library. We've introduced a new `WidgetMetadata` class that replaces the previous `WidgetMetadata` dataclass, now featuring a `path` attribute, `spec_type` property, and optional parameters for `order`, `width`, `height`, and `_id`.
The `_get_widgets` method has been updated to accept an Iterable of `WidgetMetadata` objects, and both `_get_layouts` and `_get_widgets` methods now sort widgets using the order field. A new class method, `WidgetMetadata.from_path`, handles parsing widget metadata from a file path, replacing the removed `_get_width_and_height` method. Additionally, the `WidgetMetadata` class is now used in the `deploy_dashboard` method, and the test suite for the `dashboards` module has been enhanced with updated `test_widget_metadata_replaces_width_and_height` and `test_widget_metadata_replaces_attribute` functions, as well as new tests for specific scenarios. Issue [#154](#154) has been addressed by setting the widget id in the query header, and the aforementioned changes improve flexibility and ease of use for dashboard development.
* Use order key in query header if defined ([#149](#149)). In this release, we've introduced a new feature to use an order key in the query header if defined, enhancing the flexibility and control over the dashboard creation process. The `WidgetMetadata` dataclass now includes an optional `order` parameter of type `int`, and the `_get_arguments_parser()` method accepts the `--order` flag with type `int`. The `replace_from_arguments()` method has been updated to support the new `order` parameter, with a default value of `self.order`. The `create_dashboard()` method now implements a new `_get_datasets()` method to retrieve datasets from the dashboard folder and introduces a `_get_widgets()` method, which accepts a list of files, iterates over them, and yields tuples containing widgets and their corresponding metadata, including the order. These improvements enable the use of an order key in query headers, ensuring the correct order of widgets in the dashboard creation process. Additionally, a new test case has been added to verify the correct behavior of the dashboard deployment with a specified order key in the query header.
This feature resolves issue [#148](#148).
* Use widget width and height defined in query header ([#147](#147)). In this release, the handling of metadata in SQL files has been updated to utilize the header of the file, instead of the first line, for improved readability and flexibility. This change includes a new WidgetMetadata class for defining the width and height of a widget in a dashboard, as well as new methods for parsing the widget metadata from a provided path. The release also includes updates to the documentation to cover the supported widget arguments `-w or --width` and '-h or --height', and resolves issue [#114](#114) by adding a test for deploying a dashboard with a big widget using a new function `test_dashboard_deploys_dashboard_with_big_widget`. Additionally, new test cases have been added for creating dashboards with custom-sized widgets based on query header width and height values, improving functionality and error handling.

Dependency updates:

* Bump actions/checkout from 4.1.3 to 4.1.6 ([#102](#102)).
* Bump actions/checkout from 4.1.6 to 4.1.7 ([#151](#151)).
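The automatic tiling described in the notes above (a maximum row width of 6, with each widget placed at the next available position) can be illustrated with a minimal sketch. This is not the code from `dashboards.py`: the `Position` dataclass and the `tile` function are hypothetical names introduced here, and the row-wrapping rule is an assumption about how "next available position" might be computed.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# The notes above state a maximum dashboard width of 6.
MAXIMUM_DASHBOARD_WIDTH = 6


@dataclass(frozen=True)
class Position:
    """Hypothetical layout record: top-left corner plus size."""
    x: int
    y: int
    width: int
    height: int


def tile(sizes: Iterable[Tuple[int, int]]) -> List[Position]:
    """Place (width, height) pairs left to right, wrapping to a new row when full."""
    positions: List[Position] = []
    x = y = row_height = 0
    for width, height in sizes:
        if width > MAXIMUM_DASHBOARD_WIDTH:
            raise ValueError(f"widget width {width} exceeds {MAXIMUM_DASHBOARD_WIDTH}")
        if x + width > MAXIMUM_DASHBOARD_WIDTH:
            # The widget does not fit in the current row: start a new row
            # below the tallest widget of the row just finished.
            x, y, row_height = 0, y + row_height, 0
        positions.append(Position(x, y, width, height))
        x += width
        row_height = max(row_height, height)
    return positions
```

For example, `tile([(1, 3), (1, 3), (4, 3), (2, 4)])` fills the first row exactly and puts the fourth widget at the start of the second row.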
Covers the outstanding comments in #201
* Added method to dashboards to get dashboard url ([#211](#211)). In this release, we have added a new method `get_url` to the `lakeview_dashboards` object in the `laksedashboard` library. This method utilizes the Databricks SDK to retrieve the dashboard URL, simplifying the code and making it more maintainable. Previously, the dashboard URL was constructed by concatenating the host and dashboard ID, but this new method ensures that the URL is obtained correctly, even if the format changes in the future. Additionally, a new unit test has been added for a method that gets the dashboard URL using the workspace client. This new functionality allows users to easily retrieve the URL for a dashboard using its ID and the workspace client.
* Extend replace database in query ([#210](#210)). This commit extends the database replacement functionality in the `DashboardMetadata` class, allowing users to specify which database and catalog to replace. The enhancement includes support for catalog replacement and a new `replace_database` method in the `DashboardMetadata` class, which replaces the catalog and/or database in the query based on provided parameters. These changes enhance the flexibility and customization of the database replacement feature in queries, making it easier for users to control how their data is displayed in the dashboard. The `create_dashboard` function has also been updated to use the new method for replacing the database and catalog. Additionally, the `TileMetadata` update method has been replaced with a new merge method, and the `QueryTile` and `Tile` classes have new properties and methods for handling content, width, height, and position. The commit also includes several unit tests to ensure the new functionality works as expected.
* Improve object oriented dashboard-as-code implementation ([#208](#208)).
In this release, the object-oriented implementation of the dashboard-as-code feature has been significantly improved, addressing previous pull request comments ([#201](#201)). The `TileMetadata` dataclass now includes methods for updating and comparing tile metadata, and the `DashboardMetadata` class has been removed and its functionality incorporated into the `Dashboards` class. The `Dashboards` class now generates tiles, datasets, and layouts for dashboards using the provided `query_transformer`. The code's readability and maintainability have been further enhanced by replacing the use of the `copy` module with `dataclasses.replace` for creating object copies. Additionally, updates have been made to the unit tests for dashboard functionality in the project, with new methods and attributes added to check for valid dashboard metadata and handle duplicate query or widget IDs, as well as to specify the order in which tiles and widgets should be displayed in the dashboard.
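The notes above mention that a `TileMetadata` merge method replaced the earlier update method, and that `dataclasses.replace` replaced the `copy` module for creating object copies. A minimal sketch of how those two ideas might combine is shown below. The field names and the rule that non-default values on the other object win are assumptions for illustration, not the actual `lsql` implementation.

```python
import dataclasses
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TileMetadata:
    """Hypothetical subset of tile metadata; the real class has more fields."""
    order: Optional[int] = None
    width: int = 0
    height: int = 0
    title: str = ""

    def merge(self, other: "TileMetadata") -> "TileMetadata":
        """Return a new object in which every non-default field of `other` wins."""
        overrides = {
            field.name: getattr(other, field.name)
            for field in dataclasses.fields(other)
            if getattr(other, field.name) != field.default
        }
        # dataclasses.replace builds a fresh instance, so `self` is never mutated.
        return dataclasses.replace(self, **overrides)
```

Because `dataclasses.replace` returns a new instance, the original metadata stays unchanged, which is what makes a frozen dataclass workable here.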
Initial version that covers functionality for ucx.
This is a big PR staged to resolve #138. It was broken up into multiple smaller PRs merged into this feature branch (see the links in the commits). The feature branch allows the dashboard-as-code functionality to be reviewed.
- `dashboard.yml` #130
- `order` key in `dashboard.yml` #131
- `order` key in query header #148
- `id` field #133
- `TableV1Spec` from query #202
- `*` #110
- `--order` to `0` #158
- `--title` and `--description` argument for widget #165
- `--type` flag #187
- `overrides` to overwrite the lower-level Databricks Lakeview entities #191
- `from_dict` to `WidgetMetadata` #157
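Several items in the list above concern flags set in a query header (`--order`, `--title`, `--description`, `--type`, and the `--width`/`--height` arguments from the release notes). The sketch below shows one way such a header could be parsed with `argparse`. It assumes the header is a leading `/* ... */` comment in the SQL file; that syntax, the `parse_header` name, and every default except `--order` defaulting to `0` are assumptions, not the `lsql` implementation.

```python
import argparse
import re
import shlex

# Assumption: the header is a block comment at the very top of the SQL file.
_HEADER = re.compile(r"^\s*/\*(.*?)\*/", re.DOTALL)


def parse_header(sql: str) -> argparse.Namespace:
    """Parse widget flags from a leading block comment, if one is present."""
    # add_help=False frees up -h, which the notes use as the short form of --height.
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--order", type=int, default=0)
    parser.add_argument("-w", "--width", type=int, default=0)
    parser.add_argument("-h", "--height", type=int, default=0)
    parser.add_argument("--title", default="")
    parser.add_argument("--description", default="")
    parser.add_argument("--type", default=None)
    match = _HEADER.match(sql)
    # shlex.split keeps quoted values such as 'Row count' together as one argument.
    arguments = shlex.split(match.group(1)) if match else []
    return parser.parse_args(arguments)
```

A file starting with `/* --title 'Row count' --width 2 -h 3 */` would then yield a title of `Row count`, a width of 2, a height of 3, and the default order of 0.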