Release v0.56.0 (#3680) · databrickslabs/ucx@05c2d6a

* Added documentation to use Delta Live Tables migration
([#3587](#3587)). This
documentation update adds a section on migrating Delta Live Tables (DLT)
pipelines to Unity Catalog as part of the migration process. After the
cloned pipeline reaches the `RUNNING` state, the original and cloned
pipelines run independently. The update includes an example of stopping
and renaming an existing HMS DLT pipeline and creating a new cloned
pipeline. It also lists known issues and limitations, such as supported
streaming sources, maintenance pausing, and querying by timestamp. The
documentation introduces the `migrate-dlt-pipelines` command, whose
optional parameters include or exclude specific pipeline IDs. It is
intended for developers and administrators who manage data pipelines
and handle table aliasing issues. User documentation has been added and
the changes have been manually tested.
* Added support for MSSQL and PostgreSQL to HMS Federation
([#3701](#3701)). HMS
Federation now supports external Hive Metastores backed by Microsoft SQL
Server (MSSQL) and PostgreSQL. This update introduces classes for
handling external Hive Metastore instances and their versions, and
refactors a regex pattern to support more JDBC URL formats. A new
`supported_databases_port` class variable maps each supported database
to its default port, so that SQL Server's distinct default port (1433)
is handled. A `supported_hms_versions` class variable lists the
supported Hive Metastore versions. The `_external_hms` method now
extracts HMS version information more accurately, and the
`_split_jdbc_url` method has been refactored for better URL format
compatibility and parameter extraction. `test_federation.py` has new
unit tests for creating external catalogs with MSSQL and PostgreSQL.
* Added the CLI command for migrating DLT pipelines
([#3579](#3579)). A new CLI
command, `migrate-dlt-pipelines`, migrates DLT pipelines from HMS to UC
using the DLT Migration API. The `--include-pipeline-ids` and
`--exclude-pipeline-ids` flags include or exclude specific pipeline IDs,
respectively. The `PipelinesMigrator` class has been updated to accept
and use these parameters. No testing details were provided with this
change. The changes are isolated to `PipelinesMigrator` and related
functionality, with no impact on existing methods.
* Addressed Bug with Dashboard migration
([#3663](#3663)). The `_crawl`
method in `dashboards.py` now excludes SDK dashboards that lack IDs
during dashboard migration, so incomplete dashboards are not processed.
The `_list_dashboards` method also checks for dashboards without IDs
while iterating through `dashboards_iterator`. When it finds one, it
fetches the dashboard details with `_get_dashboard` and adds them to the
`dashboards` list. In the `RedashDashboardCrawler` unit tests in
`assessment/test_dashboards.py`, `get` is now configured as a side
effect on the mocked `WorkspaceClient`'s `dashboards` attribute. This
returns individual dashboard objects by ID, so the crawler can retrieve
and process dashboards from the mock without errors caused by missing
dashboard objects.
* Broaden safe read text caught exception scope
([#3705](#3705)). The
`safe_read_text` function now catches the broader `OSError` and
`UnicodeError` while reading a text file. Previously it caught only
specific exceptions such as `FileNotFoundError`, `UnicodeDecodeError`,
and `PermissionError`. `FileNotFoundError` and `PermissionError` are
subclasses of `OSError`, and `UnicodeDecodeError` is a subclass of
`UnicodeError`, so the previously handled cases are still covered. The
`safe_read_text` helper in the `source_code` module has new unit tests
for edge cases such as a missing file, a path that is a directory, and
an `OSError` raised while reading. The linting code has been updated to
match.
* Case sensitive/insensitive table validation
([#3580](#3580)). Table
metadata comparison can now either respect or ignore the case of column
names during validation. The `get_metadata` method of the
`TableMetadataRetriever` abstract base class takes a new
`column_name_transformer` parameter. This is a callable that transforms
column names before comparison. The `StandardSchemaComparator`
constructor has a new `case_sensitive` parameter that controls whether
column names are compared case-sensitively. A new parametrized test,
`test_schema_comparison_case`, covers this behavior. These changes help
when column names in the source and target tables differ only in case.
* Catch `AttributeError` in `InferredValue._safe_infer_internal`
([#3684](#3684)). The
`_safe_infer_internal` method of the `InferredValue` class now catches
`AttributeError`. This works around an issue reported in the Astroid
repository (`pylint-dev/astroid#2683`) and resolves issue
[#3659](#3659). When the exception occurs, a debug-level message is
logged and the method yields the `Uninferable` sentinel value to signal
that inference failed for the node. This makes the value inference used
by the source code linters more robust.
* Document to run `validate-groups-membership` before groups migration,
not after ([#3631](#3631)).
The group migration documentation now recommends running
`validate-groups-membership` before the groups migration instead of
after it. Running it first confirms that groups have the correct
membership and that the workspace and the account have the same number
of groups and users. This adds a safety check before migrating. The
documentation also states when to run
`remove-workspace-local-backup-groups`, which removes workspace-level
backup groups and their permissions. It should be run only after
confirming that all groups migrated successfully. A misspelling of the
command as `validate-group-membership` has been corrected to
`validate-groups-membership` in a documentation file. This change is
aimed at users migrating their groups to the account level.
* Extend code migration progress documentation
([#3588](#3588)). This
documentation update adds two sections, "Code Migration" and "Final
details", to the migration process documentation. "Code Migration"
walks through migrating code after table migration and data
reconciliation. This includes using the linter to investigate
compatibility issues and the linted workspace resources. The
"[linter advices](/docs/reference/linter_codes)" list the codes and
messages for detected issues and how to resolve them. Code to migrate
can then be prioritized and tracked with the `migration-progress`
dashboard and migrated with the `migrate-` commands. "Final details"
describes the steps after code migration is complete. These include
running the `cluster-remap` command to remap clusters to be Unity
Catalog compatible. This update resolves issue [#2231](#2231). The
updated user documentation also covers linting and migrating local code,
managing dashboard migrations, and syncing workspace information. It
also covers creating and validating table mappings, migrating locations,
and assigning metastores.
* Fixed Skip/Unskip schema functionality
([#3567](#3567)). The
`skip_schema` and `unskip_schema` methods in `mapping.py` now include
the `hive_metastore` catalog prefix when setting or unsetting the
database property that marks a schema as skipped. The
`_get_database_in_scope_task` and `_get_table_in_scope_task` methods now
parse table properties into a dictionary, which makes looking up a
table's skip property straightforward. In `tests/unit/test_cli.py`,
`test_skip_with_schema` now includes the catalog name `hive_metastore`
in the `ALTER SCHEMA` statement. `test_unskip_with_schema` now uses a
`SET DBPROPERTIES` statement that sets `databricks.labs.ucx.skip` to
`false`. The `execute` method in the `sbe` module and the queries in
the `mock_backend` module have been updated to match the new commands.
Together, these changes make skipping and unskipping schemas work as
intended.
* Fixed `Total Tables` widget in assessment to only show table counts
([#3738](#3738)). The
`Total Tables` widget in the assessment dashboard now shows only table
counts (related to [#3252](#3252)). The `00_3_count_total_tables.sql`
query in `src/databricks/labs/ucx/queries/assessment/main/` now has a
`WHERE` clause that filters out views. The change is limited to this SQL
query. It has been manually tested, and the results are documented in
the `Tests` section of the commit message.
* Fixed broken anchor for doc release
([#3720](#3720)). This
release fixes a broken anchor reference for the workflow process in the
Databricks workflows documentation used in the migration process. The
fix has been manually tested. The documentation describes how to view
the status of deployed workflows and rerun failed workflows with the
`workflows` and `repair-run` commands, respectively.
* Fixed broken anchors in documentation
([#3712](#3712)). This
release fixes broken anchors, outdated command names, and syntax in the
UCX process documentation. References to `enable_hms_federation` and
`create_federated_catalog` now use the correct command names,
`enable-hms-federation` and `create-federated-catalog`, with matching
command syntax. These updates have been manually tested. The
documentation also describes `validate-groups-membership`, which can be
run before the group migration workflow for added confidence. It also
describes the `create-account-groups` command for cases where the
UCX-installed workspace has no matching account groups. Section titles
and links have been updated to reflect current functionality.
* Fixed notebook sources with `NotebookLinter.apply`
([#3693](#3693)). A new
`github.py` module in `databricks/labs/ucx/` provides helpers for
working with GitHub issues. It contains an `IssueType` enum, a
`construct_new_issue_url` function, and constants such as `DOCS_URL` for
building URLs to the documentation and the GitHub repository. The
`NotebookLinter` class can now fix notebooks. A new `PythonLinter` class
runs `apply` on an abstract syntax tree (AST). The `Notebook.apply`
method applies changes to notebook sources, and the legacy
`NotebookMigrator` has been removed. The change includes unit and
integration tests and updates the existing
`databricks labs ucx migrate-local-code` command. The error message for
external metastore connectivity issues now links to the UCX installation
instructions in the documentation.
* Fixed the broken documentation links in dashboards
([#3726](#3726)). This
change fixes broken documentation links in several dashboards,
addressing issues [#3725](#3725) and [#3726](#3726). It updates links in
the "Assessment Overview", "Assessment Summary", and "Compute summary"
dashboards, and in the "group migration" and "table upgrade"
documentation. Links to local Markdown files have been replaced with
links to the online documentation. Links now point to the correct
documentation sections in the UCX GitHub repository. The changes have
been manually tested. No unit or integration tests were added, and the
changes were not verified in a staging environment.
* Force `MaybeDependency` to have a `Dependency` OR `list[Problem]`, not
neither nor both
([#3635](#3635)). A
`MaybeDependency` must now hold either a `Dependency` or a
`list[Problem]`, never neither or both. This is needed to handle known
libraries during import registration. The change resolves issue
[#3585](#3585), breaks up issue [#3626](#3626), and progresses issue
[#1527](#1527). It modifies the code linting logic and introduces new
classes such as `KnownLoader`, `KnownDependency`, and `KnownProblem`.
The `_resolve_allow_list` method has been updated to follow the new
rule. Unit tests have been added and modified to cover handling
directories, resolving children in context, and detecting known
problems in imported libraries.
* HMS Federation Documentation
([#3688](#3688)). HMS
Federation lets a Hive Metastore (HMS) be federated to a catalog. This
can serve as a step towards migrating to Unity Catalog, or as a hybrid
setup where data must be accessible through both HMS and UC. It is an
alternative to the table migration process that does not require table
mapping, creating catalogs and schemas, or migrating Hive Metastore data
objects. The `enable-hms-federation` command enables Hive Metastore
federation. The `create-federated-catalog` command creates a UC catalog
that mirrors all schemas and tables in the source Hive Metastore. The
AWS-only `migrate-glue-credentials` command creates a UC service
credential for AWS Glue. These commands are documented in a new HMS
Federation section of the migration process documentation, which is
followed by the data reconciliation step.
* Make `MaybeTree` the main Python AST entrypoint for constructing the
syntax tree
([#3550](#3550)). The main
entry point for constructing the Python syntax tree has changed from
`Tree` to `MaybeTree`. Class methods and static methods that construct a
`MaybeTree` have moved from `Tree` to `MaybeTree`. The class method that
normalizes source code before parsing is now the only entry point. It
has been renamed from `normalized_parse` to `from_source_code`, in line
with the naming commonly used for class methods in UCX. The `walk` and
`first_statement` methods have been removed from `MaybeTree` because
they duplicated `Tree`'s methods. These changes enforce normalization
and improve code consistency. Unit tests have been added, and the
Python linting code has been updated to work with the new `MaybeTree`
class. This change resolves issues [#3457](#3457) and [#3213](#3213).
* Make fixer diagnostic codes unique
([#3582](#3582)). This
commit makes fixer diagnostic codes unique, so that the
`databricks labs ucx migrate-local-code` command applies the right fixer
during code migration. Diagnostic codes for the `table-migrated-to-uc`
issue now depend on the context in which the table is referenced: SQL,
Python, or Python-SQL. Unit and integration tests have been added and
modified. The documentation now lists the new postfixes for the
`table-migrated-to-uc` linter code and their descriptions. This makes
it easier to diagnose and resolve table migration issues.
* Removed the linting false positive for missing table format warning
when using `spark.table`
([#3589](#3589)). The
linter no longer reports a false-positive missing table format warning
for `spark.table`, which resolves issue [#3545](#3545). The linting logic
and unit tests now account for Delta being the default table format
since Databricks Runtime 8.0, which removes unnecessary warnings. The
`test_linting_walker_populates_paths` unit test in `test_jobs.py` now
uses a different file path.
* Removed tree from `PythonSequentialLinter`
([#3535](#3535)). The
`PythonSequentialLinter` no longer manipulates the code tree. That logic
has moved to `NotebookLinter`, which improves the separation of concerns
between the two components. `NotebookLinter` now fails early when
resolving the code used by a notebook. It also attaches the trees of
`%run` notebooks as child trees of the cell that calls them. The code
linting functionality and the `databricks labs ucx lint-local-code`
command have been updated accordingly. These changes resolve
[#3543](#3543), progress [#3514](#3514), and depend on PRs
[#3529](#3529) and [#3550](#3550). They have been manually tested and
include added and modified unit tests. The `Advice` class now has a
type variable `T`, which allows more specific type hints when creating
instances of the class and its subclasses.
* Rename file language helper function
([#3661](#3661)). The
helper function that determines a file's language and checks whether the
linter supports it has been renamed from `file_language` to
`infer_file_language_if_supported`. The new name makes clear that the
function both infers the file language and acts as a filter. It returns
a `Language` if the linter supports the file and `None` otherwise.
Callers such as the `is_a_notebook` function have been updated to use
the new name.
* Scope crawled jobs in `JobsCrawler` with `include_job_ids`
([#3658](#3658)). The
`JobsCrawler` class in `workflow_task.py` has a new optional constructor
parameter, `include_job_ids`. It limits crawling to a given list of job
IDs, which makes crawling faster in large workspaces. The `_assess_jobs`
method skips jobs whose IDs are not in that list. Integration tests
cover the new behavior. This change resolves issue [#3656](#3656), which
requested crawling jobs by a specific list of job IDs.
* Support fixing `LocalFile`s with `FileLinter`
([#3660](#3660)). New
`write_text`, `safe_write_text`, `back_up_path`, and
`revert_back_up_path` methods in `base.py` support fixing files in
`LocalFile` containers. The `LocalFile` class in `files.py` has new
methods and properties, such as `apply`, `migrated_code`,
`back_up_path`, and `back_up_original_and_flush_migrated_code`. These
allow files to be fixed with linters and the changes to be written back
to the container. The `databricks labs ucx migrate-local-code` command
now uses this functionality. Unit and integration tests have been added.
These changes address issue [#3514](#3514). They also handle errors
during file writing and automate fixing code issues in `LocalFile`
objects.
* Updated `migrate-local-code` to use latest linter functionality
([#3700](#3700)). The
`migrate-local-code` command now uses the latest linter functionality.
The `LocalFileMigrator` and `LocalCodeLinter` classes have been merged,
and the interfaces of the `.fix` and `.apply` methods have been aligned.
A new `FixerWalker` handles the dependencies in the dependency graph,
and the existing `databricks labs ucx migrate-local-code` command has
been updated accordingly. Unit and integration tests have been added and
modified. The changes resolve issue [#3514](#3514) and supersede issue
[#3520](#3520). The `lint-local-code` command has a new flag to specify
the path to lint. The `migrate-local-code` command now lints local code
and reports advice on making it compatible with Unity Catalog. It can
also apply fixes to make the code compatible.
* Updated sqlglot requirement from <26.3,>=25.5.0 to >=25.5.0,<26.4
([#3572](#3572)). This
pull request updates the `sqlglot` requirement in `pyproject.toml` from
`>=25.5.0,<26.3` to `>=25.5.0,<26.4`, which allows newer versions of
`sqlglot` to be used. The pull request includes the `sqlglot` changelog
for the versions between 25.5.0 and 26.4. The relevant commits include
bumps of `sqlglotrs` to versions between 0.3.7 and 0.3.14. The pull
request was generated automatically by Dependabot.
* Updated sqlglot requirement from <26.4,>=25.5.0 to >=25.5.0,<26.7
([#3677](#3677)). The
`sqlglot` requirement has been updated from `>=25.5.0,<26.4` to
`>=25.5.0,<26.7`, so UCX can use newer `sqlglot` releases. These
releases include bug fixes and improvements such as avoiding redundant
casts in FROM/TO_UTC_TIMESTAMP and better UUID support. They also
include fixes for dialects such as Redshift, BigQuery, and T-SQL. The
newer versions introduce some breaking changes, which are not expected
to affect UCX.
* Use cached property for table migration index on local checkout
context ([#3711](#3711)). A
new cached property, `_migration_index`, on the `LocalCheckoutContext`
class stores the table migration index for the local checkout context.
This prevents the index from being recrawled repeatedly when it is
empty. The `linter_context_factory` method now uses the
`_migration_index` property, and its `CurrentSessionState` parameter
has been removed. The `local_code_linter` method no longer uses
`linter_context_factory`. It now passes `LocalCodeLinter` a callable (a
lambda) that returns a `LinterContext` built with `_migration_index`.
Together, these changes reduce migration index crawls in the local
checkout context and simplify the code.
* [DOCS] Explain when to run `remove-workspace-local-backup-groups`
workflow ([#3707](#3707)).
The documentation now explains where the
`remove-workspace-local-backup-groups` workflow fits in the group
migration process, which UCX orchestrates with Databricks workflows. The
`workflows` command displays the status of the workflows, and the
`repair-run` command reruns failed workflows. The group migration
workflow should run after a successful assessment workflow. It is
followed by the optional `remove-workspace-local-backup-groups`
workflow, which removes workspace-level backup groups and their
associated permissions that are no longer needed. This final workflow
should be run only after confirming that all groups migrated
successfully.

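The broadened exception handling in `safe_read_text` (#3705) relies on Python's exception hierarchy. The following is a minimal sketch, not the UCX implementation. The signature (a `Path` in, `str | None` out) and the UTF-8 default encoding are assumptions.

```python
from __future__ import annotations

from pathlib import Path


def safe_read_text(path: Path, encoding: str = "utf-8") -> str | None:
    """Read a text file, returning None instead of raising on read/decode errors.

    Sketch only: catching the base classes also covers their subclasses,
    e.g. FileNotFoundError, PermissionError and IsADirectoryError (OSError)
    and UnicodeDecodeError (UnicodeError).
    """
    try:
        return path.read_text(encoding=encoding)
    except (OSError, UnicodeError):
        return None
```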
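The default-port handling described for HMS Federation (#3701) can be illustrated with a simplified JDBC URL parser. This is a sketch under assumptions. `DEFAULT_PORTS`, `JdbcUrl`, and `split_jdbc_url` are hypothetical stand-ins for `supported_databases_port` and `_split_jdbc_url`. Including MySQL is also an assumption. The regex covers only the common `jdbc:<db>://host[:port][/database]` shapes, followed by `?`-query or SQL Server-style `;` properties. The ports are the standard defaults for MySQL (3306), PostgreSQL (5432), and SQL Server (1433).

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Hypothetical mapping of supported databases to their standard default ports.
DEFAULT_PORTS = {"mysql": 3306, "postgresql": 5432, "sqlserver": 1433}

# jdbc:<db>://host[:port][/database][?k=v&...] or jdbc:sqlserver://host[:port];k=v;...
_JDBC_URL = re.compile(
    r"^jdbc:(?P<db>[a-z]+)://(?P<host>[^:/;?]+)(?::(?P<port>\d+))?"
    r"(?:/(?P<database>[^;?]*))?(?P<params>[;?].*)?$"
)


@dataclass
class JdbcUrl:
    db_type: str
    host: str
    port: int
    database: str | None
    params: dict[str, str] = field(default_factory=dict)


def split_jdbc_url(url: str) -> JdbcUrl:
    match = _JDBC_URL.match(url)
    if match is None or match["db"] not in DEFAULT_PORTS:
        raise ValueError(f"Unsupported JDBC URL: {url}")
    db_type = match["db"]
    # Fall back to the database's default port when the URL omits one.
    port = int(match["port"]) if match["port"] else DEFAULT_PORTS[db_type]
    # Query strings use ?/& separators; SQL Server uses ;-separated properties.
    params: dict[str, str] = {}
    for pair in re.split(r"[;?&]", match["params"] or ""):
        key, sep, value = pair.partition("=")
        if sep:
            params[key] = value
    # SQL Server carries the database name as a property, not in the path.
    database = match["database"] or params.get("databaseName")
    return JdbcUrl(db_type, match["host"], port, database, params)
```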
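Both `migrate-dlt-pipelines` (#3579) and `JobsCrawler` (#3658) narrow their work to a chosen set of IDs. The hypothetical `select_ids` helper below sketches that kind of filtering. It keeps the input order and works for string pipeline IDs and integer job IDs alike. Giving exclusion precedence when an ID is both included and excluded is an assumption, not documented UCX behavior.

```python
from __future__ import annotations

from collections.abc import Hashable, Iterable
from typing import TypeVar

IdT = TypeVar("IdT", bound=Hashable)


def select_ids(
    all_ids: Iterable[IdT],
    include_ids: Iterable[IdT] | None = None,
    exclude_ids: Iterable[IdT] | None = None,
) -> list[IdT]:
    """Keep IDs in the include list (if one is given) that are not excluded."""
    include = set(include_ids) if include_ids is not None else None
    exclude = set(exclude_ids) if exclude_ids is not None else set()
    selected: list[IdT] = []
    for item in all_ids:
        if include is not None and item not in include:
            continue  # not explicitly requested
        if item in exclude:
            continue  # exclusion takes precedence (assumption)
        selected.append(item)
    return selected
```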
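The case-sensitivity option from #3580 amounts to normalizing column names before comparing them. In this sketch, the hypothetical `compare_columns` function picks either an identity transformer or `str.lower`. It plays the role that the changelog describes for `column_name_transformer` and `case_sensitive`, but it is not the signature of `StandardSchemaComparator` or `get_metadata`.

```python
from collections.abc import Callable, Iterable


def compare_columns(
    source_columns: Iterable[str],
    target_columns: Iterable[str],
    case_sensitive: bool = True,
) -> tuple[set[str], set[str]]:
    """Return (missing_in_target, unexpected_in_target) after normalizing names."""
    # Identity keeps case; str.lower makes "ID" and "id" compare equal.
    transform: Callable[[str], str] = (lambda name: name) if case_sensitive else str.lower
    source = {transform(name) for name in source_columns}
    target = {transform(name) for name in target_columns}
    return source - target, target - source
```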
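The invariant from #3635 requires that a `MaybeDependency` has a `Dependency` or problems, but not neither nor both. One way to enforce this is at construction time. The dataclasses below are simplified, hypothetical versions for illustration. UCX's real `Dependency` and `Problem` classes carry different fields, and this is not how UCX necessarily implements the check.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Problem:  # simplified, hypothetical fields
    code: str
    message: str


@dataclass(frozen=True)
class Dependency:  # simplified, hypothetical fields
    path: str


@dataclass
class MaybeDependency:
    dependency: Dependency | None
    problems: list[Problem] = field(default_factory=list)

    def __post_init__(self) -> None:
        has_dependency = self.dependency is not None
        has_problems = bool(self.problems)
        # Exactly one of the two must be present (an exclusive or).
        if has_dependency == has_problems:
            raise ValueError("MaybeDependency needs a Dependency or problems, not neither nor both")
```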
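#3711 mentions recrawling when the migration index is empty. One plausible way this happens, assumed here purely for illustration, is a cache guarded by a truthiness check. An empty index is falsy, so it is rebuilt on every access. `functools.cached_property` instead stores the first result on the instance, whether or not it is empty. Both classes are hypothetical stand-ins for `LocalCheckoutContext`, and `crawl` stands in for whatever builds the index.

```python
from collections.abc import Callable
from functools import cached_property


class TruthinessCachedContext:
    """Hypothetical context that caches the index behind a truthiness check."""

    def __init__(self, crawl: Callable[[], dict[str, str]]) -> None:
        self._crawl = crawl
        self._index: dict[str, str] = {}

    @property
    def migration_index(self) -> dict[str, str]:
        if not self._index:  # an empty index is falsy, so it is crawled again
            self._index = self._crawl()
        return self._index


class CachedPropertyContext:
    """Hypothetical context using functools.cached_property."""

    def __init__(self, crawl: Callable[[], dict[str, str]]) -> None:
        self._crawl = crawl

    @cached_property
    def migration_index(self) -> dict[str, str]:
        # Computed on first access and stored on the instance, even if empty.
        return self._crawl()
```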
Dependency updates:

* Updated sqlglot requirement from <26.3,>=25.5.0 to >=25.5.0,<26.4
([#3572](#3572)).
* Updated sqlglot requirement from <26.4,>=25.5.0 to >=25.5.0,<26.7
([#3677](#3677)).
gueniai authored Feb 25, 2025
1 parent 245f6ee commit 05c2d6a