Release v0.36.0 #2635
* Added `upload` and `download` cli commands to `upload` and `download` a file to/from a collection of workspaces ([#2508](#2508)). The `databricks labs ucx` CLI has new `upload` and `download` commands. The `upload` command uploads a file to a single workspace or a collection of workspaces, while the `download` command downloads a CSV file from a single workspace or a collection of workspaces. This makes it more efficient to move the same file to or from multiple workspaces. Both commands display a warning or information message upon completion and check that the file schema is correct before uploading CSV files. The change includes new methods for uploading and downloading files for multiple workspaces, as well as new unit and integration tests. Users can refer to the contributing instructions to help improve the project.
* Added ability to run `create-table-mapping` command as collection ([#2602](#2602)). This PR makes it possible to run the `create-table-mapping` command as a collection in the `databricks labs ucx` CLI. A new optional boolean flag, `run-as-collection`, has been added to the command; it defaults to False. The updated `create_table_mapping` function accepts additional arguments so that table mappings can be created for multiple workspaces. Users are encouraged to test this feature in various scenarios and provide feedback.
* Added comment on the source tables to capture that they have been deprecated ([#2548](#2548)).
  A new method, `_sql_add_migrated_comment(self, table: Table, target_table_key: str)`, has been added to `table_migrate.py` to mark a deprecated source table with a comment that states its deprecated status and directs users to the new table. Three existing methods in the same file now call it as part of the migration process. In the tests, a new SQL query sets a comment on the source table `hive_metastore.db1_src.managed_dbfs`, indicating that it is deprecated and pointing to the new table `ucx_default.db1_dst.managed_dbfs`, and a unit test has been updated to check that the migration adds the deprecation comment to the source table. This change is part of a larger effort to deprecate old tables and guide users to the migrated ones.
* Added documentation for PrincipalACl migration and delete-missing-principal cmd ([#2552](#2552)). UCX adds a new command, `delete-missing-principals`, applicable only for AWS, to delete IAM roles created by UCX. The command lists all IAM roles generated by the `principal-prefix-access` command and allows multiple roles to be selected for deletion. It checks whether the selected roles are mapped to any storage credentials and asks for confirmation before deleting a role and its associated inline policy. The `create-uber-principal` and `migrate-locations` commands have been updated to apply location ACLs from existing clusters and grant the necessary permissions to users. The `create-catalogs-schemas` command now applies catalog and schema ACLs from existing clusters for both Azure and AWS, and the `migrate-tables` command now applies table and view ACLs from existing clusters for both Azure and AWS.
  The documentation of commands that require admin privileges in the UCX project has also been updated.
* Added linting for `spark.sql(...)` calls ([#2558](#2558)). This commit introduces linting for `spark.sql(...)` calls. The previous SparkSqlPyLinter linter only checked for table migration and did not apply other SQL linters such as the DirectFsAccess linters. The additional SQL linters are now applied to `spark.sql(...)` calls as well. The commit also introduces an abstract base class called Fixer, which requires every derived class to provide a `name` property, and makes minor improvements elsewhere in the codebase. The commit resolves issue [#2551](#2551) and updates `test_functional.py` to test `spark-sql-directfs.py`, so that the linting of `spark.sql(...)` calls is covered.
* Document: clarify that the `assessment` job is not intended to be re-run ([#2560](#2560)). The documentation of the `assessment` job for Databricks Labs Unity Catalog (UCX) has been updated to address confusion around re-running it. The `assessment` job should be executed only once, during the initial setup of UCX, and should not be re-run to refresh the inventory or findings. If a re-assessment is necessary, UCX needs to be reinstalled first. This brings the documentation in line with the actual functionality of the `assessment` job and does not affect the daily job that updates parts of the inventory. The `assessment` workflow is designed to detect incompatible entities and provide information for the migration process. It can be executed in parallel or sequentially, and its output is stored in Delta tables for further analysis and decision-making through the assessment report.
* Enabled `migrate-credentials` command to run as collection ([#2532](#2532)).
  In this pull request, the `migrate-credentials` command in the UCX project's CLI has been updated with a new optional flag, `run_as_collection`, which allows the command to operate on multiple workspaces as a collection. This change introduces the `get_contexts` function and modifies the `delete_missing_principals` function to support the new functionality. The tests for the Azure and AWS variants of `migrate-credentials` now pass an additional `acc_client` argument. Tests and documentation have been added for the new functionality. It is recommended to review the attached testing evidence and confirm that the new functionality works as intended without unintended side effects.
* Escape column names in target tables of the table migration ([#2563](#2563)). The `escape_sql_identifier` function in `utils.py` has a new `maxsplit` parameter that controls the maximum number of splits performed on the input string. This addresses issue [#2544](#2544) and is part of the existing workflow "-migration-ones". The `tables.py` file in the `databricks/labs/ucx/hive_metastore` directory has been updated to escape column names in target tables, preventing SQL injection attacks. A new `ColumnInfo` class and several utility functions have been added to `fixtures.py` in the `databricks.labs.ucx` project for generating SQL schemas and column casting. The integration tests for migrating Hive Metastore tables have new cases for column names that require escaping. Lastly, `test_manager.py` in the `tests/unit/workspace_access` directory has been refactored by removing the `mock_backend` fixture and adding the `test_inventory_permission_manager_init` method to test the initialization of the `PermissionManager` class.
  These changes improve security, functionality, and test coverage.
* Explain why metastore is checked to exists in group migration workflow in docstring ([#2614](#2614)). In `workflows.py`, the docstring for the `verify_metastore_attached` method has been revised to explain why the workflow checks that a metastore is attached to the workspace: account level groups are only available when a metastore is attached, and the group migration workflow depends on them. The method itself is unchanged. It verifies that a metastore is attached to the workspace and causes the workflow to fail if none is found.
* Fixed infinite recursion when visiting a dependency graph ([#2562](#2562)). This change addresses infinite recursion that can occur when visiting a dependency graph, particularly when many files in a package import the package itself. The `visit` method has been modified to visit each parent/child pair only once, which prevents the recursion in such cases. The `dependencies` property has been added to the DependencyGraph class, and the `DependencyGraphVisitor` class has been introduced to handle visiting nodes and tracking visited pairs. The change includes added unit tests and addresses a blocker for a previous pull request. Apart from the recursion fix, the functionality of the code remains unchanged.
* Fixed migrate acls CLI command ([#2617](#2617)). In this release, the `migrate acls` command in the ucx project's CLI has been updated to address issue [#2617](#2617).
  The changes remove the ACL type parameters from the `migrate acls` command, so users no longer need to specify the type explicitly. The `legacy_table_acl` and `principal` parameters have been removed from the `migrate_acls` function, while the `hms_fed` parameter is unchanged and keeps its default value if not explicitly provided.
* Fixes pip install statement in debug notebook ([#2545](#2545)). The pip install statement for the wheel in the debug notebook was incorrectly surrounded by square brackets, causing the notebook run to fail. The superfluous square brackets have been removed, and the `remote_wheels` list is now joined into a string before being passed to the DEBUG_NOTEBOOK format. This change affects only the debug notebook and does not alter user documentation, CLI commands, workflows, or tables. No new methods have been added, and existing functionality remains unchanged. The change has been manually tested, but it does not include unit tests, integration tests, or staging environment verification.
* More escaping of SQL identifiers ([#2530](#2530)). This commit updates SQL identifier escaping. It covers a SQL statement that was missed in one of the crawlers and adds support for a less-known Spark/Databricks corner case: backticks inside an identifier name need to be doubled when the name is quoted. The `escape_sql_identifier` function has been modified to handle this case, and the changes affect the existing `migrate-data-reconciliation` workflow. The `TableIdentifier` class has also been updated to escape identifiers properly, including the backticks-in-names scenario.
  Unit tests have been updated to reflect these changes.
* Retry deploy workflow on `InternalError` ([#2525](#2525)). In `workflows.py`, the `_deploy_workflow` function now uses the `@retried` decorator to retry on `InternalError` exceptions raised during workflow creation. This makes workflow deployment more resilient to transient internal errors and addresses issue [#2522](#2522). The retry is configured with a timeout of 2 minutes, so a persistent failure is still reported instead of being retried indefinitely.
* Updated databricks-labs-lsql requirement from <0.10,>=0.5 to >=0.5,<0.11 ([#2580](#2580)). The upper bound of the databricks-labs-lsql requirement has been raised from below 0.10 to below 0.11; the lower bound stays at 0.5. Previously, the package version was constrained to be greater than or equal to 0.5 and less than 0.10. This update allows the 0.10.x releases of the package to be installed. For details of the upstream changes, refer to the changelog and release notes provided in the commit message.
* Updated sqlglot requirement from <25.20,>=25.5.0 to >=25.5.0,<25.21 ([#2549](#2549)). This pull request updates the sqlglot requirement in pyproject.toml from >=25.5.0,<25.20 to >=25.5.0,<25.21. This allows the 25.20.x releases of sqlglot to be installed while keeping the version below 25.21. The update was made in response to a pull request from Dependabot, which identified a new version of sqlglot.
  The PR includes details of the sqlglot changelog and commits, but reviewers can focus on the specific change made to this project. sqlglot is a SQL parser and transpiler that this project uses as a dependency. The newer version may include bug fixes, new features, or performance improvements.

Dependency updates:

* Updated sqlglot requirement from <25.20,>=25.5.0 to >=25.5.0,<25.21 ([#2549](#2549)).
* Updated databricks-labs-lsql requirement from <0.10,>=0.5 to >=0.5,<0.11 ([#2580](#2580)).
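
The backtick-doubling rule described under [#2530](#2530) can be illustrated with a minimal sketch. The helper names `quote_identifier` and `quote_qualified` are hypothetical and are not UCX's `escape_sql_identifier`; the sketch only shows the quoting rule itself, assuming each part of a qualified name is quoted separately.

```python
def quote_identifier(name: str) -> str:
    """Quote one identifier part, doubling any backticks embedded in it.

    Hypothetical helper for illustration; not the UCX implementation.
    """
    return "`" + name.replace("`", "``") + "`"


def quote_qualified(*parts: str) -> str:
    """Quote each part of a qualified name (catalog, schema, table) separately."""
    return ".".join(quote_identifier(part) for part in parts)


if __name__ == "__main__":
    # A column name that itself contains a backtick.
    print(quote_identifier("my`col"))
    # A three-level table name.
    print(quote_qualified("hive_metastore", "db1_src", "managed_dbfs"))
```

Quoting every part separately keeps a dot inside a name from being read as a separator, which is one reason a single string-level replace is not enough.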
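
The fix described under [#2562](#2562), visiting each parent/child pair only once, can be sketched as follows. This is an illustration under stated assumptions, not the `DependencyGraphVisitor` from the PR: the graph is a plain dictionary of module names, and `visit` and `on_edge` are hypothetical names.

```python
from collections.abc import Callable
from typing import Optional

# Hypothetical graph shape: each node maps to the nodes it imports.
Graph = dict[str, list[str]]


def visit(
    graph: Graph,
    node: str,
    on_edge: Callable[[str, str], None],
    visited: Optional[set[tuple[str, str]]] = None,
) -> None:
    """Walk the graph depth-first, handling each (parent, child) pair once."""
    if visited is None:
        visited = set()
    for child in graph.get(node, []):
        pair = (node, child)
        if pair in visited:
            # Already handled this edge; skipping it breaks import cycles.
            continue
        visited.add(pair)
        on_edge(node, child)
        visit(graph, child, on_edge, visited)


if __name__ == "__main__":
    # Files in a package that import the package itself form cycles.
    graph = {"pkg": ["pkg.a", "pkg.b"], "pkg.a": ["pkg"], "pkg.b": ["pkg"]}
    edges: list[tuple[str, str]] = []
    visit(graph, "pkg", lambda parent, child: edges.append((parent, child)))
    print(edges)
```

Tracking pairs rather than nodes means a node reached through different parents is still reported for each parent, while any given edge is followed at most once, so the walk terminates on cyclic graphs.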
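
The retry-with-timeout behaviour described under [#2525](#2525) can be sketched with the standard library alone. This is a stand-in, not the `@retried` decorator used in the PR: `retry_on`, its `delay` parameter, and the local `InternalError` class are assumptions made for illustration.

```python
import functools
import time
from datetime import timedelta


class InternalError(Exception):
    """Local stand-in for the error raised on a transient internal failure."""


def retry_on(exc_type: type, timeout: timedelta, delay: float = 0.01):
    """Retry the wrapped call on exc_type until it succeeds or timeout elapses.

    Hypothetical decorator for illustration only.
    """

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            deadline = time.monotonic() + timeout.total_seconds()
            while True:
                try:
                    return func(*args, **kwargs)
                except exc_type:
                    if time.monotonic() >= deadline:
                        # Persistent failure: give up and surface the error.
                        raise
                    time.sleep(delay)

        return wrapper

    return decorator


if __name__ == "__main__":
    attempts = {"count": 0}

    @retry_on(InternalError, timeout=timedelta(minutes=2))
    def deploy_workflow() -> str:
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise InternalError("transient failure")
        return "deployed"

    print(deploy_workflow(), "after", attempts["count"], "attempts")
```

Only the named exception type is retried, so other errors propagate immediately, and the deadline bounds how long a persistent failure can delay the caller.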