Add dashboard for tracking migration progress #3016

Merged 119 commits on Oct 29, 2024
61582df
Add integration test folder for queries
JCZuurmond Oct 18, 2024
b25a67d
Add dummy query for testing migration progress dashboard
JCZuurmond Oct 18, 2024
c3ade18
Add integration test for migration progress dashboard
JCZuurmond Oct 18, 2024
c559d9c
Remove kw_only from historical
JCZuurmond Oct 18, 2024
568dccb
Add counter for percentage successfully migrated resources
JCZuurmond Oct 18, 2024
40e3cca
Remove dummy query
JCZuurmond Oct 18, 2024
d8f64ff
Add tables to populated schema
JCZuurmond Oct 18, 2024
6722b47
Move dashboard metadata to fixture
JCZuurmond Oct 18, 2024
897c61d
Test the percentage migration readiness query
JCZuurmond Oct 18, 2024
8bb8d5a
Add historical records mock data
JCZuurmond Oct 18, 2024
f0e4771
Add TODO
JCZuurmond Oct 18, 2024
72a8b72
Add query for migration status by owner
JCZuurmond Oct 18, 2024
0735784
Add failures to migration status
JCZuurmond Oct 18, 2024
8dcc80a
Fix query
JCZuurmond Oct 18, 2024
6e1bf3f
Move query testing to parametrize
JCZuurmond Oct 18, 2024
a025860
Test migration status by owner
JCZuurmond Oct 18, 2024
5c9915c
Update the migration status by owner query
JCZuurmond Oct 18, 2024
89f8ad7
Add query for bargraph
JCZuurmond Oct 18, 2024
4a766a0
Add overrides to visualize bar graph
JCZuurmond Oct 18, 2024
9bbd22c
Extend test data for more interesting dashboard
JCZuurmond Oct 18, 2024
1b94caf
Fix typo
JCZuurmond Oct 18, 2024
418a823
Add counter showing table migration readiness
JCZuurmond Oct 18, 2024
94d3b0e
Add migration readiness paragraph
JCZuurmond Oct 18, 2024
0e2ceba
Add general migration progress readme
JCZuurmond Oct 18, 2024
1a33df5
Add migration status section
JCZuurmond Oct 18, 2024
985970d
Publish dashboard
JCZuurmond Oct 18, 2024
eca1590
Add owner filter
JCZuurmond Oct 18, 2024
79a5b5b
Format
JCZuurmond Oct 18, 2024
ae1c0e0
Filter readiness for not migration status
JCZuurmond Oct 18, 2024
9c88b87
Mock UDFs
JCZuurmond Oct 21, 2024
69f4793
Use multiworkspace in migration progress dashboard queries
JCZuurmond Oct 21, 2024
c98b112
Add query showing UDF migration readiness
JCZuurmond Oct 21, 2024
8ccc593
Fix parsing failures for udfs fixture
JCZuurmond Oct 21, 2024
ef4bdf9
Remove redundant descriptions from migration readiness counters
JCZuurmond Oct 21, 2024
6990612
Add grants mock
JCZuurmond Oct 21, 2024
9a723dc
Increase height of dashboard header
JCZuurmond Oct 21, 2024
ada1cb4
Add grant migration readiness query
JCZuurmond Oct 21, 2024
a09c102
Add jobs fixture
JCZuurmond Oct 21, 2024
240cf20
Add widget for job migration percentage
JCZuurmond Oct 21, 2024
951e5cd
Try override percentage
JCZuurmond Oct 21, 2024
40269ca
Remove percentage override
JCZuurmond Oct 21, 2024
4e1f17f
Add clusters fixture
JCZuurmond Oct 21, 2024
caafc38
Add cluster migration readiness widget
JCZuurmond Oct 21, 2024
45cfe39
Add missing percentage symbols
JCZuurmond Oct 21, 2024
5afedd0
Shorten comment
JCZuurmond Oct 21, 2024
629b2a8
Add docstring to schema populated
JCZuurmond Oct 21, 2024
8867e1b
Add fixture for pipelines
JCZuurmond Oct 21, 2024
42ea7a2
Add counter for pipelines
JCZuurmond Oct 21, 2024
c334763
Add fixture for policies
JCZuurmond Oct 21, 2024
fbf99dc
Add policy migration counter
JCZuurmond Oct 21, 2024
f38ab38
Update overall readiness size
JCZuurmond Oct 21, 2024
b08ce31
Reorder counters
JCZuurmond Oct 21, 2024
e3d58e4
Update text
JCZuurmond Oct 21, 2024
9487930
Fix widget rename in test
JCZuurmond Oct 21, 2024
2b87616
Update migration readiness text
JCZuurmond Oct 21, 2024
a89fbd3
Update migration status text
JCZuurmond Oct 21, 2024
72b5733
Format
JCZuurmond Oct 21, 2024
28fe5f0
Move historical objects to fixture
JCZuurmond Oct 21, 2024
e5ff729
Rename variables
JCZuurmond Oct 21, 2024
1b1c757
Avoid code duplication
JCZuurmond Oct 21, 2024
55d21ba
Replace the UCX catalog
JCZuurmond Oct 22, 2024
b4ee5ce
Add UCX catalog to progress queries
JCZuurmond Oct 22, 2024
8184ab6
Move migration progress dashboard into subfolder `main`
JCZuurmond Oct 22, 2024
f60b2ec
Add note about creating UCX catalog
JCZuurmond Oct 22, 2024
dc1ef84
Filter owner without failures
JCZuurmond Oct 22, 2024
bc5e5b3
Format
JCZuurmond Oct 22, 2024
0cc7ff0
Make object type filter inclusive instead of exclusive
JCZuurmond Oct 23, 2024
bd4fb00
Fix expected query outcomes
JCZuurmond Oct 23, 2024
7175590
Add workflow run fixtures
JCZuurmond Oct 23, 2024
ea4afdb
Add view to get the latest historical records
JCZuurmond Oct 23, 2024
6b4cdc6
Deploy latest_historical_per_workspace view in progress tracking inst…
JCZuurmond Oct 23, 2024
4d94ed1
Run installation progress installer before deploying dashboard
JCZuurmond Oct 23, 2024
dfba275
Fix column reference
JCZuurmond Oct 23, 2024
41b135b
Use latest historical per workspace in queries
JCZuurmond Oct 23, 2024
c0de729
Add type hint
JCZuurmond Oct 23, 2024
acead38
Make historical key word only
JCZuurmond Oct 23, 2024
c074587
Add counters for migrated and pending migration data objects
JCZuurmond Oct 23, 2024
7fa35be
Remove redundant colon
JCZuurmond Oct 23, 2024
2686605
Fix references to queries
JCZuurmond Oct 23, 2024
d58c9b0
Make columns in view explicit
JCZuurmond Oct 23, 2024
ad372f6
Round percentages
JCZuurmond Oct 23, 2024
e3de5d2
Format save table
JCZuurmond Oct 23, 2024
ef346ac
Rewrite for loop
JCZuurmond Oct 23, 2024
43fcee2
Ceil instead of round
JCZuurmond Oct 23, 2024
039cfbf
Try encoder
JCZuurmond Oct 24, 2024
f00d112
Add failures to migration status
JCZuurmond Oct 24, 2024
cde0199
Use try divide
JCZuurmond Oct 24, 2024
e0f0503
Remove failures from table migration status
JCZuurmond Oct 28, 2024
593f189
Remove table migration progress fixture
JCZuurmond Oct 28, 2024
9826bb3
Write out historical
JCZuurmond Oct 28, 2024
3a13bac
Remove failures from fixtures
JCZuurmond Oct 28, 2024
5948119
Update catalog populated fixture
JCZuurmond Oct 28, 2024
596565b
Fix cached job run id
JCZuurmond Oct 28, 2024
5c78f8a
Fix query filters
JCZuurmond Oct 28, 2024
7159028
Format
JCZuurmond Oct 28, 2024
c69525b
Set mocked tables ownership through grants
JCZuurmond Oct 28, 2024
52f0829
Add comment
JCZuurmond Oct 28, 2024
c039cdc
Remove redundant fixture
JCZuurmond Oct 28, 2024
6b8b53b
Add comment about skipping test in debug
JCZuurmond Oct 28, 2024
9c4d429
Rename readiness to progress
JCZuurmond Oct 28, 2024
e8f524a
Fix query filter
JCZuurmond Oct 28, 2024
db0fb11
Rename test queries
JCZuurmond Oct 28, 2024
9c577c0
Fix failures
JCZuurmond Oct 28, 2024
e242a47
Add distinct failures per object type
JCZuurmond Oct 28, 2024
6cf0046
Use fractions to calculate percentages in tests
JCZuurmond Oct 28, 2024
4925da1
Fix typo
JCZuurmond Oct 28, 2024
d617f56
Use migration status to log tables migrated
JCZuurmond Oct 28, 2024
698fdcc
Fix collision table name
JCZuurmond Oct 28, 2024
87108a6
Add type hints
JCZuurmond Oct 28, 2024
68c52e4
Disable too many arguments
JCZuurmond Oct 28, 2024
d8fb518
Capitalize SIZE
JCZuurmond Oct 28, 2024
0c7b385
Capitalize PARTITION BY
JCZuurmond Oct 28, 2024
e0e7132
Rename latest_historical_per_workspace to objects_snapshot
JCZuurmond Oct 28, 2024
bdad1c7
Fix filter in query
JCZuurmond Oct 28, 2024
74e036b
Invert fractions
JCZuurmond Oct 28, 2024
bc3e90d
Fix failure
JCZuurmond Oct 28, 2024
ffca9d2
Choose supported grant
JCZuurmond Oct 28, 2024
a4546e5
Add missing DENY grant
JCZuurmond Oct 28, 2024
2b50f9b
Cast percentage as double for testing
JCZuurmond Oct 28, 2024
13 changes: 10 additions & 3 deletions src/databricks/labs/ucx/install.py
@@ -634,9 +634,16 @@ def _handle_existing_dashboard(self, dashboard_id: str, display_name: str, paren
     def _create_dashboard(self, folder: Path, *, parent_path: str) -> None:
         """Create a lakeview dashboard from the SQL queries in the folder"""
         logger.info(f"Creating dashboard in {folder}...")
-        metadata = DashboardMetadata.from_path(folder).replace_database(
-            database=f"hive_metastore.{self._config.inventory_database}",
-            database_to_replace="inventory",
+        metadata = (
+            DashboardMetadata.from_path(folder)
+            .replace_database(  # Assessment and migration dashboards
+                database=f"hive_metastore.{self._config.inventory_database}",
+                database_to_replace="inventory",
+            )
+            .replace_database(  # Migration progress dashboard
+                catalog=self._config.ucx_catalog,
+                catalog_to_replace="ucx_catalog",
+            )
         )
         metadata.display_name = f"{self._name('UCX ')} {folder.parent.stem.title()} ({folder.stem.title()})"
         reference = f"{folder.parent.stem}_{folder.stem}".lower()
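The two chained `replace_database` calls in this hunk target different placeholders: `inventory` in the assessment and migration dashboard queries, and `ucx_catalog` in the migration progress queries. The following is a minimal sketch of that idea in plain Python, assuming simple prefix substitution on the query text. The helper `replace_placeholders` and the example names `ucx` and `my_catalog` are hypothetical, and this is not how `DashboardMetadata` is implemented.

```python
import re


def replace_placeholders(query: str, *, inventory_database: str, ucx_catalog: str) -> str:
    """Hypothetical helper: swap placeholder schema and catalog names for real ones."""
    # `inventory.<table>` becomes `hive_metastore.<inventory_database>.<table>`
    query = re.sub(r"\binventory\.", f"hive_metastore.{inventory_database}.", query)
    # `ucx_catalog.<schema>.<table>` becomes `<ucx_catalog>.<schema>.<table>`
    query = re.sub(r"\bucx_catalog\.", f"{ucx_catalog}.", query)
    return query


# Example queries; the table name `tables` is only illustrative.
assessment_query = "SELECT * FROM inventory.tables"
progress_query = "SELECT * FROM ucx_catalog.multiworkspace.objects_snapshot"
```

Applying both substitutions to every query is harmless in this sketch because each query only contains one of the two placeholders.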
6 changes: 4 additions & 2 deletions src/databricks/labs/ucx/progress/install.py
@@ -47,13 +47,15 @@ class ProgressTrackingInstallation:
     _SCHEMA = "multiworkspace"

     def __init__(self, sql_backend: SqlBackend, ucx_catalog: str) -> None:
-        # `mod` is a required parameter, though, it's not used in this context without views.
-        self._schema_deployer = SchemaDeployer(sql_backend, self._SCHEMA, mod=None, catalog=ucx_catalog)
+        from databricks.labs import ucx  # pylint: disable=import-outside-toplevel
+
+        self._schema_deployer = SchemaDeployer(sql_backend, self._SCHEMA, mod=ucx, catalog=ucx_catalog)

     def run(self) -> None:
         self._schema_deployer.deploy_schema()
         self._schema_deployer.deploy_table("workflow_runs", WorkflowRun)
         self._schema_deployer.deploy_table("historical", Historical)
+        self._schema_deployer.deploy_view("objects_snapshot", "queries/views/objects_snapshot.sql")
         logger.info("Installation completed successfully!")
@@ -0,0 +1,16 @@
---
height: 4
---

# Migration Progress

> If widgets show `Unable to render visualization.`, verify that
> the [UCX catalog exists](https://github.com/databrickslabs/ucx?tab=readme-ov-file#create-ucx-catalog-command).

This dashboard displays the migration progress, visualizing data from the `migration-progress-experimental`
workflow. This workflow is designed to run regularly, either daily or weekly, to provide an up-to-date overview of the
migration progress.

In addition to showing the current migration progress, the dashboard helps with planning and dividing tasks. For
instance, you can choose to migrate one workspace or schema at a time. When you assign a migration owner to each
workspace and/or schema, the dashboard shows how the resources assigned to that owner are progressing.
@@ -0,0 +1,5 @@
/* --title 'Overall progress (%)' --width 2 */
SELECT
ROUND(100 * TRY_DIVIDE(COUNT_IF(SIZE(failures) = 0), COUNT(*)), 2) AS percentage
FROM ucx_catalog.multiworkspace.objects_snapshot
WHERE object_type IN ('ClusterInfo', 'Grant', 'JobInfo', 'PipelineInfo', 'PolicyInfo', 'Table', 'Udf')
@@ -0,0 +1,5 @@
/* --title 'UDF migration progress (%)' */
SELECT
ROUND(100 * TRY_DIVIDE(COUNT_IF(SIZE(failures) = 0), COUNT(*)), 2) AS percentage
FROM ucx_catalog.multiworkspace.objects_snapshot
WHERE object_type = "Udf"
@@ -0,0 +1,5 @@
/* --title 'Grant migration progress (%)' */
SELECT
ROUND(100 * TRY_DIVIDE(COUNT_IF(SIZE(failures) = 0), COUNT(*)), 2) AS percentage
FROM ucx_catalog.multiworkspace.objects_snapshot
WHERE object_type = "Grant"
@@ -0,0 +1,5 @@
/* --title 'Job migration progress (%)' */
SELECT
ROUND(100 * TRY_DIVIDE(COUNT_IF(SIZE(failures) = 0), COUNT(*)), 2) AS percentage
FROM ucx_catalog.multiworkspace.objects_snapshot
WHERE object_type = "JobInfo"
@@ -0,0 +1,5 @@
/* --title 'Cluster migration progress (%)' */
SELECT
ROUND(100 * TRY_DIVIDE(COUNT_IF(SIZE(failures) = 0), COUNT(*)), 2) AS percentage
FROM ucx_catalog.multiworkspace.objects_snapshot
WHERE object_type = "ClusterInfo"
@@ -0,0 +1,5 @@
/* --title 'Table migration progress (%)' --width 2 */
SELECT
ROUND(100 * TRY_DIVIDE(COUNT_IF(SIZE(failures) = 0), COUNT(*)), 2) AS percentage
FROM ucx_catalog.multiworkspace.objects_snapshot
WHERE object_type = "Table"
@@ -0,0 +1,5 @@
/* --title 'Pipeline migration progress (%)' */
SELECT
ROUND(100 * TRY_DIVIDE(COUNT_IF(SIZE(failures) = 0), COUNT(*)), 2) AS percentage
FROM ucx_catalog.multiworkspace.objects_snapshot
WHERE object_type = "PipelineInfo"
@@ -0,0 +1,5 @@
/* --title 'Policy migration progress (%)' */
SELECT
ROUND(100 * TRY_DIVIDE(COUNT_IF(SIZE(failures) = 0), COUNT(*)), 2) AS percentage
FROM ucx_catalog.multiworkspace.objects_snapshot
WHERE object_type = "PolicyInfo"
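The progress counters above all use the same expression: the share of objects whose `failures` array is empty, as a percentage rounded to two decimals. A minimal Python sketch of that arithmetic follows, assuming each object is represented only by its list of failures. The function name `progress_percentage` is hypothetical, and `None` stands in for the SQL `NULL` that `TRY_DIVIDE` returns when the divisor is zero.

```python
def progress_percentage(failures_per_object):
    """Percentage of objects without failures, rounded to two decimals.

    Mirrors ROUND(100 * TRY_DIVIDE(COUNT_IF(SIZE(failures) = 0), COUNT(*)), 2).
    """
    total = len(failures_per_object)
    if total == 0:
        # TRY_DIVIDE returns NULL instead of raising on division by zero.
        return None
    migrated = sum(1 for failures in failures_per_object if len(failures) == 0)
    return round(100 * migrated / total, 2)


# Two of three objects have no failures.
example = [[], ["Pending migration"], []]
```

Python's `round` and SQL's `ROUND` can disagree on exact ties, so this sketch illustrates the formula rather than reproducing the query result bit for bit.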
@@ -0,0 +1,14 @@
/* --title 'Distinct failures per object type' --width 6 */
WITH failures AS (
SELECT object_type, explode(failures) AS failure
FROM ucx_catalog.multiworkspace.objects_snapshot
WHERE object_type IN ('ClusterInfo', 'Grant', 'JobInfo', 'PipelineInfo', 'PolicyInfo', 'Table', 'Udf')
)

SELECT
object_type,
COUNT(*) AS count,
failure
FROM failures
GROUP BY object_type, failure
ORDER BY object_type, failure
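The query above first explodes the `failures` array into one row per failure, then counts identical failures per object type. A rough Python equivalent is sketched below, assuming rows are `(object_type, failures)` pairs; the function name `count_failures` is hypothetical.

```python
from collections import Counter

# Object types included by the query's WHERE clause.
OBJECT_TYPES = {"ClusterInfo", "Grant", "JobInfo", "PipelineInfo", "PolicyInfo", "Table", "Udf"}


def count_failures(rows):
    """Return (object_type, count, failure) tuples ordered by object type and failure."""
    counts = Counter(
        (object_type, failure)
        for object_type, failures in rows
        if object_type in OBJECT_TYPES
        # Like explode(), an empty failures list contributes no rows.
        for failure in failures
    )
    return [(object_type, count, failure) for (object_type, failure), count in sorted(counts.items())]


example_rows = [
    ("Table", ["Pending migration"]),
    ("Table", ["Pending migration"]),
    ("Udf", []),
    ("JobInfo", ["failure a", "failure b"]),
]
```

The failure messages `failure a` and `failure b` are placeholders for illustration only.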
@@ -0,0 +1,4 @@
# Tables and Views

This section presents the migration progress of tables and views, detailing which data objects are migrated and which
are pending migration.
@@ -0,0 +1,4 @@
title: Filter for owner(s)
column: owner
type: MULTI_SELECT
width: 6
@@ -0,0 +1,4 @@
/* --title 'Pending migration' --description 'Total number of tables and views' --height 6 */
SELECT COUNT(*) AS count
FROM ucx_catalog.multiworkspace.objects_snapshot
WHERE object_type = 'Table' AND array_contains(failures, 'Pending migration')
@@ -0,0 +1,24 @@
/*
--title 'Pending migration'
--description 'Tables and views per owner'
--width 5
--overrides '{"spec": {
"version": 3,
"widgetType": "bar",
"encodings": {
"x":{"fieldName": "owner", "scale": {"type": "categorical"}, "displayName": "owner"},
"y":{"fieldName": "count", "scale": {"type": "quantitative"}, "displayName": "count"}
}
}}'
*/
WITH owners_with_failures AS (
SELECT owner
FROM ucx_catalog.multiworkspace.objects_snapshot
WHERE object_type = 'Table' AND array_contains(failures, 'Pending migration')
)

SELECT
owner,
COUNT(1) AS count
FROM owners_with_failures
GROUP BY owner
@@ -0,0 +1,4 @@
/* --title 'Migrated' --description 'Total number of tables and views' --height 6 */
SELECT COUNT(*) AS count
FROM ucx_catalog.multiworkspace.objects_snapshot
WHERE object_type = 'Table' AND SIZE(failures) == 0
@@ -0,0 +1,15 @@
/* --title 'Overview' --description 'Tables and views migration' --width 5 */
WITH migration_statuses AS (
SELECT *
FROM ucx_catalog.multiworkspace.objects_snapshot
WHERE object_type = 'Table'
)

SELECT
owner,
DOUBLE(CEIL(100 * COUNT_IF(SIZE(failures) = 0) / SUM(COUNT(*)) OVER (PARTITION BY owner), 2)) AS percentage,
COUNT(*) AS total,
COUNT_IF(SIZE(failures) = 0) AS total_migrated,
COUNT_IF(SIZE(failures) > 0) AS total_not_migrated
FROM migration_statuses
GROUP BY owner
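The overview query above groups tables and views by owner and rounds the migrated percentage up to two decimals with `CEIL(..., 2)`. The sketch below reproduces that calculation in Python, assuming rows are `(owner, failures)` pairs already filtered to `object_type = 'Table'`. The function name `overview_per_owner` and the owner names are hypothetical, and `Fraction` is used here only to keep the value exact before rounding up.

```python
import math
from collections import defaultdict
from fractions import Fraction


def overview_per_owner(rows):
    """Summarize migration per owner from (owner, failures) pairs."""
    totals = defaultdict(int)
    migrated = defaultdict(int)
    for owner, failures in rows:
        totals[owner] += 1
        if len(failures) == 0:
            migrated[owner] += 1

    overview = {}
    for owner, total in totals.items():
        # Exact percentage, then rounded up to two decimals like CEIL(x, 2).
        exact = Fraction(100 * migrated[owner], total)
        overview[owner] = {
            "percentage": math.ceil(exact * 100) / 100,
            "total": total,
            "total_migrated": migrated[owner],
            "total_not_migrated": total - migrated[owner],
        }
    return overview


example_rows = [
    ("alice", []),
    ("alice", ["Pending migration"]),
    ("alice", ["Pending migration"]),
    ("bob", []),
]
```

Rounding up rather than to nearest means an owner with one migrated table out of three shows 33.34 rather than 33.33.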
25 changes: 25 additions & 0 deletions src/databricks/labs/ucx/queries/views/objects_snapshot.sql
@@ -0,0 +1,25 @@
WITH last_workflow_run AS (
SELECT
workspace_id,
MAX(STRUCT(finished_at, workflow_run_attempt, started_at, workflow_run_id)) AS max_struct
FROM $inventory.workflow_runs -- $inventory is a hardcoded name for replacing target schema in a view definition
WHERE workflow_name = 'migration-progress-experimental'
GROUP BY workspace_id
)

SELECT
historical.workspace_id,
historical.job_run_id,
historical.object_type,
historical.object_id,
historical.data,
historical.failures,
historical.owner,
historical.ucx_version
FROM
$inventory.historical AS historical -- $inventory is a hardcoded name for replacing target schema in a view definition
JOIN
last_workflow_run
ON
historical.workspace_id = last_workflow_run.workspace_id
AND historical.job_run_id = last_workflow_run.max_struct.workflow_run_id
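The view above picks, per workspace, the latest `migration-progress-experimental` run by taking the maximum of a struct (compared field by field, starting with `finished_at`), and then keeps only the historical records written by that run. A minimal Python sketch of the same selection is shown below, assuming rows are plain dictionaries with the column names used in the view. The function name `objects_snapshot` and the sample values are hypothetical, and tuples stand in for the SQL struct.

```python
def objects_snapshot(workflow_runs, historical):
    """Keep historical records that belong to the latest run of each workspace."""
    latest = {}
    for run in workflow_runs:
        if run["workflow_name"] != "migration-progress-experimental":
            continue
        # Tuples compare left to right, like the struct in MAX(STRUCT(...)).
        key = (run["finished_at"], run["workflow_run_attempt"], run["started_at"], run["workflow_run_id"])
        workspace_id = run["workspace_id"]
        if workspace_id not in latest or key > latest[workspace_id]:
            latest[workspace_id] = key

    return [
        record
        for record in historical
        if record["workspace_id"] in latest
        and record["job_run_id"] == latest[record["workspace_id"]][3]
    ]


# Sample data with integers standing in for timestamps.
runs = [
    {"workspace_id": 1, "workflow_name": "migration-progress-experimental",
     "finished_at": 10, "workflow_run_attempt": 0, "started_at": 5, "workflow_run_id": 100},
    {"workspace_id": 1, "workflow_name": "migration-progress-experimental",
     "finished_at": 20, "workflow_run_attempt": 0, "started_at": 15, "workflow_run_id": 101},
    {"workspace_id": 1, "workflow_name": "assessment",
     "finished_at": 30, "workflow_run_attempt": 0, "started_at": 25, "workflow_run_id": 102},
]
records = [
    {"workspace_id": 1, "job_run_id": 100, "object_id": "old"},
    {"workspace_id": 1, "job_run_id": 101, "object_id": "new"},
    {"workspace_id": 2, "job_run_id": 101, "object_id": "other workspace"},
]
```

Because the join is an inner join, workspaces without a matching workflow run contribute no rows to the snapshot.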
Empty file.