
Refactor JobsCrawler to add include_job_ids #3658

Open · wants to merge 4 commits into base: main

Conversation

mohanab-db (Contributor)

Changes

Check for include_job_ids in assess_jobs task

Linked issues

#3656

Resolves #..

Functionality

  • added relevant user documentation
  • added new CLI command
  • modified existing command: databricks labs ucx ...
  • added a new workflow
  • modified existing workflow: ...
  • added a new table
  • modified existing table: ...

Tests

  • manually tested
  • added unit tests
  • added integration tests
  • verified on staging environment (screenshot attached)

@mohanab-db mohanab-db requested a review from a team as a code owner February 6, 2025 18:54
@pritishpai (Contributor)

@pritishpai pritishpai changed the title Check include_job_ids in assess_jobs Refactor JobsCrawler to add include_job_ids Feb 6, 2025

github-actions bot commented Feb 6, 2025

✅ 29/29 passed, 5 skipped, 38m22s total

Running from acceptance #8227

@pritishpai pritishpai self-assigned this Feb 6, 2025
@JCZuurmond (Member) left a comment


@mohanab-db : Thank you for reporting this issue and resolving it!

I have added some minor comments. A more fundamental implementation question: what is the bottleneck when crawling all the jobs?

Is the bottleneck:

  • Listing all the jobs
  • Listing all clusters
  • Combining jobs with clusters
  • Assessing the jobs (actually we only assess the clusters related to the jobs)
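One way to answer the bottleneck question above is to time each phase of the crawl separately. A minimal sketch, assuming a Databricks `WorkspaceClient` named `ws` (the `profile_crawl` helper and its phase labels are illustrative, not part of ucx):

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label: str):
    # Crude phase timer: prints how long the wrapped block took.
    start = time.perf_counter()
    yield
    print(f"{label}: {time.perf_counter() - start:.2f}s")


def profile_crawl(ws):
    # Hypothetical usage; ws.jobs.list and ws.clusters.list mirror the
    # Databricks SDK, but this helper is a sketch, not ucx code.
    with timed("list jobs"):
        jobs = list(ws.jobs.list(expand_tasks=True))
    with timed("list clusters"):
        clusters = list(ws.clusters.list())
    with timed("combine jobs with clusters"):
        clusters_by_id = {c.cluster_id: c for c in clusters}
    return jobs, clusters_by_id
```

Comparing the three timings would show whether listing or combining dominates, and therefore whether `include_job_ids` filtering should happen before or after the cluster lookup.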

@@ -94,9 +94,10 @@ def _job_clusters(job: BaseJob) -> Iterable[tuple[BaseJob, ClusterSpec]]:


 class JobsCrawler(CrawlerBase[JobInfo], JobsMixin, CheckClusterMixin):
-    def __init__(self, ws: WorkspaceClient, sql_backend: SqlBackend, schema):
+    def __init__(self, ws: WorkspaceClient, sql_backend: SqlBackend, schema, include_job_ids: list[int] | None = None):
Suggested change
-    def __init__(self, ws: WorkspaceClient, sql_backend: SqlBackend, schema, include_job_ids: list[int] | None = None):
+    def __init__(self, ws: WorkspaceClient, sql_backend: SqlBackend, schema, *, include_job_ids: list[int] | None = None):

tests/integration/assessment/test_jobs.py Outdated Show resolved Hide resolved
@pritishpai pritishpai deployed to account-admin February 7, 2025 14:50 — with GitHub Actions Active
3 participants