
Refactor JobsCrawler to add include_job_ids #3658

Open · wants to merge 4 commits into base: main

Conversation

mohanab-db (Contributor)

Changes

Check for include_job_ids in assess_jobs task

Linked issues

#3656

Resolves #..

Functionality

  • added relevant user documentation
  • added new CLI command
  • modified existing command: databricks labs ucx ...
  • added a new workflow
  • modified existing workflow: ...
  • added a new table
  • modified existing table: ...

Tests

  • manually tested
  • added unit tests
  • added integration tests
  • verified on staging environment (screenshot attached)

@mohanab-db mohanab-db requested a review from a team as a code owner February 6, 2025 18:54
@pritishpai (Contributor)

@pritishpai pritishpai changed the title Check include_job_ids in assess_jobs Refactor JobsCrawler to add include_job_ids Feb 6, 2025

github-actions bot commented Feb 6, 2025

✅ 29/29 passed, 5 skipped, 38m22s total

Running from acceptance #8227

@pritishpai pritishpai self-assigned this Feb 6, 2025
@JCZuurmond (Member) left a comment


@mohanab-db : Thank you for reporting this issue and resolving it!

I have added some minor comments. A more fundamental implementation question: what is the bottleneck when crawling all the jobs?

Is the bottleneck:

  • Listing all the jobs
  • Listing all clusters
  • Combining jobs with clusters
  • Assessing the jobs (actually we only assess the clusters related to the jobs)
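One way to answer the bottleneck question above is to time each phase of the crawl separately. A minimal sketch, assuming a Databricks `WorkspaceClient` named `ws` (the `profile_crawl` helper and its phase labels are illustrative, not part of ucx):

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label: str):
    # Crude phase timer: prints how long the wrapped block took.
    start = time.perf_counter()
    yield
    print(f"{label}: {time.perf_counter() - start:.2f}s")


def profile_crawl(ws):
    # Hypothetical usage; ws.jobs.list and ws.clusters.list mirror the
    # Databricks SDK, but this helper is a sketch, not ucx code.
    with timed("list jobs"):
        jobs = list(ws.jobs.list(expand_tasks=True))
    with timed("list clusters"):
        clusters = list(ws.clusters.list())
    with timed("combine jobs with clusters"):
        clusters_by_id = {c.cluster_id: c for c in clusters}
    return jobs, clusters_by_id
```

Comparing the three timings would show whether listing or combining dominates, and therefore whether `include_job_ids` filtering should happen before or after the cluster lookup.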

@@ -94,9 +94,10 @@ def _job_clusters(job: BaseJob) -> Iterable[tuple[BaseJob, ClusterSpec]]:


 class JobsCrawler(CrawlerBase[JobInfo], JobsMixin, CheckClusterMixin):
-    def __init__(self, ws: WorkspaceClient, sql_backend: SqlBackend, schema):
+    def __init__(self, ws: WorkspaceClient, sql_backend: SqlBackend, schema, include_job_ids: list[int] | None = None):
Suggested change
-    def __init__(self, ws: WorkspaceClient, sql_backend: SqlBackend, schema, include_job_ids: list[int] | None = None):
+    def __init__(self, ws: WorkspaceClient, sql_backend: SqlBackend, schema, *, include_job_ids: list[int] | None = None):

tests/integration/assessment/test_jobs.py Outdated Show resolved Hide resolved
@pritishpai pritishpai deployed to account-admin February 7, 2025 14:50 — with GitHub Actions Active
3 participants