
Single-node job creation leads to "Missing required field: settings.cluster_spec.new_cluster.size" with Databricks CLI 0.200.2 #663

zyeiy2 opened this issue Aug 1, 2023 · 6 comments



zyeiy2 commented Aug 1, 2023

Setup

  1. Create a simple job via the Databricks UI and export the job definition JSON.
  2. Create the job via the Databricks CLI using the exported job definition JSON.

Description
The Azure Databricks CLI (version 0.200.2, on Linux) logged several events while creating a job on Databricks.

The DEFAULT profile from "/root/.databrickscfg" was used for authentication and is correctly configured, since all other operations work fine. Both API versions 2.1 and 2.0 were tested in the .databrickscfg and led to the same result.

However, the job creation failed due to a missing required field in the cluster specification: "settings.cluster_spec.new_cluster.size". As a result, the CLI returned an error with the code "INVALID_PARAMETER_VALUE" and the message "Cluster validation error: Missing required field: settings.cluster_spec.new_cluster.size".

Running this command led to the error above:
databricks jobs create --json @b.json

Find the content of the b.json file below.

{
    "run_as": {
        "user_name": "[email protected]"
    },
    "name": "UC Test",
    "email_notifications": {
        "no_alert_for_skipped_runs": false
    },
    "webhook_notifications": {},
    "timeout_seconds": 0,
    "max_concurrent_runs": 1,
    "tasks": [
        {
            "task_key": "Task",
            "notebook_task": {
                "notebook_path": "/Shared/z-job/NB_Start_Job",
                "source": "WORKSPACE"
            },
            "job_cluster_key": "Job_cluster",
            "timeout_seconds": 0,
            "email_notifications": {}
        }
    ],
    "job_clusters": [
        {
            "job_cluster_key": "Job_cluster",
            "new_cluster": {
                "spark_version": "12.2.x-scala2.12",
                "spark_conf": {
                    "spark.databricks.delta.preview.enabled": "true",
                    "spark.master": "local[*, 4]",
                    "spark.databricks.cluster.profile": "singleNode"
                },
                "azure_attributes": {
                    "first_on_demand": 1,
                    "availability": "ON_DEMAND_AZURE",
                    "spot_bid_max_price": -1
                },
                "node_type_id": "Standard_DS3_v2",
                "custom_tags": {
                    "ResourceClass": "SingleNode"
                },
                "spark_env_vars": {
                    "PYSPARK_PYTHON": "/databricks/python3/bin/python3"
                },
                "enable_elastic_disk": true,
                "data_security_mode": "SINGLE_USER",
                "runtime_engine": "STANDARD",
                "num_workers": 0
            }
        }
    ],
    "format": "MULTI_TASK"
}

The log of the executed command can be found here; it also ends with the error:
time=2023-08-01T07:52:58.409+02:00 level=ERROR source=root.go:96 msg="failed execution" exit_code=1 error="Cluster validation error: Missing required field: settings.cluster_spec.new_cluster.size"
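To check whether the validation error comes from the CLI itself or from the server, the same b.json payload can be posted directly to the Jobs 2.1 create endpoint. Here is a minimal sketch using only the Python standard library; the host, token, and helper name are placeholders, not official tooling:

```python
import json
import urllib.request


def build_jobs_create_request(host, token, payload_path):
    """Build a POST request to the Jobs 2.1 create endpoint.

    host and token are placeholders -- substitute your workspace URL
    and a personal access token before sending the request.
    """
    with open(payload_path) as f:
        payload = json.load(f)
    return urllib.request.Request(
        url=f"{host}/api/2.1/jobs/create",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Example usage (requires a real workspace and token):
# req = build_jobs_create_request(
#     "https://adb-1234567890123456.7.azuredatabricks.net", "dapi...", "b.json"
# )
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode())
```

If the direct POST succeeds while `databricks jobs create` fails with the same payload, the validation error is being introduced on the CLI side.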


alexott commented Aug 1, 2023

Please report it to https://github.com/databricks/cli


zyeiy2 commented Aug 1, 2023

Please report it to https://github.com/databricks/cli

Hey @alexott, thank you for the hint. It was hard to determine the right place to post.

The issue is already known: Failed to create a job with single-node cluster

@dvinesett

I have a workaround until the fix arrives. Cross-posting it in this old CLI repo since the issue is still open. I experienced the issue with reset rather than create, but the bug seems to be shared. You can hit the databricks api command group instead of databricks jobs. As an example:

Not working:

$ databricks jobs reset --json @example.json
Error: Cluster validation error: Missing required field: settings.cluster_spec.new_cluster.size

Working:

$ databricks api post /api/2.1/jobs/reset --json @example.json
{}

This was done in Databricks CLI v0.205.0.
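For pipelines that need this workaround in more than one place, the routing above can be wrapped in a small helper. This is a sketch under the assumptions of the workaround (the `jobs_cli_args` helper and the endpoint map are my own naming, not part of the CLI); the endpoint paths follow the Jobs 2.1 REST API:

```python
import subprocess

# Route job create/reset through the raw-API command group instead of
# the `databricks jobs` subcommands affected by the validation bug.
ENDPOINTS = {
    "create": "/api/2.1/jobs/create",
    "reset": "/api/2.1/jobs/reset",
}


def jobs_cli_args(action, json_path):
    """Return the argv for `databricks api post` with the given payload file."""
    return ["databricks", "api", "post", ENDPOINTS[action], "--json", f"@{json_path}"]


# Example usage in a CD step (requires the CLI to be installed and authenticated):
# subprocess.run(jobs_cli_args("reset", "example.json"), check=True)
```

Once the fix lands, switching back should only require swapping the argv construction for the plain `databricks jobs` subcommands.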


kucm-pg commented Mar 4, 2024

Thanks @dvinesett for sharing the workaround. It works like a charm, even though it complicates my CD pipeline.
It's saddening to see that this issue got no traction with Databricks' dev team.

@dvinesett

Hey @kucm-pg, this issue was posted to the legacy Databricks CLI and likely won't be fixed here. However, it was fixed in the newer, separate version of the Databricks CLI. You'll need to migrate to that version to receive the update. That is, install the CLI following the instructions here rather than the old way using pip.


kucm-pg commented Mar 4, 2024

Hey @dvinesett thanks for sharing! This issue can be closed, I guess.
