
libraries-json configuration not working #26

Open
JamesBorg opened this issue Nov 3, 2022 · 3 comments

Comments

@JamesBorg

I'm trying to have a library installed on the created cluster, but I'm running into the following error:

Error: {"error_code":"MALFORMED_REQUEST","message":"Could not parse request object: Expected 'START_OBJECT' not 'VALUE_STRING'\n at [Source: (ByteArrayInputStream); line: 1, column: 405]\n at [Source: java.io.ByteArrayInputStream@c6f971a; line: 1, column: 405]"}

Here is a copy of the workflow configuration:

name: Databricks notebook running test
on:
  workflow_dispatch:
  push:

env:
  DATABRICKS_HOST: https://******************.azuredatabricks.net
  NODE_TYPE_ID: Standard_NC6s_v3
  GITHUB_TOKEN: ${{ secrets.REPO_TOKEN }}

jobs:
  databricks_notebook_test:
    runs-on: ubuntu-20.04
    steps:
      - name: Checkout repo
        uses: actions/checkout@v3
      - name: Generate AAD Token
        run: ./.github/workflows/scripts/generate-aad-token.sh ${{ secrets.AZURE_SP_TENANT_ID }} ${{ secrets.AZURE_SP_APPLICATION_ID }} ${{ secrets.AZURE_SP_CLIENT_SECRET }}
      - name: Train model
        uses: databricks/run-notebook@v0
        id: train
        with:
          local-notebook-path: notebooks/test.py
          git-commit: ${{ github.event.pull_request.head.sha || github.sha }}
          libraries-json: >
            [
              { "pypi": "accelerate" }
            ]
          new-cluster-json: >
            {
              "spark_version": "11.1.x-gpu-ml-scala2.12",
              "num_workers": 0,
              "spark_conf": {
                "spark.databricks.cluster.profile": "singleNode",
                "spark.master": "local[*, 4]",
                "spark.databricks.delta.preview.enabled": "true"
              },
              "node_type_id": "${{ env.NODE_TYPE_ID }}",
              "custom_tags": {
                "ResourceClass": "SingleNode"
              }
            }
          access-control-list-json: >
            [
              {
                "group_name": "users",
                "permission_level": "CAN_VIEW"
              }
            ]
          run-name: testing github triggering of databricks notebook

The workflow runs through fine with the libraries-json configuration removed (and the necessary library installed within the triggered notebook instead).

Is this a bug? Or am I misunderstanding how libraries-json can be used?

@JamesBorg
Author

Thanks to @vladimirk-db, who provided the solution.

The library entry needed to be modified to:

{ "pypi": { "package": "accelerate" } }
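For anyone landing here, this is how the corrected entry would look in the workflow above (a sketch based on the original configuration, with only the `pypi` value changed):

```yaml
          # Each pypi entry must be an object with a "package" field,
          # not a bare string, per the Databricks Libraries API schema.
          libraries-json: >
            [
              { "pypi": { "package": "accelerate" } }
            ]
```

The bare string form appears to be what triggers the `Expected 'START_OBJECT' not 'VALUE_STRING'` parse error in the `MALFORMED_REQUEST` response.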

Perhaps the README should be updated to reflect this?

@motya770

motya770 commented Sep 5, 2023

See also #46.

@benoitmiserez

Updated in #52
