ml-pipeline-ui failing on metadata api #11247

Open
richardburakowski opened this issue Sep 24, 2024 · 4 comments · May be fixed by #11321 or #11403

Comments

richardburakowski commented Sep 24, 2024

Environment

  • How did you deploy Kubeflow Pipelines (KFP)?
      • Standalone Kubernetes cluster v1.29.4
      • kubeflow/manifests-v1.9.1-rc.1
      • kubeflow/manifests-v1.9.0

    Not seeing this in older versions of kubeflow/manifests, e.g. 1.6.1.

  • KFP version:
    2.3.0

Steps to reproduce

Navigate to the Pipelines page of the Kubeflow dashboard. The page shows:

An error occurred

upstream connect error or disconnect/reset before headers. reset reason: remote connection failure, transport failure reason: delayed connect error: 111

Expected result

The Pipelines page loads without a dashboard error.

Materials and Reference

image: gcr.io/ml-pipeline/frontend:2.3.0

kubectl -n kubeflow logs -l app=ml-pipeline-ui

GET /pipeline/
GET /pipeline/apis/v1beta1/pipelines?page_size=5&sort_by=created_at%20desc
Proxied request:  /apis/v1beta1/pipelines?page_size=5&sort_by=created_at%20desc
GET /pipeline/apis/v1beta1/runs?page_size=5&sort_by=created_at%20desc&resource_reference_key.type=NAMESPACE&resource_reference_key.id=undefined
Proxied request:  /apis/v1beta1/runs?page_size=5&sort_by=created_at%20desc&resource_reference_key.type=NAMESPACE&resource_reference_key.id=undefined
GET /pipeline/static/js/main.b980985e.js
GET /pipeline/static/css/main.e10b3034.css
GET /pipeline/apis/v1beta1/healthz
GET /pipeline/apis/v2beta1/pipelines?page_token=&page_size=10&sort_by=created_at%20desc&filter=
Proxied request:  /apis/v2beta1/pipelines?page_token=&page_size=10&sort_by=created_at%20desc&filter=
GET /pipeline/system/project-id
GET /pipeline/system/cluster-name

/server/node_modules/node-fetch/lib/index.js:1491
			reject(new FetchError(`request to ${request.url} failed, reason: ${err.message}`, 'system', err));
			       ^
FetchError: request to http://metadata/computeMetadata/v1/project/project-id failed, reason: getaddrinfo ENOTFOUND metadata
    at ClientRequest.<anonymous> (/server/node_modules/node-fetch/lib/index.js:1491:11)
    at ClientRequest.emit (node:events:517:28)
    at Socket.socketErrorListener (node:_http_client:501:9)
    at Socket.emit (node:events:517:28)
    at emitErrorNT (node:internal/streams/destroy:151:8)
    at emitErrorCloseNT (node:internal/streams/destroy:116:3)
    at process.processTicksAndRejections (node:internal/process/task_queues:82:21) {
  type: 'system',
  errno: 'ENOTFOUND',
  code: 'ENOTFOUND'
}

Node.js v18.18.2
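The trace ends with the `Node.js v18.18.2` line, which Node prints when a process exits on an uncaught error. That suggests the failed lookup of the `metadata` host takes down the UI server itself. This would be consistent with the `delayed connect error: 111` (connection refused) shown on the dashboard. The sketch below is a hedged illustration only. It shows one way a server could tolerate a missing metadata server by catching the failure and returning a fallback. `fetchProjectIdOrDefault` and `FetchLike` are hypothetical names and are not part of the KFP frontend. Only the URL is taken from the log.

```typescript
// Hypothetical sketch, not the actual KFP frontend code. Only the URL is taken
// from the log above; the helper name and fallback behaviour are illustrative.
const PROJECT_ID_URL = "http://metadata/computeMetadata/v1/project/project-id";

// Minimal fetch-like signature so the helper can be exercised without a network.
type FetchLike = (
  url: string,
  init?: { headers?: Record<string, string> },
) => Promise<{ ok: boolean; text(): Promise<string> }>;

// Returns the project ID, or `fallback` when the metadata server is unreachable
// (for example getaddrinfo ENOTFOUND outside GKE) or answers with an error status.
async function fetchProjectIdOrDefault(
  fetchFn: FetchLike,
  fallback = "",
): Promise<string> {
  try {
    const res = await fetchFn(PROJECT_ID_URL, {
      // The GCE/GKE metadata server rejects requests without this header.
      headers: { "Metadata-Flavor": "Google" },
    });
    return res.ok ? await res.text() : fallback;
  } catch {
    // Convert the network error into a fallback value instead of letting an
    // unhandled rejection terminate the server process.
    return fallback;
  }
}
```

On Node 18 and later, the built-in global `fetch` can be passed as `fetchFn`. Injecting it this way also keeps the helper testable with a stub.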

As a workaround, adding `metadata` to DNS as an alias for any HTTP server seems to work. The server doesn't need to return cloud metadata; it only needs to accept the connection.

@luko0610

@richardburakowski
After wasting a few hours looking at the code here:
https://github.com/kubeflow/pipelines/blob/master/frontend/server/configs.ts

I found the solution: the ml-pipeline-ui deployment needs the environment variable DISABLE_GKE_METADATA set to 'true'. This disables the GKE metadata fetch, which is not available outside GKE clusters.
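Setting the flag helps because the lookup is skipped rather than attempted and failed. The sketch below is a minimal illustration of that gating. It assumes the flag has already been parsed into a boolean. `projectIdForUi` and `ProjectIdSource` are hypothetical names. Returning an empty string when the flag is set is an arbitrary choice for illustration; it is not necessarily what the KFP server returns for `/system/project-id`.

```typescript
// Hypothetical sketch: gate the GKE metadata lookup behind a boolean flag.
// None of these names come from the KFP frontend.
type ProjectIdSource = () => Promise<string>;

// Resolves the value served for /system/project-id. When the flag is set,
// `lookup` is never invoked, so no request to the `metadata` host is made
// and clusters outside GKE never hit getaddrinfo ENOTFOUND.
async function projectIdForUi(
  gkeMetadataDisabled: boolean,
  lookup: ProjectIdSource,
): Promise<string> {
  if (gkeMetadataDisabled) {
    // Arbitrary placeholder meaning "no project ID available".
    return "";
  }
  return lookup();
}
```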

@richardburakowski (Author)

Hi @luko0610,
Thanks for the investigation; I appreciate you spending the time on this.
This bug affects the standalone install of Kubeflow from the kubeflow/manifests repository, so it is expected to work without GKE metadata out of the box. It may not be obvious, but the issue is not present in older versions of the install (description updated).

There's no trace of DISABLE_GKE_METADATA in any version of the kustomize output. This suggests the problem is new behaviour in the container image when GKE metadata is absent, rather than a bug in the install manifests.

@joelcomp1

Replying to @luko0610's suggestion to set DISABLE_GKE_METADATA to 'true' on the ml-pipeline-ui deployment: thank you, this solved my issues!

gregsheremeta added a commit to gregsheremeta/data-science-pipelines that referenced this issue Oct 19, 2024
Since GKE is only one of many platforms, change the default value
of `DISABLE_GKE_METADATA` to `true`. Edit the gcp manifest to override
that to `false` on GCP.

Fixes: kubeflow#11247

Signed-off-by: Greg Sheremeta <[email protected]>
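The commit message describes a change of defaults rather than new logic. An unset DISABLE_GKE_METADATA now means "disabled", and only the GCP manifest turns the lookup back on. The sketch below is a minimal illustration of that resolution rule. It assumes the variable is read as a case-insensitive "true"/"false" string. The helper name `isGkeMetadataDisabled` is hypothetical, and the actual parsing in configs.ts may differ.

```typescript
// Hypothetical sketch of the default flip described in the commit message:
// DISABLE_GKE_METADATA defaults to "true" when unset, and a GCP-specific
// manifest sets it to "false" to re-enable the GKE metadata lookup.
type Env = Record<string, string | undefined>;

function isGkeMetadataDisabled(env: Env, defaultValue = "true"): boolean {
  // Assumed parsing rule: trim, lowercase, and compare against "true";
  // any other value counts as false.
  const raw = env.DISABLE_GKE_METADATA ?? defaultValue;
  return raw.trim().toLowerCase() === "true";
}
```

In a Node server this would be called as `isGkeMetadataDisabled(process.env)`. Passing the previous default, `isGkeMetadataDisabled(process.env, "false")`, reproduces the behaviour reported in this issue: an unset variable leaves the lookup enabled.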
gregsheremeta added a commit to gregsheremeta/data-science-pipelines that referenced this issue Oct 21, 2024
gregsheremeta linked a pull request Oct 21, 2024 that will close this issue
@etheleon

Thanks, that environment variable worked.