
Sky dashboard errors #4825

Open
v-lopez opened this issue Feb 26, 2025 · 0 comments · May be fixed by #4895

v-lopez commented Feb 26, 2025

I am running the jobs controller on my Kubernetes cluster, and the API server is deployed there as well.

When I run sky jobs dashboard, the browser shows the following error:

Internal Server Error

The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.

The API server itself works fine: I can check the job queue with sky jobs queue, but I can't see the dashboard.

These are the logs from the ingress-nginx controller:

10.42.1.0 - skypilot [26/Feb/2025:10:06:27 +0000] "GET /api/health HTTP/1.1" 200 120 "-" "python-requests/2.32.3" 208 0.004 [skypilot-skypilot-api-service-80] [] 10.42.1.70:46580 120 0.004 200 57eb00dc33e21e36a4e810371a4b7c87
2025/02/26 10:06:31 [warn] 435#435: *1115425 upstream sent duplicate header line: "server: Werkzeug/3.1.3 Python/3.10.10", previous value: "server: uvicorn", ignored while reading response header from upstream, client: 10.42.1.0, server: _, request: "GET /jobs/dashboard?user_hash=f3ad7df3 HTTP/1.1", upstream: "http://10.42.1.70:46580/jobs/dashboard?user_hash=f3ad7df3", host: "api_server:30050"
2025/02/26 10:06:31 [warn] 435#435: *1115425 upstream sent duplicate header line: "date: Wed, 26 Feb 2025 10:06:31 GMT", previous value: "date: Wed, 26 Feb 2025 10:06:26 GMT", ignored while reading response header from upstream, client: 10.42.1.0, server: _, request: "GET /jobs/dashboard?user_hash=f3ad7df3 HTTP/1.1", upstream: "http://10.42.1.70:46580/jobs/dashboard?user_hash=f3ad7df3", host: "api_server:30050"
10.42.1.0 - skypilot [26/Feb/2025:10:06:31 +0000] "GET /jobs/dashboard?user_hash=f3ad7df3 HTTP/1.1" 500 265 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:135.0) Gecko/20100101 Firefox/135.0" 420 4.105 [skypilot-skypilot-api-service-80] [] 10.42.1.70:46580 265 4.105 500 9564c13cde0ecf83ca2287205875484f
2025/02/26 10:06:33 [warn] 435#435: *1115425 upstream sent duplicate header line: "server: Werkzeug/3.1.3 Python/3.10.10", previous value: "server: uvicorn", ignored while reading response header from upstream, client: 10.42.1.0, server: _, request: "GET /jobs/dashboard?user_hash=f3ad7df3 HTTP/1.1", upstream: "http://10.42.1.70:46580/jobs/dashboard?user_hash=f3ad7df3", host: "api_server:30050"
2025/02/26 10:06:33 [warn] 435#435: *1115425 upstream sent duplicate header line: "date: Wed, 26 Feb 2025 10:06:33 GMT", previous value: "date: Wed, 26 Feb 2025 10:06:32 GMT", ignored while reading response header from upstream, client: 10.42.1.0, server: _, request: "GET /jobs/dashboard?user_hash=f3ad7df3 HTTP/1.1", upstream: "http://10.42.1.70:46580/jobs/dashboard?user_hash=f3ad7df3", host: "api_server:30050"
10.42.1.0 - skypilot [26/Feb/2025:10:06:33 +0000] "GET /jobs/dashboard?user_hash=f3ad7df3 HTTP/1.1" 500 265 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:135.0) Gecko/20100101 Firefox/135.0" 420 0.901 [skypilot-skypilot-api-service-80] [] 10.42.1.70:46580 265 0.901 500 890a9cca0e4e115b0fac5aea768b42fd
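The access lines above appear to use the default ingress-nginx "upstreaminfo" log format. To line up the failing requests with the API server logs, a small sketch like the following can pull out the path, status, timing and upstream address of each request. It assumes that default format; the regex and the function name parse_access_line are hypothetical helpers, not part of SkyPilot or ingress-nginx.

```python
import re

# Assumes the default ingress-nginx "upstreaminfo" access-log format:
# $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent
# "$http_referer" "$http_user_agent" $request_length $request_time
# [$proxy_upstream_name] [$proxy_alternative_upstream_name] $upstream_addr
# $upstream_response_length $upstream_response_time $upstream_status $req_id
ACCESS_RE = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)" '
    r'(?P<request_length>\d+) (?P<request_time>[\d.]+) '
    r'\[(?P<upstream_name>[^\]]*)\] \[(?P<alt_upstream_name>[^\]]*)\] '
    r'(?P<upstream_addr>\S+) (?P<upstream_response_length>\S+) '
    r'(?P<upstream_response_time>\S+) (?P<upstream_status>\S+) (?P<req_id>\S+)$'
)


def parse_access_line(line):
    """Return selected fields of one access-log line, or None for any other line
    (for example the "[warn] upstream sent duplicate header line" entries)."""
    m = ACCESS_RE.match(line.strip())
    if m is None:
        return None
    fields = m.groupdict()
    # "$request" looks like "GET /jobs/dashboard?user_hash=... HTTP/1.1".
    parts = fields["request"].split(" ")
    return {
        "time": fields["time_local"],
        "path": parts[1] if len(parts) > 1 else fields["request"],
        "status": int(fields["status"]),
        "request_time": float(fields["request_time"]),
        "upstream_addr": fields["upstream_addr"],
        # Kept as a string because nginx may log "-" when no upstream answered.
        "upstream_status": fields["upstream_status"],
    }
```

Applied to the three access lines above, this yields one 200 for /api/health and two 500s for /jobs/dashboard, all proxied to 10.42.1.70:46580; the [warn] lines return None.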

The 10.42.1.70 address belongs to the API server pod:

NAMESPACE              NAME                                                              READY   STATUS             RESTARTS          AGE     IP              NODE      NOMINATED NODE   READINESS GATES
skypilot               skypilot-api-server-cfd79c866-plb7q                               1/1     Running            0                  107m    10.42.1.70      turing    <none>           <none>

The API server logs:

Starting dashboard for user hash: f3ad7df3
Starting dashboard
Forwarding port: kubectl --pod-running-timeout 1s -n skypilot port-forward pod/sky-jobs-controller-ad245f97-ad245f97-head 5001:5000 > ~/sky_logs/api_server/dashboard-f3cb7df3.log 2>&1
Dashboard is now available at: http://127.0.0.1:5001
I 02-26 10:06:31 httptools_impl.py:476] 10.42.0.79:40990 - "GET /jobs/dashboard?user_hash=f3cb7df3 HTTP/1.1" 500
Starting dashboard for user hash: f3ad7df3
I 02-26 10:06:33 httptools_impl.py:476] 10.42.0.79:40996 - "GET /jobs/dashboard?user_hash=f3cb7df3 HTTP/1.1" 500
I 02-26 10:06:49 httptools_impl.py:476] 10.42.1.1:37264 - "GET /api/health HTTP/1.1" 200
I 02-26 10:06:49 httptools_impl.py:476] 10.42.1.1:37266 - "GET /api/health HTTP/1.1" 200
I 02-26 10:07:19 httptools_impl.py:476] 10.42.1.1:56406 - "GET /api/health HTTP/1.1" 200

And the contents of ~/sky_logs/api_server/dashboard-f3cb7df3.log:

Forwarding from 127.0.0.1:5001 -> 5000
Forwarding from [::1]:5001 -> 5000
Handling connection for 5001
Handling connection for 5001
Handling connection for 5001
Handling connection for 5001
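The dashboard log only shows kubectl accepting connections on port 5001. To check whether the controller's dashboard answers through that tunnel on its own, independently of the API server's proxying, the port-forward can be reproduced by hand. This is a debugging sketch, not SkyPilot code: it assumes kubectl is on PATH with access to the cluster (for example, from inside the API server pod), reuses the namespace, pod name and ports from the log above, and the helpers port_forward_cmd, probe and check_dashboard are hypothetical.

```python
import subprocess
import time
import urllib.error
import urllib.request


def port_forward_cmd(namespace, pod, local_port, remote_port, timeout="1s"):
    """Build a kubectl port-forward equivalent to the one in the API server log,
    with --pod-running-timeout placed after the port-forward subcommand."""
    return [
        "kubectl", "-n", namespace, "port-forward",
        f"--pod-running-timeout={timeout}",
        f"pod/{pod}", f"{local_port}:{remote_port}",
    ]


def probe(url, timeout=5.0):
    """Return the HTTP status code for url, or a short error string if no response arrives."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # The server answered with an error status such as 500.
    except (urllib.error.URLError, OSError) as exc:
        return f"error: {exc}"


def check_dashboard(namespace, pod, local_port=5001, remote_port=5000, wait_s=2.0):
    """Open the tunnel, request the dashboard root once, then close the tunnel."""
    tunnel = subprocess.Popen(
        port_forward_cmd(namespace, pod, local_port, remote_port),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        time.sleep(wait_s)  # Crude wait for "Forwarding from 127.0.0.1:<port>".
        return probe(f"http://127.0.0.1:{local_port}/")
    finally:
        tunnel.terminate()
        tunnel.wait()
```

For example, check_dashboard("skypilot", "sky-jobs-controller-ad245f97-ad245f97-head"). A 200 here would suggest the 500 comes from the API server's proxying step, while a 500 would point at the dashboard app on the controller itself.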
cg505 linked a pull request on Mar 5, 2025 that will close this issue.