Prefect job stuck in Pending state #10149

Open
4 tasks done
aivansky-contractor opened this issue Jul 4, 2023 · 5 comments
Labels
bug Something isn't working

Comments

@aivansky-contractor

First check

  • I added a descriptive title to this issue.
  • I used the GitHub search to find a similar issue and didn't find it.
  • I searched the Prefect documentation for this issue.
  • I checked that this issue is related to Prefect and not one of its dependencies.

Bug summary

This is an intermittent issue.

We are using Prefect version 2.10.12, and we run the Prefect agent based on the Docker image 2.10.12-python3.11, deployed into Kubernetes.

We found the following log entries associated with jobs stuck in Pending.

Reproduction

Submit a flow repeatedly until the issue occurs. The issue is intermittent, so it may take several submissions.

Error

17:41:45.227 | DEBUG   | prefect.client - Encountered retryable exception during request. Another attempt will be made in 2.191032998879657s. This is attempt 1/6.
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/h2/connection.py", line 224, in process_input
    func, target_state = self._transitions[(self.state, input_)]
                         ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^
KeyError: (<ConnectionState.CLOSED: 3>, <ConnectionInputs.RECV_PING: 14>)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/httpcore/_async/http2.py", line 125, in handle_async_request
    status, headers = await self._receive_response(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/httpcore/_async/http2.py", line 242, in _receive_response
    event = await self._receive_stream_event(request, stream_id)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/httpcore/_async/http2.py", line 273, in _receive_stream_event
    await self._receive_events(request, stream_id)
  File "/usr/local/lib/python3.11/site-packages/httpcore/_async/http2.py", line 294, in _receive_events
    events = await self._read_incoming_data(request)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/httpcore/_async/http2.py", line 380, in _read_incoming_data
    events: typing.List[h2.events.Event] = self._h2_state.receive_data(data)
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/h2/connection.py", line 1463, in receive_data
    events.extend(self._receive_frame(frame))
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/h2/connection.py", line 1487, in _receive_frame
    frames, events = self._frame_dispatch_table[frame.__class__](frame)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/h2/connection.py", line 1760, in _receive_ping_frame
    events = self.state_machine.process_input(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/h2/connection.py", line 228, in process_input
    raise ProtocolError(
h2.exceptions.ProtocolError: Invalid input ConnectionInputs.RECV_PING in state ConnectionState.CLOSED
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/httpx/_transports/default.py", line 60, in map_httpcore_exceptions
    yield
  File "/usr/local/lib/python3.11/site-packages/httpx/_transports/default.py", line 353, in handle_async_request
    resp = await self._pool.handle_async_request(req)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/httpcore/_async/connection_pool.py", line 261, in handle_async_request
    raise exc
  File "/usr/local/lib/python3.11/site-packages/httpcore/_async/connection_pool.py", line 245, in handle_async_request
    response = await connection.handle_async_request(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/httpcore/_async/connection.py", line 96, in handle_async_request
    return await self._connection.handle_async_request(request)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/httpcore/_async/http2.py", line 155, in handle_async_request
    raise LocalProtocolError(exc)  # pragma: nocover
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
httpcore.LocalProtocolError: Invalid input ConnectionInputs.RECV_PING in state ConnectionState.CLOSED
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/prefect/client/base.py", line 193, in _send_with_retry
    response = await request()
               ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/httpx/_client.py", line 1617, in send
    response = await self._send_handling_auth(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/httpx/_client.py", line 1645, in _send_handling_auth
    response = await self._send_handling_redirects(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/httpx/_client.py", line 1682, in _send_handling_redirects
    response = await self._send_single_request(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/httpx/_client.py", line 1719, in _send_single_request
    response = await transport.handle_async_request(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/httpx/_transports/default.py", line 352, in handle_async_request
    with map_httpcore_exceptions():
  File "/usr/local/lib/python3.11/contextlib.py", line 155, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/usr/local/lib/python3.11/site-packages/httpx/_transports/default.py", line 77, in map_httpcore_exceptions
    raise mapped_exc(message) from exc
httpx.LocalProtocolError: Invalid input ConnectionInputs.RECV_PING in state ConnectionState.CLOSED

Versions

2.10.12

Additional context

No response

@zzstoatzz
Collaborator

hi @aivansky-contractor - can you provide more context on when this error arose?

are these logs from your agent process? it looks like it could be a transient network error, but hard to tell from the above trace

@EmilRex
Contributor

EmilRex commented Oct 10, 2023

Looking into this a bit further, it seems to be an issue with the lower level library we use to handle HTTP/2. This issue reports the symptom and this issue describes the cause.

As a temporary measure you can set PREFECT_API_ENABLE_HTTP2=false on your agent or worker to disable the use of HTTP/2.

What happens here is that the server accepts the proposed state transition to Pending, but this exception is raised in the process, leading the client to believe the request failed. When the client then retries the request, the server rejects the proposed state transition because the flow run is already in the Pending state, so the run stays stuck there.
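The sequence described above can be sketched as a toy model. This is not Prefect's code: `ToyServer`, `ResponseLost`, and `send_with_retry` are hypothetical names made up for illustration, and the only assumption carried over from the thread is that the server rejects a transition into the state the run already holds.

```python
from dataclasses import dataclass, field


class ResponseLost(Exception):
    """Stands in for the client-side protocol error raised after the server has acted."""


@dataclass
class ToyServer:
    # Hypothetical in-memory stand-in for the API; not Prefect's implementation.
    state: str = "SCHEDULED"
    log: list = field(default_factory=list)

    def propose_state(self, new_state: str) -> str:
        # Reject a transition into the state the run is already in.
        if self.state == new_state:
            self.log.append("REJECT")
            return "REJECT"
        self.state = new_state
        self.log.append("ACCEPT")
        return "ACCEPT"


def send_with_retry(server, new_state, drop_first_response=True, attempts=3):
    """Retry on ResponseLost, the way a generic retrying client would."""
    for attempt in range(attempts):
        try:
            # The server-side effect happens here, before the reply is read.
            result = server.propose_state(new_state)
            if drop_first_response and attempt == 0:
                # The connection breaks before the client sees the reply.
                raise ResponseLost("reply never reached the client")
            return result
        except ResponseLost:
            continue
    raise RuntimeError("all attempts failed")


server = ToyServer()
outcome = send_with_retry(server, "PENDING")
print(outcome, server.state, server.log)
```

Running it prints `REJECT PENDING ['ACCEPT', 'REJECT']`: the first attempt changed the state on the server, but the only answer the client ever sees is the rejection from the retry.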

@kcd83

kcd83 commented Dec 17, 2023

We believe we are seeing this issue; however, we use Google Cloud Run push workers.

Is there any way to configure a serverless worker to use only HTTP/1.1 while this ticket is being worked on?

@kcd83

kcd83 commented Dec 19, 2023

Quick update - this can be set client side

We use prefect.yaml and bundle our variables per work pool so will try the following 🤞

  - name: some-flow
    description: Legacy flows builder
    ...
    work_pool:
      name: cloud-run-push-pool
      job_variables:
        env:
          PREFECT_API_ENABLE_HTTP2: "false"

Hopefully the string form "false" does the trick; otherwise I can try the unquoted false.
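On the quoted-versus-unquoted question, a minimal sketch of why the quoted form is the safer choice: process environment values are always strings, so an unquoted YAML `false` (a boolean) has to be converted to text by whatever builds the job environment, and the resulting spelling depends on that tool. The `parse_bool_env` helper below is hypothetical and only illustrates a common convention for reading boolean flags; it is not Prefect's own settings parser.

```python
import os

# Hypothetical helper: a common convention for reading boolean flags from
# environment variables. This is not Prefect's own parsing code.
_TRUE = {"1", "true", "t", "yes", "y", "on"}
_FALSE = {"0", "false", "f", "no", "n", "off"}


def parse_bool_env(name: str, default: bool) -> bool:
    raw = os.environ.get(name)
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"{name}={raw!r} is not a recognised boolean")


# Environment values are strings; os.environ rejects a bool with TypeError.
try:
    os.environ["PREFECT_API_ENABLE_HTTP2"] = False  # type: ignore[assignment]
except TypeError:
    os.environ["PREFECT_API_ENABLE_HTTP2"] = "false"

http2_enabled = parse_bool_env("PREFECT_API_ENABLE_HTTP2", default=True)
print(http2_enabled)
```

A case-insensitive parser like this one would accept either `"false"` or `"False"`, but whether a given consumer is that lenient is an assumption worth checking against its documentation.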

@zhen0
Member

zhen0 commented Jan 9, 2024

Possibly connected to #11499
