Network failures with self-hosted servers #7512
Note: Possibly helpful context from an httpx discussion at encode/httpx#2056

Server
# pip freeze | grep -E '(uvicorn|starlette)'
Stack Trace:

Client
# pip freeze | grep -E '(httpx|httpcore)'
Stack Trace:
Thanks for the report! We currently suspect this is a bug in uvicorn. We'd like to find the root cause, but may explore client-side solutions in the interim. Note for other maintainers: this is a confirmed reproduction on 2.6.5. @ikeepo does this reproduce on the latest versions of uvicorn/starlette? We've also heard users report that this is solved on Prefect 2.6.4; do you see it there?
We've seen it on Prefect 2.6.3.
Server:
Agent: We are running a custom-built version with our two pull requests. I doubt they have anything to do with it though, especially since other people are seeing the same thing.
Docker container: custom-built from prefecthq/prefect:2.6.3-python3.10.
Stacktrace:
Hope it helps.
@madkinsz The latest versions of uvicorn/starlette don't help; it still reproduces, and even more frequently.
Errors: Even without running any flows I get the above errors after some time, and when I run a flow and the error appears, it crashes the flow. Further details in this thread: https://prefect-community.slack.com/archives/CL09KU1K7/p1667880546594239
Connection reset error (agent logs)
Broken pipe error (agent logs)
@MuFaheemkhan could you please edit your post to contain the full Docker image tag if you are using one of our official images, or include the versions of the libraries as requested? A full traceback for the error would also be really helpful. Thanks!
@eudyptula what's your Prefect version? It's a different issue; I experienced the same errors as yours, but they were fixed after I upgraded to Prefect 2.6.6.
@eudyptula as noted by MuFaheemkhan, that is not a connection / network failure but rather a 500 error from the server. Please open a separate issue for that and include the relevant versions. There should be server logs with the error if you are hosting your own server. If you're using Cloud, we will find the server error.
This should not be the case. When the flow run crashes, the process should exit. If you can reproduce this, please open an issue so we can address it.
Well, in that case we have several issues. We were on 2.6.3 and just upgraded to 2.6.4, as @madkinsz said that some users didn't experience the issue on that version. At the same time I upgraded all the dependencies of Prefect on both the agents and the server, but not in the Docker containers. I was going to see whether 2.6.4 ran better before upgrading all the way to the newest version (and hopefully provide some helpful information to you at the same time). But thanks @MuFaheemkhan, I will definitely check out 2.6.6 and see if it helps!
Getting this on 2.6.7. There is no information in the traceback on why the connection failed. To confirm, I did not have these issues on 2.6.4.
I am getting the same errors. It happens when trying to call run_deployment from inside a task in the parent flow. This issue makes Prefect 2 pretty unstable for production environments.
In my case I managed to stop getting these network-related errors by not using the
I make sure to create a new client instance instead of getting the flow context's one, in contrast to what
Somewhere in there, I believe, lies the root cause of the issue: using
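The comment above (some of its inline code was lost in extraction) describes working around the errors by creating a fresh client per call instead of reusing the flow context's long-lived client. A library-agnostic sketch of that pattern, with entirely hypothetical names (this is not Prefect's actual API):

```python
import contextlib


class Client:
    """Stand-in for an HTTP client (hypothetical; not Prefect's API)."""

    def __init__(self):
        self.closed = False

    def request(self, path):
        # A real client would issue an HTTP request here.
        return f"response for {path}"

    def close(self):
        self.closed = True


@contextlib.contextmanager
def fresh_client():
    # A fresh client (and connection pool) per call site avoids reusing a
    # long-lived shared client whose sockets may have been reset server-side.
    client = Client()
    try:
        yield client
    finally:
        client.close()


# Usage: each operation gets its own short-lived client.
with fresh_client() as c:
    result = c.request("/flow_runs")
```

The trade-off is extra connection setup per call, but stale pooled sockets can no longer surface as connection reset / broken pipe errors mid-flow.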
We are running
I believe these issues are basically caused by a high volume of requests; we see these issues with the agent, which polls frequently, and now with
In my case I stopped using run_deployment and now my flows don't crash, but connection reset and broken pipe errors still occur randomly in the agent logs, although this time they don't crash my flows.
We are seeing our
Also, the reason behind the HTTP 500 was a database timeout, so we had to increase the query timeout from the default 1 second. So, from our perspective, it could very well be an issue with high volume.
@eudyptula I am using a local Postgres server; one thing I did was to increase the number of connections on the Postgres server/container.
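For reference, the two server-side knobs the last two comments describe (query timeout and Postgres connection limit) might look like this on a self-hosted setup. The setting name assumes Prefect 2.x Orion and the value is illustrative; verify against `prefect config view` and your Postgres version before relying on it:

```shell
# Raise Orion's database query timeout (reported default: 1 second).
# Setting name assumes Prefect 2.x; check `prefect config view` first.
export PREFECT_ORION_DATABASE_TIMEOUT=10

# In postgresql.conf, raise the connection limit (illustrative value):
#   max_connections = 200
```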
Howdy y'all! Any chance anyone here would be interested in giving this branch a shot to see if it resolves the problems? #7593 @MuFaheemkhan, @eudyptula, @carlo-catalyst, @andreas-ntonas
Really looking forward to a fix for this; if there's a workaround we can use to stop the parent flows failing, it would be much appreciated!
@peytonrunyan: Haven't tried the branch, but is retrying the right solution here? Sounds a bit like playing the lottery to me: just keep resending the same request, hoping that eventually it will go through. If the issue is caused by a high volume of requests, as @madkinsz suggests, shouldn't a solution involve some way to throttle/queue requests on the agents? (Just thoughts; I'm not that familiar with how Prefect is designed.)
@eudyptula we believe this is an issue with the upstream HTTP libraries. We are implementing retries to solve the immediate issue for our users until we can work on fixing the bug upstream.
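The client-side retry approach the maintainers describe typically means catching the transient failures seen in this thread (connection reset, broken pipe) and retrying with exponential backoff. A minimal sketch, assuming illustrative attempt counts and delays (this is not Prefect's actual retry code):

```python
import random
import time


def with_retries(func, attempts=3, base_delay=0.1):
    """Call func, retrying transient network errors with backoff."""
    for attempt in range(attempts):
        try:
            return func()
        except (ConnectionResetError, BrokenPipeError):
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            # Exponential backoff with jitter so retries don't stampede.
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))


# Demo: a fake request that fails twice, then succeeds on the third try.
calls = {"n": 0}


def flaky_request():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionResetError("simulated reset")
    return "ok"


result = with_retries(flaky_request)
```

Retries paper over the symptom rather than the upstream bug, which is exactly the interim trade-off described above.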
We haven't had this error since we upgraded to 2.7.0 earlier today 👍 |
I can also confirm that flows that were failing regularly for me with this issue now seem to be working fine.
@madkinsz it looks like we got it addressed, so I'm going to go ahead and close this issue. Feel free to reopen it if you think there's something else that needs handling.
This is a tracking issue for various reports of network failures when self-hosting Prefect Orion.
Notably, these reports seem concentrated on Prefect 2.6.6.
If adding a report to this issue, please include the following information:
Related to: