Validate that Error Statuses from AWS are not obfuscated #324
Comments
Hi @skuzzle, hmm, this is a tough one. I would think that we would want to forward whatever status codes AWS returns to us and not get too opinionated with our error handling here, but I am not familiar with that specific error. Do you have any more information on it? It does seem like many of these "GENERIC_INTERNAL_ERROR" errors are often syntax/client related, which makes me a bit hesitant to add something: https://repost.aws/knowledge-center/athena-generic-internal-error
I think for us to implement something like this we'd need more information about the error, so that we feel confident it should in fact be treated like a 5xx in all instances. It sounds like in your case it probably makes sense, but I'm not confident that is true in all cases since I don't understand what the error means. We can ask our contacts at AWS to see if they are more familiar with it. It's also possible they do respond with a 5xx and there's a bug in our code where we do not forward that along properly. In the meantime it could be helpful if you also want to reach out to AWS, and do let us know if you learn anything more about this error. Thanks for reporting!
This is what I suspected is happening. I'm not expecting you to fix Athena error handling on your side. So if you are already forwarding 5xx as 5xx and 4xx as 4xx, then you probably should not change a thing and it's up to the Athena folks to fix this. We spoke to AWS, or rather to our cloud contractor, about some of the odd failures, and the gist was that those kinds of errors can happen in a distributed system and should be handled with a retry on the client side. Now this obviously only makes sense if those errors do come back as 5xx.
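The retry rule described above (retry on 5xx, give up on 4xx) could be sketched roughly as follows. This is only an illustration, not code from the plugin or from our tests: `should_retry`, `query_with_retry`, and the `send` callable are hypothetical names, and `send` is assumed to perform one request against the query endpoint and return a `(status_code, body)` pair.

```python
import time
from typing import Callable, Tuple


def should_retry(status_code: int) -> bool:
    """Treat only 5xx responses as transient; 4xx is a client error."""
    return 500 <= status_code <= 599


def query_with_retry(
    send: Callable[[], Tuple[int, str]],  # hypothetical: performs one request
    max_attempts: int = 3,
    base_delay: float = 0.0,  # seconds; 0.0 disables waiting between attempts
) -> Tuple[int, str, int]:
    """Call `send` until it returns a non-5xx status or attempts run out.

    Returns (status_code, body, attempts_used).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        # Stop on success, on a client error, or when out of attempts.
        if not should_retry(status) or attempt == max_attempts:
            return status, body, attempt
        # Exponential backoff between attempts.
        time.sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

With this shape, an internal error that is reported as a 4xx is never retried, which is exactly why the status code returned by the endpoint matters here.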
Ah, I see what you mean @skuzzle. It's going to be hard for us to trigger the specific error you're talking about in a test since we don't have repro steps, but we certainly can stub out different error status codes from AWS and see what gets returned. I just tried hard-coding a fake error with a status of 500, and noticed that while we returned that in the response object, the actual status of the response was a 400. I'm guessing we probably have a bug here. I updated the title so that someone can double-check and update any places where we might not be forwarding the error status code. Thanks for talking it through and making the issue! I'm going to put this into our backlog for now, since we don't have a specific timeline for fixing it yet and don't want to overpromise a delivery date until we can figure it out, but I will be sure to bring it up with the team. Also, if you have any interest in contributing, we'd be happy to review any PRs :)
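To make the suspected bug concrete, here is a rough sketch of what "forwarding the error status code" means. It is written in Python purely for illustration and is not the plugin's actual code: `UpstreamError` and `response_status_for` are hypothetical names, and falling back to 500 when the upstream error carries no usable status is an assumption, not a decision made in this thread.

```python
from typing import Optional


class UpstreamError(Exception):
    """Hypothetical stand-in for an error returned by AWS."""

    def __init__(self, message: str, status_code: Optional[int] = None):
        super().__init__(message)
        self.status_code = status_code


def response_status_for(err: UpstreamError, default: int = 500) -> int:
    """Pick the HTTP status for the response from the upstream error.

    A 4xx or 5xx status from upstream is forwarded unchanged. Anything
    else (missing, or not an error status) falls back to `default`,
    which is an assumed choice for this sketch.
    """
    code = err.status_code
    if code is not None and 400 <= code <= 599:
        return code
    return default
```

A stubbed error such as `UpstreamError("fake", 500)` should then produce a response whose actual status is 500, rather than a 400 with the 500 only visible inside the response object.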
Thx for picking this up. For the sake of completeness, here are two further errors we encountered that came back with a 4xx status from Grafana but that clearly look like internal Athena errors:
What happened:
We're often seeing random Athena errors like this one:
Those errors are impossible to predict and apparently cannot be avoided either. However, when such an error happens, it often comes back with a 4xx status code from the /api/ds/query endpoint. We use this endpoint for some automated tests and rely on the response code to decide whether to retry a test. Retrying something that really is a client error (e.g. a query with a syntax error) doesn't make sense, so it would be nice to be able to distinguish between real client errors and internal errors.

What you expected to happen:
I'd expect internal Athena errors to be returned with a 5xx response code from the /query endpoint. A 4xx response code makes sense for real client errors, such as sending a query with a syntax error.

How to reproduce it (as minimally and precisely as possible):
Not really reproducible in a reliable way
Environment: