Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Error Handling #11

Open
cybercoder-naj opened this issue Sep 21, 2024 · 5 comments
Open

Improve Error Handling #11

cybercoder-naj opened this issue Sep 21, 2024 · 5 comments

Comments

@cybercoder-naj
Copy link
Member

The server logs don't have much to decipher about the request and why it failed. This includes

  • Improving the logging format displayed to the developers.
  • Improving the quality of error or warning messages throughout the server app.
@Dropheart
Copy link
Contributor

Relevant discussion from the commitee discord of '24/25
image

@Dropheart Dropheart changed the title Improve Server Logging Improve Error Handling Oct 9, 2024
@cybercoder-naj
Copy link
Member Author

Slightly disagree still. We have to determine the severity level of each of the errors in the context of when they are called. For example, consider the error of 404. They can be returned in these ways:

  1. When the user is hitting a route that doesn't exist. That happens when the front end is bugged, which is fixable by having proper frontend testing, agreed? Yes! But what if it is not because the front is bugged, but because we have a malicious user trying to pen test our systems by attacking some routes by common convention? What if the person had an attempted DDoS attack (eventually blocked by Cloudflare e.g.)? This is an issue especially when the project is proprietary like ichack.
  2. However, if the user is hitting the right route but has a 404 because the handler didn't find the desired resource, that is not need to be logged since it is the desired behaviour. E.g. hitting the family route when the user was never assigned a family.
  3. The president (Hashir) had forwarded a report where people using Safari weren't able to get some resource they were entitled to because of a 400 error. I agree it is the frontend fault (the browser) but in this case, both of us (dropheart and me) don't have safari to run these tests. So we need to setup rigorous testing in multiple environment (through CI runners or something) in the frontend, or log errors we probably shouldn't see (based on the discretion of the developers).

@Dropheart
Copy link
Contributor

Dropheart commented Oct 9, 2024

  1. The front end referring you to a route that doesn't exist is prevented by trying the site locally before pushing to prod. If it's a curious and/or malicious user, this app is open source, they can simply see the routes.
  2. Agreed.
  3. Looking in the code, all the instances of 400 are invalid requests, such as
  • Invalid callback request (the Zod handler for this should be fixed to check whether it's an Entra error or a query params error, but it's logged anyways. We should also return a 400/500 based on this.)
  • User has no shortcode. This should probably be logged, and the frontend should pass the error the backend gave.
  • User has no entry year. This should be logged, and the frontend should pass the error the backend gave.
  • Invalid family POST body. The frontend should prevent this, anything else is people manually hitting it so they can deal with it themselves, we don't need to log that
  • Already completed survey. Frontend does prevent this, anything else is people hitting it manually, we don't need to log.
  • Student tries to POST survey during parent time. They're doing it manually, they can deal with it themselves and read our return text themselves, dont' need to log.
  • Invalid proposal POST/DELETE body. The frontend prevents this, anything else is people hitting it manually, we don't need to log.
  • Proposing without having completed the survey. Frontend prevents, manual, dont need to log.
  • Already married. Frontend prevents, manual, don't need to log.
  • Trying to marry self, frontend should prevent, otherwise manual, don't need to log.
  • Invalid proposee. The frontend should display this issue correctly, we don't need to log every single typo.
  • Max proposals, frontend should prevent, manual, don't need to log.
  • Already proposed to user, frontend should prevent, manual, don't need to log.
  • Trying to delete proposal haven't made, frontend should prevent, manual, don't need to log.
  • Accepting proposal invalid POST body, frontend prevents, manual, dont need to log.
  • Accepting proposal that doesn't exist, manual, don't need to log
  • Trying to get a family without having a family, frontend should display a good error message, don't need to log. I changed the frontend to dispaly a good error message on this as it was very common.

Of all of those, the majority are things the frontend should block. In cases where the frontend should not block it, I disagree that we need to log it internally, but rather the frontend should display a better error message - currently it just displays something like "Error in METHOD to ROUTE: 400 bad request". Instead, the frontend should use the response text as the backend always provides a response explaining the error for 400s.
TL;DR is 400s are user error and if we logged those we would have a lot of waste.

In the case of Hashir's report, realistically I think that user just typo'd the persons shortcode. I don't think this is to do with Safari nor something we need(ed) to test. Had the frontend propogated the backend error correctly, we wouldn't have needed to receive a report. All this being said, I did find a few places where the backend should be logging that it's not logging, but overall logging 400s would just add clutter, it's 500s that should be (and are) logged.

Anyways, this is next year webmasters problem now hahahah unless me or you get bored.

@cybercoder-naj
Copy link
Member Author

cybercoder-naj commented Oct 10, 2024

  1. Yaa i know mad3 is open-source, but ichack will always be proprietary. If that project is receiving 404 errors on incorrect routes, it is because the frontend is bugged or someone is trying to bruteforce/pentest the server.

GPT 4o prompt: When building the server, it is logical to explicitly log when the request resulted in a 500 server error. Is it useful to also explicitly log why a request resulted in a 4xx error? Argue for and against this

When deciding whether to explicitly log 4xx errors (client-side errors) in server logs, there are both advantages and disadvantages. Here's a breakdown of the arguments for and against:

For Logging 4xx Errors

  1. Diagnosing Misuse or Security Risks:

    • Invalid Inputs: Logging 4xx errors can reveal patterns of incorrect usage, such as malformed requests, missing authentication, or bad input data.
    • Security Monitoring: Repeated or unusual 4xx errors (e.g., 401 Unauthorized or 403 Forbidden) might indicate probing or malicious attempts to access unauthorized resources. Monitoring this can help detect and mitigate security risks early.
  2. Improving User Experience:

    • Error Insight: By tracking client-side errors, developers can identify frequent issues users are encountering, which can inform improvements to APIs, documentation, or frontend validation logic.
    • Faster Debugging: Explicitly logging 4xx reasons can streamline debugging when issues arise, especially if the logs show exactly why requests failed (e.g., “missing required field” or “invalid token”).
  3. Compliance or Auditing Requirements:

    • In some regulated industries, logging all errors, including 4xx, may be necessary for auditing, compliance, or forensic purposes.
  4. API Health Monitoring:

    • Usage Patterns: Tracking 4xx errors over time gives insights into how users are interacting with the API, identifying common mistakes or potential documentation gaps.

Against Logging 4xx Errors

  1. Log Volume:

    • Noise in Logs: Client errors may occur frequently, especially in public-facing APIs, leading to large amounts of noise in the logs. This can make it harder to identify meaningful issues, such as server-side errors (500-level) or critical security events.
    • Increased Storage Costs: Constant logging of 4xx errors in high-traffic systems could increase storage costs and complicate log management if not properly filtered.
  2. Not Always Actionable:

    • Expected Behavior: Many 4xx errors are the result of user mistakes or intended client behavior (e.g., submitting an expired token). Since these errors are client-side, they may not be immediately actionable from the server side, making them less critical to log compared to 5xx errors.
    • Frontend Responsibilities: Often, frontend applications are responsible for handling client errors and guiding the user to correct the problem. The server does not always need to keep a record of these issues.
  3. Privacy Concerns:

    • Sensitive Data in Logs: Logging details of 4xx errors could inadvertently expose sensitive data (e.g., user tokens, personally identifiable information). This requires careful handling to avoid compromising privacy and security.

Conclusion

Whether or not to log 4xx errors depends on the specific use case. In environments where security, API health, and client behavior monitoring are important, logging 4xx errors can provide valuable insights. However, in high-traffic systems or where client errors are expected, they might be best logged selectively or handled on the client side to reduce noise and protect sensitive information.

A balanced approach would be to log 4xx errors conditionally based on severity (e.g., log only when 401 or 403 errors occur) or to aggregate certain types of 4xx errors without storing excessive details.


TL;DR ChatGPT sides with me based on the comprise after hearing your opinion. We should log the 4xx errors too depending on the severity of the problem.

Although in no way I am saying don't improve the error reporting on the frontend. Infact revealing the route like /api/myFamily 403 Forbidden is not a good message regardless of an open source project. But sometimes 4xx can cause by factors that is not easily testable by the developer beforehand (you never know).


EDIT: It now a job to understand how to know what parts of the codebase result in a low/high severity 4xx problems. If the user has an expired token, it is normal for 403 to return, but if there are multiple 403 requests with the user not sending in any token, or a token not in db. it's something to think about. But that has increased code complexity which may not be needed.

I am not solely thinking about mad3 but also web development in general (also with my mind polluted with ichack)

@Dropheart
Copy link
Contributor

Re security: In a more general case, I do agree that people trying to access restricted routes and getting a 404 should be logged, though I know I have done that before and it's pure curiosity, not malice hahah. I think the more standard thing, however, is to use tools such as fail2ban to handle that.

ChatGPTs point 2. of improving UX etc isn't a bad idea - having some logging for statistics on each request etc - though a little out of our scope. I disagree with 3/4 for us.

Re: your edit, we can specifically log expired tokens as we clear your token when it's expired to tell the difference.

Understandable last point, though I do think it's more of a by-project thing (even, by specific route and specific errors) rather than a web dev in general thing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants