CRITICAL: Access to a specific report is lost and couldn't see the chats #41542

Closed
2 of 6 tasks
m-natarajan opened this issue May 2, 2024 · 35 comments
Assignees
Labels
Bug Something is broken. Auto assigns a BugZero manager. Daily KSv2 Engineering Needs Reproduction Reproducible steps needed

Comments

@m-natarajan

If you haven’t already, check out our contributing guidelines for onboarding and email [email protected] to request to join our Slack channel!


Version Number:
Reproducible in staging?: needs reproduction
Reproducible in production?: needs reproduction
If this was caught during regression testing, add the test name, ID and link from TestRail:
Email or phone of affected tester (no customers): https://staging.new.expensify.com/r/8823458142807525 / https://staging.new.expensify.com/r/71168705
Logs: https://stackoverflow.com/c/expensify/questions/4856
Expensify/Expensify Issue URL:
Issue reported by: @tgolen
Slack conversation: https://expensify.slack.com/archives/C049HHMV9SM/p1714659279184109

Action Performed:

  1. Open chat report with @Beamanator

Expected Result:

Able to access the chat report and chat loads

Actual Result:

  • Don't see the messages sent
  • Chat with @Beamanator is not shown in LHN
  • Accessing his chat from search shows a "not found" page

Workaround:

unknown

Platforms:

Which of our officially supported platforms is this issue occurring on?

  • Android: Native
  • Android: mWeb Chrome
  • iOS: Native
  • iOS: mWeb Safari
  • MacOS: Chrome / Safari
  • MacOS: Desktop

Screenshots/Videos

Add any screenshot/video evidence
image (29)


@m-natarajan m-natarajan added Daily KSv2 Needs Reproduction Reproducible steps needed Bug Something is broken. Auto assigns a BugZero manager. labels May 2, 2024

melvin-bot bot commented May 2, 2024

Triggered auto assignment to @trjExpensify (Bug), see https://stackoverflow.com/c/expensify/questions/14418 for more details. Please add this bug to a GH project, as outlined in the SO.

@MelvinBot

This has been labelled "Needs Reproduction". Follow the steps here: https://stackoverflowteams.com/c/expensify/questions/16989

@iwiznia iwiznia added the DeployBlocker Indicates it should block deploying the API label May 2, 2024

melvin-bot bot commented May 2, 2024

👋 Friendly reminder that deploy blockers are time-sensitive ⏱ issues! Check out the open Staging deploy checklist to see the list of PRs included in this release, then work quickly on the following:

  1. If you find which PR caused the issue/bug, you can reassign it to the person responsible for it.
    • If the author is OOO or won’t get online before the daily deploy is due, you are responsible for finding the best fix/path forward. Don’t hesitate to ask for help!
  2. Try to reproduce the issue, if the bug is on production, remove the DeployBlocker label but stay assigned to fix it (or find out which PR broke it to get help from the author).
    • You can adjust the urgency of the issue to better represent the gravity of the bug.
    • If the issue is super low priority, feel free to un-assign yourself.
    • Be careful with PHP warnings, sometimes it is more complex than just adding a null coalescing operator as they might be uncovering some bigger bug.
    • If it was a one-off issue that requires no action (for example, Bedrock was down or it is a duplicated issue), you can close it.

Remember rule #2: Never un-assign yourself from a real DeployBlocker unless you are 100% sure someone else is assigned and will take care of it.

@iwiznia iwiznia removed DeployBlocker Indicates it should block deploying the API Hourly KSv2 Engineering labels May 2, 2024
@iwiznia
Contributor

iwiznia commented May 2, 2024

Sorry about that noise, was doing a test for https://github.com/Expensify/PHP-Libs/pull/954 and did not realize all that was going to happen 😄

@yuwenmemon yuwenmemon removed their assignment May 2, 2024
@trjExpensify
Contributor

@tgolen the report not found sounds suspiciously like what I experienced here: #41254 (comment)

{"code":666,"jsonCode":403,"type":"Expensify\\Libs\\Error\\ExpError","UUID":"da7984df-628f-41f6-a2b8-ecd4499f7975","message":"Report not found","title":"","data":{"onyxData":[{"onyxMethod":"merge","key":"report_0","value":{"errorFields":{"notFound":{"1714580833933154":"Report not found"}}}}]},"htmlMessage":"","requestID":"87d123434d9c4164-LHR"}

Here are the logs Tim shared in thread as well. @m-natarajan let's make sure you put those in the issues you create, because they're helpful.
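
For anyone debugging a similar response, here is a minimal TypeScript sketch of pulling the affected Onyx keys out of an error payload shaped like the one quoted above. The type names and the `getNotFoundKeys` helper are hypothetical and not part of the Expensify codebase; only the field names (`onyxData`, `key`, `errorFields`, `notFound`) come from the payload itself.

```typescript
// Shape of the error payload quoted above; only the fields used here are typed.
type OnyxUpdate = {
    onyxMethod: string;
    key: string;
    value: {errorFields?: Record<string, Record<string, string>>};
};

type ApiErrorResponse = {
    jsonCode: number;
    message: string;
    data?: {onyxData?: OnyxUpdate[]};
};

// Hypothetical helper: returns the Onyx keys that carry a "notFound" error field.
function getNotFoundKeys(response: ApiErrorResponse): string[] {
    const updates = response.data?.onyxData ?? [];
    return updates
        .filter((update) => Object.keys(update.value.errorFields?.notFound ?? {}).length > 0)
        .map((update) => update.key);
}

// The relevant part of the response from this thread, trimmed to the typed fields.
const errorResponse: ApiErrorResponse = {
    jsonCode: 403,
    message: 'Report not found',
    data: {
        onyxData: [
            {
                onyxMethod: 'merge',
                key: 'report_0',
                value: {errorFields: {notFound: {'1714580833933154': 'Report not found'}}},
            },
        ],
    },
};

const notFoundKeys = getNotFoundKeys(errorResponse);
```

For the payload in this thread the helper returns `['report_0']`, which shows the error was merged into the key `report_0`.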

@Beamanator's theory is:

"Wild guess because @aldo-expensify has seen us calling OpenReport on report action ids recently I believe"

What are the next steps here? Internal or External? @muttmuure do you want this in NewDot Performance?

@muttmuure
Contributor

Does anyone have their onyx state preserved such that they're still experiencing this? (Don't refresh or leave the page where you're seeing the bug.) If so let's start a thread in #newdot-performance and recruit an engineer to debug it live

@trjExpensify
Contributor

Not I, I put my logs in the linked issue.

@muttmuure
Contributor

OK, I have it:

image

@iwiznia iwiznia added the DeployBlocker Indicates it should block deploying the API label May 3, 2024

melvin-bot bot commented May 3, 2024

👋 Friendly reminder that deploy blockers are time-sensitive ⏱ issues! Check out the open Staging deploy checklist to see the list of PRs included in this release, then work quickly on the following:

  1. If you find which PR caused the issue/bug, you can reassign it to the person responsible for it.
    • If the author is OOO or won’t get online before the daily deploy is due, you are responsible for finding the best fix/path forward. Don’t hesitate to ask for help!
  2. Try to reproduce the issue, if the bug is on production, remove the DeployBlocker label but stay assigned to fix it (or find out which PR broke it to get help from the author).
    • You can adjust the urgency of the issue to better represent the gravity of the bug.
    • If the issue is super low priority, feel free to un-assign yourself.
    • Be careful with PHP warnings, sometimes it is more complex than just adding a null coalescing operator as they might be uncovering some bigger bug.
    • If it was a one-off issue that requires no action (for example, Bedrock was down or it is a duplicated issue), you can close it.

Remember rule #2: Never un-assign yourself from a real DeployBlocker unless you are 100% sure someone else is assigned and will take care of it.

@iwiznia iwiznia removed DeployBlocker Indicates it should block deploying the API Hourly KSv2 labels May 3, 2024
@iwiznia
Contributor

iwiznia commented May 3, 2024

Sorry again, used it as a test once more

@trjExpensify
Contributor

Unsubscribe

@muttmuure muttmuure changed the title Accesss to a specific report is lost and couldn't see the chats HIGH: [UX Reliability] Access to a specific report is lost and couldn't see the chats May 7, 2024
@muttmuure
Contributor

Labeling this one UX Reliability

@trjExpensify
Contributor

So it sounds like @mallenexpensify could be right and this is an OpenReport error

Ah, which we're working on somewhere else at this point?

@mallenexpensify
Contributor

I feel like a lot of these bugs might be related to the write performance issues we're experiencing (or something that's being negatively affected because of the issues with write performance). For some of my bugs, I'm punting them to retest next week, after we fix the fire and are on the other side of the merge freeze.

@melvin-bot melvin-bot bot added the Overdue label May 9, 2024
@trjExpensify
Contributor

Asking if we're moving this anywhere here, I'm a bit unclear on the next steps: https://expensify.slack.com/archives/C05LX9D6E07/p1715364552147619?thread_ts=1714750131.190839&cid=C05LX9D6E07

@melvin-bot melvin-bot bot removed the Overdue label May 10, 2024

melvin-bot bot commented May 10, 2024

@trjExpensify Whoops! This issue is 2 days overdue. Let's get this updated quick!

@quinthar quinthar changed the title HIGH: [UX Reliability] Access to a specific report is lost and couldn't see the chats CRITICAL: Access to a specific report is lost and couldn't see the chats May 12, 2024
@quinthar quinthar moved this to CRITICAL in [#whatsnext] #quality May 12, 2024
@melvin-bot melvin-bot bot added the Overdue label May 13, 2024
@muttmuure
Contributor

Bumped in channel

@melvin-bot melvin-bot bot removed the Overdue label May 13, 2024
@muttmuure
Contributor

After considering the RCA that Dan provided when I looked into this, I think the solution here is a mechanism that retrieves the missing parts of the report object. What seems to happen is:

  • You try to load a report
  • Potentially there are site issues
  • The report gets stuck in this partially loaded state (with contact details only)
  • We don't "retry" getting the rest of the details
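
The sequence above could be addressed with a bounded retry. Below is a minimal sketch, assuming a partially loaded report can be recognized by missing fields. The `PartialReport` shape, the choice of fields, and both helper names are hypothetical, not the app's actual model. The sketch only decides whether and when to retry; scheduling the request is left to the caller.

```typescript
// Hypothetical report shape, for illustration only.
type PartialReport = {
    reportID: string;
    participants?: string[];
    // Assumed: these fields are only present once the full report has loaded.
    reportName?: string;
    lastVisibleActionCreated?: string;
};

// A report counts as partially loaded when any of the assumed "full" fields is missing.
function isPartiallyLoaded(report: PartialReport): boolean {
    return report.reportName === undefined || report.lastVisibleActionCreated === undefined;
}

// Returns the delay before the next fetch attempt, or null when no retry should happen
// (the report is complete, or the attempt budget is used up).
// Delays grow exponentially (1s, 2s, 4s, ...) and are capped at maxMs.
function nextRetryDelayMs(
    report: PartialReport,
    attempt: number,
    maxAttempts = 5,
    baseMs = 1000,
    maxMs = 30000,
): number | null {
    if (!isPartiallyLoaded(report) || attempt >= maxAttempts) {
        return null;
    }
    return Math.min(baseMs * 2 ** attempt, maxMs);
}
```

Capping the number of attempts keeps a report that is missing on the server from being requested indefinitely.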

@muttmuure
Contributor

So I think that this issue will fix this #41112 (comment)

@tgolen
Contributor

tgolen commented May 14, 2024

Just to make sure everyone has the same context... There are lots of suggestions to have some code that monitors for partial report data and then pulls the full report.

We had this exact logic previously, and it was removed in this PR. There was a lot of history behind that decision and it was a real cause of performance issues.

This kind of "magic" code that monitored for partial reports leads to a lot of code that just assumes a full report object will always be there, and to code that doesn't know where its data comes from or can't recover when that data suddenly isn't there as it expects.
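
One way to avoid code that assumes a full report is always present is to make the loading state explicit in the type, so every consumer has to handle the partial and not-found cases. This is only an illustrative sketch; `ReportState` and `describeReport` are hypothetical names, not what the app uses.

```typescript
// Hypothetical discriminated union: the report name only exists in the "full" state,
// so the compiler rejects code that reads it without checking the status first.
type ReportState =
    | {status: 'full'; reportID: string; reportName: string}
    | {status: 'partial'; reportID: string}
    | {status: 'notFound'; reportID: string};

// Every state is handled explicitly; none of the branches assumes data that may be missing.
function describeReport(state: ReportState): string {
    switch (state.status) {
        case 'full':
            return `Showing ${state.reportName}`;
        case 'partial':
            return `Report ${state.reportID} is still loading`;
        case 'notFound':
            return `Report ${state.reportID} was not found`;
    }
}
```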

@muttmuure
Contributor

thanks for that context, @tgolen!


melvin-bot bot commented May 16, 2024

@muttmuure this issue was created 2 weeks ago. Are we close to a solution? Let's make sure we're treating this as a top priority. Don't hesitate to create a thread in #expensify-open-source to align faster in real time. Thanks!

@melvin-bot melvin-bot bot added the Overdue label May 16, 2024

melvin-bot bot commented May 17, 2024

@muttmuure Whoops! This issue is 2 days overdue. Let's get this updated quick!

@muttmuure
Contributor

Updates in channel

@melvin-bot melvin-bot bot removed the Overdue label May 20, 2024
@danieldoglas
Contributor

@arosiclair I can see you worked on https://github.com/Expensify/Web-Expensify/pull/41296, maybe you can look into this?

@muttmuure
Contributor

Since none of the original reporters are experiencing this anymore, I am going to close it

@github-project-automation github-project-automation bot moved this from CRITICAL to Done in [#whatsnext] #quality May 22, 2024
Development

No branches or pull requests

10 participants