Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LHN does not update properly when calling ReconnectApp #33771

Closed
iwiznia opened this issue Dec 29, 2023 · 39 comments
Closed

LHN does not update properly when calling ReconnectApp #33771

iwiznia opened this issue Dec 29, 2023 · 39 comments
Assignees
Labels
Bug Something is broken. Auto assigns a BugZero manager. Daily KSv2 Engineering Internal Requires API changes or must be handled by Expensify staff

Comments

@iwiznia
Copy link
Contributor

iwiznia commented Dec 29, 2023

Coming from #33372

Problem

Left hand nav is outdated (in focus mode?) when calling ReconnectApp, which should never happen since after calling ReconnectApp everything should be up to date.

You can see logs from the App here and the call to ReconnectApp here and see that it failed to use the updates and instead called auth's OpenApp

My very wild guess is that for focus mode, it is never clearing the existing data in onyx, so a chat that was showing (since it was unread) would not get explicitly removed when OpenApp does not send it back.

Upwork Automation - Do Not Edit
  • Upwork Job URL: https://www.upwork.com/jobs/~01ea2b6c10b081507e
  • Upwork Job ID: 1742324658321010688
  • Last Price Increase: 2024-01-02
@iwiznia iwiznia added the Bug Something is broken. Auto assigns a BugZero manager. label Dec 29, 2023
Copy link

melvin-bot bot commented Dec 29, 2023

Triggered auto assignment to @MitchExpensify (Bug), see https://stackoverflow.com/c/expensify/questions/14418 for more details.

@melvin-bot melvin-bot bot added the Daily KSv2 label Dec 29, 2023
Copy link

melvin-bot bot commented Dec 29, 2023

Bug0 Triage Checklist (Main S/O)

  • This "bug" occurs on a supported platform (ensure Platforms in OP are ✅)
  • This bug is not a duplicate report (check E/App issues and #expensify-bugs)
    • If it is, comment with a link to the original report, close the issue and add any novel details to the original issue instead
  • This bug is reproducible using the reproduction steps in the OP. S/O
    • If the reproduction steps are clear and you're unable to reproduce the bug, check with the reporter and QA first, then close the issue.
    • If the reproduction steps aren't clear and you determine the correct steps, please update the OP.
  • This issue is filled out as thoroughly and clearly as possible
    • Pay special attention to the title, results, platforms where the bug occurs, and if the bug happens on staging/production.
  • I have reviewed and subscribed to the linked Slack conversation to ensure Slack/Github stay in sync

@MitchExpensify
Copy link
Contributor

What steps should I take to reproduce this @iwiznia ?

@melvin-bot melvin-bot bot added the Overdue label Jan 2, 2024
@marcaaron
Copy link
Contributor

Probably it will be hard to reproduce since you need to miss x number of updates in order to trigger this part:

it failed to use the updates and instead called auth's OpenApp

It will be easier on dev where we can just comment out some part of the code (not sure where exactly) and see what happens.

@marcaaron marcaaron self-assigned this Jan 2, 2024
@melvin-bot melvin-bot bot removed the Overdue label Jan 2, 2024
@marcaaron marcaaron added Internal Requires API changes or must be handled by Expensify staff Overdue labels Jan 2, 2024
Copy link

melvin-bot bot commented Jan 2, 2024

Job added to Upwork: https://www.upwork.com/jobs/~01ea2b6c10b081507e

Copy link

melvin-bot bot commented Jan 2, 2024

Triggered auto assignment to Contributor Plus for review of internal employee PR - @situchan (Internal)

@melvin-bot melvin-bot bot removed the Overdue label Jan 2, 2024
@marcaaron
Copy link
Contributor

for focus mode, it is never clearing the existing data in onyx, so a chat that was showing (since it was unread) would not get explicitly removed when OpenApp does not send it back

Different way to say the same thing

while in focus mode, chats end up with stale data because an update from reliable updates queue is never received. If we fallback to OpenApp in this case we will send back only the reports for the focus mode query. Not the reports which the client has locally and something about them changed.

What is the best way to fix this?

As discussed, we think something like a Onyx.setCollection() could be the way forward. That way the fallback response of "OpenApp" can be used to overwrite existing values.

We will need to make sure that still works if you have pending actions. I think it would be fine as stuff like AddComment should be queued to send first then ReconnectApp later.

@aldo-expensify
Copy link
Contributor

aldo-expensify commented Jan 3, 2024

As discussed, we think something like a Onyx.setCollection() could be the way forward. That way the fallback response of "OpenApp" can be used to overwrite existing values.

This makes sense to me. An alternative I can think of is to send the reportIDs from the client, like we did with policyIds, but I like that less and I guess it scales worse.

@marcaaron
Copy link
Contributor

Ok, on a first pass - I cannot reproduce this problem and it does not seem like the current theory is the whole story.

If you look at the code for ReconnectApp we do not ever call it with the priority mode.

api.php

https://github.com/Expensify/Web-Expensify/blob/bff668e4e707766249cddb1117b0a4caf89e1621/api.php#L3280

AppInit.php (defaults to false)

https://github.com/Expensify/Web-Expensify/blob/7938771c948ff4a3c7e16a101cdbe0a9e00dbf70/lib/AppInit.php#L368

Which means we should be syncing with the full list when GetOnyxUpdates throws the 423 jsonCode.

And when I test locally I see everything sync up fine 🤔

@aldo-expensify
Copy link
Contributor

To reproduce and have GetOnyxUpdates throws the 423 jsonCode, do you disable reliable updates for the account?

@marcaaron
Copy link
Contributor

No I just did this:

diff --git a/lib/AppInit.php b/lib/AppInit.php
index 52612dbdca9..049c39521f6 100644
--- a/lib/AppInit.php
+++ b/lib/AppInit.php
@@ -375,6 +375,7 @@ class AppInit
             /** @psalm-suppress InvalidGlobal */
             global $trackMetric;
             try {
+                throw new ExpError('asdf', 'asdf', jsonCode: 423);
                 $response = Auth::getOnyxUpdates($authToken, $updateIDFrom, $updateIDTo);

                 // GetOnyxUpdates will return the format array{accountID => array{updates, previousUpdateID, lastUpdateID}}.

@marcaaron
Copy link
Contributor

marcaaron commented Jan 3, 2024

This leaves me to think this possibly has to do with the mostRecentReportActionLastModified changes which are still in effect when we call ReconnectApp.

Reminder: This pre-dates the "reliable updates" stuff where we will send only the new reports with based on the report actions that were last modified. That is very much based on the assumption that you would have everything that comes before this most recent update.

Also, note this logic does not change based on whether the priority mode filter is enabled or not... so again, I don't think this has to do with focus mode. However, when you switch to Most recent mode we do a hard sync of the reports list (and do not pass the mostRecentReportActionLastModified in this case) which would fix any problems with stale reports.

Maybe... somehow... a more recent report action sneaks in before the missing updates are fetched + all the updates fail to fetch? Something like:

  • User is offline
  • User comes back online
  • They somehow got an update - could it maybe be from a notification or something?
  • State ends up with a gap between the new update and the fresh
  • They call GetMissingOnyxUpdates it returns 423
  • They call OpenApp with their bad "most recent"
  • They only get the reports that are "newer" than this one

Not sure exactly how it can happen, but that's my best improved theory.

@marcaaron marcaaron removed their assignment Jan 3, 2024
@marcaaron
Copy link
Contributor

Few more logs (not sure if they help):

2023-12-29 14:42:32 707

Message created by Garret here: https://www.expensify.com/_devportal/tools/logSearch/#query=request_id:(%2283d2cf69cc627200-LHR%22)+AND+timestamp:[2023-12-29T13:42:32.708Z+TO+2023-12-29T15:42:32.708Z]&index=logs_expensify-026033

2023-12-29 14:42:50 826

@JmillsExpensify calls ReconnectApp here unsuccessfully:

Processing 'ReconnectApp' for 'expensify.com' from '189.172.241.217' ~~ browserGUID: '653308122cd44' initialReferer: '<REDACTED>' command: 'ReconnectApp' policyIDList: '01DC7E46150451C4,03841C763918B9E1,063B20AF6F45E248,0BFB979BCD2DF930,0CFE78B25EBE2A0A,0D00F2B7A62842CD,137DA25D273F2423,2D3A0166B09CA932,39181563B22B74F3,3A89327178065942,5C853229E59E0C08,67B16182715EBC94,6B9A36D18CCA4933,72BEC40917763029,7A35FD885AE104E1,7DA2113291EE38AE,8C06041CC671A0EC,8C42263B74422F6A,9106A4CE5D9D5ACE,9132779747441AFB,94648D5F053554AD,9A3EEB48764CC921,A55A7962A1F001A4,A5F944A401046336,B648A8FBD506CAF9,B99FD813110CBCD3,BEDB558B1367FA21,BF5C59CAEE213690,C116E844BA7F5A40,E57898D9DE780B49' mostRecentReportActionLastModified: '2023-12-29 14:42:33.620' updateIDFrom: '66694010' appversion: '1.4.19-0' pusherSocketID: '633336.90971' authToken: '<REDACTED>' referer: 'ecash' platform: 'ios' api_setCookie: 'false' email: '[email protected]' partnerName: 'expensify.com' HTTP_CF_BOT_SCORE: '97' HTTP_CF_JA3_HASH: '773906b0efdefa24a7f2b8eb6985bf37'

The important part being:

mostRecentReportActionLastModified: '2023-12-29 14:42:33.620'

That's the same as the message from Garrett. The time of that log line was 2023-12-29 14:42:50 826 - so we can only get the updates from the past 17 seconds. Not sure what else to figure from this apart from that it confirms that Jason already had the message from Garrett in Onyx somehow. And as far as I can tell he got this update applied from another call to ReconnectApp.

This is the last successful time Jason got Onyx updates:

https://www.expensify.com/_devportal/tools/logSearch/#query=request_id:(%2283cc3e066ec24754-DFW%22)+AND+timestamp:[2023-12-28T18:34:43.022Z+TO+2023-12-28T20:34:43.022Z]&index=logs_expensify-026023

2023-12-28 19:34:43 019

All times in between either his auth token was expired or the 423 was triggered. The trail is pretty hard to follow.

All I can advise for now is that we should either:

  1. Stop using this mostRecentReportActionLastModified param. It is based on an assumption that could be false (even if we are not sure how it happens)
  2. Figure out how people are getting the partial updates and sending a bad mostRecentReportActionLastModified param.

@sakluger
Copy link
Contributor

sakluger commented Jan 5, 2024

I don't think this is limited to focus mode.

  1. I noticed that not all messages were showing up in my LHN, so I switched to most recent mode on NewDot chrome
  2. I got a push notification on my iPhone for a new message in the chat (https://staging.new.expensify.com/r/1076415708177347) that I also had open on web (Chrome)
  3. The message is not showing up in web

Screenshots

Here is the push notification on mobile:

Push notification

Here's what I saw in web after receiving the push notification on mobile:

image

And here's what I saw in web after clicking to another chat then clicking back:

image

@marcaaron
Copy link
Contributor

@sakluger looks a bit different than this issue

Can't say with certainty but it seems like maybe:

  • Message was sent fine via notification to your mobile
  • Message did not make it to web (maybe it's slow for some reason or you lost connectivity for a moment)
  • Tapped another chat and reliable updates sent you the missing update

@melvin-bot melvin-bot bot added the Overdue label Jan 5, 2024
@marcaaron
Copy link
Contributor

Bringing the focus back to the main theory we have for this issue…

Theory:

mostRecentReportActionLastModified can somehow end up “newer” than the last modified for the “lastUpdateID” you have. When the 423 is handled we will only get everything from the mostRecentReportActionLastModified and there can be older updates you need.

Is there some way to test it? I can't really think of any way. Took a quick look at the reliableUpdates table and it has no created so we can't compare these two values easily.

@MitchExpensify
Copy link
Contributor

Not overdue, Melvin

@melvin-bot melvin-bot bot removed the Overdue label Jan 7, 2024
@melvin-bot melvin-bot bot removed the Overdue label Jan 15, 2024
@marcaaron
Copy link
Contributor

Impossible to say without looking at a specific incident and scanning through the logs (which is very time consuming)

@marcaaron
Copy link
Contributor

FWIW I think we should try to pop this in #vip-vsb or something. There is clearly some sort of hole that needs to be patched and a ton of context about how it might be happening. We should patch it so that the unexpected stuff we have observed can't happen.

@melvin-bot melvin-bot bot added the Overdue label Jan 18, 2024
@marcaaron marcaaron changed the title LHN does not update properly (in focus mode?) when calling ReconnectApp LHN does not update properly when calling ReconnectApp Jan 18, 2024
@marcaaron
Copy link
Contributor

Adding some more logs to help diagnose this. Also proposed to temporarily stop passing the mostRecentReportActionLastModified param here to potentially fix another issue. Not sure if it will solve this as I haven't fully reproduced the issue being experienced here - but have some hope that it will.

Copy link

melvin-bot bot commented Jan 19, 2024

@MitchExpensify @situchan this issue is now 3 weeks old. There is one more week left before this issue breaks WAQ. What needs to happen to get a PR in review this week? Please create a thread in #expensify-open-source to discuss. Thanks!

Copy link

melvin-bot bot commented Jan 19, 2024

@MitchExpensify, @situchan Whoops! This issue is 2 days overdue. Let's get this updated quick!

@sonialiap
Copy link
Contributor

sonialiap commented Jan 22, 2024

@marcaaron is LHN not updating report to read after report is opened by clicking a notification part of this issue? #34698

Not on focus mode

@MitchExpensify
Copy link
Contributor

@mallenexpensify Do you think this fits in wave 5 like #34273 does?

@melvin-bot melvin-bot bot removed the Overdue label Jan 23, 2024
@marcaaron
Copy link
Contributor

@sonialiap possibly - but doesn't seem related to me. That one seems to imply something went wrong with marking chats as read.

@mallenexpensify
Copy link
Contributor

mallenexpensify commented Jan 26, 2024

The PR hit production 2 days ago, is anyone witnessing any changes/improvements? I don't think I have for the issue I'm tracking for 'LHN not updating for chats in #social'.

@MitchExpensify I'm unsure if this is Wave 5, and/or if there's more work left to do.

@melvin-bot melvin-bot bot added the Overdue label Jan 26, 2024
@marcaaron
Copy link
Contributor

marcaaron commented Jan 26, 2024

ReconnectApp should be doing a full load when it fails to get the missing Onyx update now instead of passing the mostRecentReportActionLastModified so I would expect the issue I explained here to now be 100% impossible. If there is still an issue then it's something else.

@mallenexpensify
Copy link
Contributor

For #social, LHN still isn't updating with new chats, been testing all day. Scott and Flo both reported not getting new chats too (which I consider the same and #social not uploading cuz the chat isn't updating in LHN, if it's a DM or a room)

Copy link

melvin-bot bot commented Jan 30, 2024

@MitchExpensify, @situchan Still overdue 6 days?! Let's take care of this!

@MitchExpensify
Copy link
Contributor

FWIW I think we should try to pop this in #vip-vsb

@quinthar, do you agree here?

@melvin-bot melvin-bot bot added Overdue and removed Overdue labels Jan 30, 2024
Copy link

melvin-bot bot commented Feb 5, 2024

@MitchExpensify, @situchan Huh... This is 4 days overdue. Who can take care of this?

@MitchExpensify
Copy link
Contributor

MitchExpensify commented Feb 6, 2024

Is anyone still experiencing this? @iwiznia @marcaaron @mallenexpensify

@melvin-bot melvin-bot bot added Overdue and removed Overdue labels Feb 6, 2024
Copy link

melvin-bot bot commented Feb 12, 2024

@MitchExpensify, @situchan Huh... This is 4 days overdue. Who can take care of this?

@MitchExpensify
Copy link
Contributor

Going to close this in the hope that #33372 fixed it. Please feel free to reopen if anyone is experiencing it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Something is broken. Auto assigns a BugZero manager. Daily KSv2 Engineering Internal Requires API changes or must be handled by Expensify staff
Projects
None yet
Development

No branches or pull requests

8 participants