Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[HOLD for payment 2024-05-09] HIGH: [API Reliability] Prevent simultaneous calls to GetMissingOnyxMessages using a deferredUpdates queue #38748

Closed
danieldoglas opened this issue Mar 21, 2024 · 31 comments
Assignees
Labels
Awaiting Payment Auto-added when associated PR is deployed to production Bug Something is broken. Auto assigns a BugZero manager. Daily KSv2 External Added to denote the issue can be worked on by a contributor

Comments

@danieldoglas
Copy link
Contributor

danieldoglas commented Mar 21, 2024

If you haven’t already, check out our contributing guidelines for onboarding and email [email protected] to request to join our Slack channel!


Background:

An onyxUpdate is content we receive through https responses or Pusher that will update the state the local data. They will only bring a smaller piece of the information that was updated, instead of constantly doing full fetches to update everything.

To guarantee that the data is in a correct state, we need those updates to be applied in order. So every update that will modify local data will have previousUpdateID and lastUpdateID. For every new update we receive, we check the received previousUpdateID with the local lastUpdateID.

If they are the same, that means that we can apply that update and we'll have a correct state of the data. If they're not, then we need to fetch the missing updates (using GetMissingOnyxMessages with an updateIDFrom with the local lastUpdateID and updateIDTo with the previousUpdateID that was received from the server.

Once we have those updates applied, we can then apply the received update.

We have onyxUpdates being applied both through http responses and Pusher (calls here and here). When we call saveUpdateInformation, we we're triggering this callback function that will pause the SequentialQueue (so we don't receive more onyxUpdates until this is finished), fetch the missing updates, apply the pending updates, and then unpause the SequentialQueue. But we can't do that with onyxUpdates that comes from pusher.

Problem:

When we receive onyxUpdates, we'll trigger one GetMissingOnyxMessages per update out of order that comes by.

Scenario:

  1. Alice posts to #big-room to create updateID=10
  2. Bob posts to #small-room two times, to create updateID=11 and 12
  3. Cathy is part of both rooms, and has a local lastUpdateID=9
  4. Cathy receives 11 -- sends GetMissingOnyxUpdates. This call might takes a few seconds.
  5. Cathy receives 12 before (4) is finished. So the client sends GetMissingOnyxUpdates again
  6. Once (4) finishes and it's applied locally, the response for GetMissingOnyxUpdates on (5) will be discarded.

In this scenario, we did 2 GetMissingOnyxMessages, when 1 would be enough. Please note that this is not limited to 2 calls today: if we had received 10 messages, it would do 10 API calls.

In this scenario, we did 3 GetMissingOnyxMessages.

Possible solution:

What I'm proposing is that we implement 3 changes.

  1. Only execute one GetMissingOnyxMessages call at a time;
  2. Implement a deferredUpdates queue that will hold all updates that can't be applied immediately that will arrive while the call on (1) is being executed on an ordered manner. Then apply them if the pre-requisites are filled OR do another GetMissingOnyxMessages if we still find gaps on them
  3. Change the way we deal with received updates. If we can apply it immediately, we should trigger a check on the deferredUpdates queue to confirm if any of the held updates can be applied after we apply the received one.

By doing those, the expected behavior for the same scenario is:

  1. Alice posts to #big-room to create updateID=10
  2. Bob posts to #small-room three times, to create updateID=11, 12 and 13
  3. Cathy is part of both rooms, and has a local lastUpdateID=9
  4. Cathy receives 11 -- sends GetMissingOnyxUpdates. This call takes a few seconds.
  5. Cathy receives 12 and 13 before (4) is finished. So the client stores those updates in the deferredUpdates queue. It's important to note that these updates can arrive unordered.
  6. Once (4) finishes and it's applied locally, we get all updates stored in deferredUpdates and apply them, in order. If we detect any gaps in the received updates, we need to go back to step (4)

In this scenario, we only did 1 GetMissingOnyxUpdates.

Issue OwnerCurrent Issue Owner: @lschurr
@danieldoglas danieldoglas added the Daily KSv2 label Mar 21, 2024
@Szymon20000
Copy link
Contributor

Comment :)

@danieldoglas danieldoglas added the External Added to denote the issue can be worked on by a contributor label Mar 21, 2024
@melvin-bot melvin-bot bot changed the title [HIGH] Rework the current logic for GetMissingOnyxMessages [$500] [HIGH] Rework the current logic for GetMissingOnyxMessages Mar 21, 2024
Copy link

melvin-bot bot commented Mar 21, 2024

Job added to Upwork: https://www.upwork.com/jobs/~010f52251ca52f954d

@melvin-bot melvin-bot bot added the Help Wanted Apply this label when an issue is open to proposals by contributors label Mar 21, 2024
Copy link

melvin-bot bot commented Mar 21, 2024

Triggered auto assignment to Contributor-plus team member for initial proposal review - @sobitneupane (External)

@danieldoglas
Copy link
Contributor Author

No need for C+ on this one!

@danieldoglas danieldoglas changed the title [$500] [HIGH] Rework the current logic for GetMissingOnyxMessages [HIGH] Rework the current logic for GetMissingOnyxMessages Mar 21, 2024
@wildan-m
Copy link
Contributor

Proposal

Please re-state the problem that we are trying to solve in this issue.

When we receive onyxUpdates, we'll trigger one GetMissingOnyxMessages per update out of order that comes by.

What is the root cause of that problem?

We allow calling App.getMissingOnyxUpdates while fetching missing updates is in progress.

What changes do you think we should make in order to solve the problem?

We can introduce isFetchingMissingUpdates, and ensure it will only call another App.getMissingOnyxUpdates when isFetchingMissingUpdates === false

// ...

// Add a new state to track if we are currently fetching missing updates
let isFetchingMissingUpdates = false;

export default () => {
    console.debug('[OnyxUpdateManager] Listening for updates from the server');
    Onyx.connect({
        key: ONYXKEYS.ONYX_UPDATES_FROM_SERVER,
        callback: (value) => {
            if (!value || isFetchingMissingUpdates) {
                return;
            }

            // ...

            // Set isFetchingMissingUpdates to true before we start fetching
            isFetchingMissingUpdates = true;

            canUnpauseQueuePromise.finally(() => {
                OnyxUpdates.apply(updateParams).finally(() => {
                    console.debug('[OnyxUpdateManager] Done applying all updates');
                    Onyx.set(ONYXKEYS.ONYX_UPDATES_FROM_SERVER, null);
                    SequentialQueue.unpause();

                    // Set isFetchingMissingUpdates back to false after we finish fetching
                    isFetchingMissingUpdates = false;
                });
            });
        },
    });
};

What alternative solutions did you explore? (Optional)

@quinthar quinthar changed the title [HIGH] Rework the current logic for GetMissingOnyxMessages [HIGH] Prevent simultaneous calls to GetMissingOnyxMessages using a deferredUpdateQueue Mar 24, 2024
@melvin-bot melvin-bot bot added the Overdue label Mar 24, 2024
@quinthar quinthar changed the title [HIGH] Prevent simultaneous calls to GetMissingOnyxMessages using a deferredUpdateQueue [HIGH] Prevent simultaneous calls to GetMissingOnyxMessages using a deferredUpdates queue Mar 24, 2024
@quinthar quinthar changed the title [HIGH] Prevent simultaneous calls to GetMissingOnyxMessages using a deferredUpdates queue HIGH: [Reliability] Prevent simultaneous calls to GetMissingOnyxMessages using a deferredUpdates queue Mar 25, 2024
@danieldoglas
Copy link
Contributor Author

danieldoglas commented Mar 25, 2024

@Szymon20000 will start on this Today.

@wildan-m thanks for your proposal, but this was already assigned to a contributor!

@melvin-bot melvin-bot bot removed the Overdue label Mar 25, 2024
@wildan-m
Copy link
Contributor

@danieldoglas no worries. Is there any specific tags that determine if an issue already taken or not?

@danieldoglas
Copy link
Contributor Author

@wildan-m you should apply only for issues that have the label Help wanted

@wildan-m
Copy link
Contributor

@wildan-m you should apply only for issues that have the label Help wanted

@danieldoglas, this post contains it, Melvin might have mistakenly added it. I think you should remove it.

@quinthar quinthar removed the Help Wanted Apply this label when an issue is open to proposals by contributors label Mar 26, 2024
@quinthar
Copy link
Contributor

Sorry bout that @wildan-m !

@chrispader
Copy link
Contributor

I'm taking this issue over from @Szymon20000. Going to investigate this now and give an ETA until the end of the day!

Copy link

melvin-bot bot commented Apr 8, 2024

The solution for this issue has been 🚀 deployed to production 🚀 in version 1.4.60-13 and is now subject to a 7-day regression period 📆. Here is the list of pull requests that resolve this issue:

If no regressions arise, payment will be issued on 2024-04-15. 🎊

For reference, here are some details about the assignees on this issue:

  • @hungvu193 requires payment (Needs manual offer from BZ)
  • @chrispader does not require payment (Contractor)
  • @eh2077 requires payment (Needs manual offer from BZ)

Copy link

melvin-bot bot commented Apr 8, 2024

BugZero Checklist: The PR fixing this issue has been merged! The following checklist (instructions) will need to be completed before the issue can be closed:

  • [@hungvu193 / @eh2077] The PR that introduced the bug has been identified. Link to the PR:
  • [@hungvu193 / @eh2077] The offending PR has been commented on, pointing out the bug it caused and why, so the author and reviewers can learn from the mistake. Link to comment:
  • [@hungvu193 / @eh2077] A discussion in #expensify-bugs has been started about whether any other steps should be taken (e.g. updating the PR review checklist) in order to catch this type of bug sooner. Link to discussion:
  • [@hungvu193 / @eh2077] Determine if we should create a regression test for this bug.
  • [@hungvu193 / @eh2077] If we decide to create a regression test for the bug, please propose the regression test steps to ensure the same bug will not reach production again.
  • [@lschurr] Link the GH issue for creating/updating the regression test once above steps have been agreed upon:

@danieldoglas danieldoglas removed the Awaiting Payment Auto-added when associated PR is deployed to production label Apr 8, 2024
@danieldoglas danieldoglas changed the title [HOLD for payment 2024-04-15] HIGH: [Reliability] Prevent simultaneous calls to GetMissingOnyxMessages using a deferredUpdates queue HIGH: [Reliability] Prevent simultaneous calls to GetMissingOnyxMessages using a deferredUpdates queue Apr 8, 2024
@danieldoglas
Copy link
Contributor Author

That's fake news, we've reverted that PR and we're testing it again.

@lschurr
Copy link
Contributor

lschurr commented Apr 16, 2024

What's the latest on this one @danieldoglas?

@melvin-bot melvin-bot bot removed the Overdue label Apr 16, 2024
@danieldoglas
Copy link
Contributor Author

Still being tested, it's a tough one to get right :(

Copy link

melvin-bot bot commented Apr 18, 2024

@danieldoglas @hungvu193 @chrispader @lschurr @eh2077 this issue is now 4 weeks old, please consider:

  • Finding a contributor to fix the bug
  • Closing the issue if BZ has been unable to add the issue to a VIP or Wave project
  • If you have any questions, don't hesitate to start a discussion in #expensify-open-source

Thanks!

@danieldoglas
Copy link
Contributor Author

I mean, it's assigned to someone already @MelvinBot .

@chrispader
Copy link
Contributor

@danieldoglas i'm gonna work on the tests today!

@quinthar quinthar changed the title HIGH: [Reliability] Prevent simultaneous calls to GetMissingOnyxMessages using a deferredUpdates queue HIGH: [API Reliability] Prevent simultaneous calls to GetMissingOnyxMessages using a deferredUpdates queue Apr 27, 2024
@melvin-bot melvin-bot bot added the Overdue label Apr 27, 2024
@danieldoglas
Copy link
Contributor Author

Hopefully we'll merge this PR again tomorrow, and we won't face any issues this time 🙏🏾

@melvin-bot melvin-bot bot added Weekly KSv2 Awaiting Payment Auto-added when associated PR is deployed to production and removed Overdue Weekly KSv2 labels Apr 29, 2024
@melvin-bot melvin-bot bot changed the title HIGH: [API Reliability] Prevent simultaneous calls to GetMissingOnyxMessages using a deferredUpdates queue [HOLD for payment 2024-05-09] HIGH: [API Reliability] Prevent simultaneous calls to GetMissingOnyxMessages using a deferredUpdates queue May 2, 2024
Copy link

melvin-bot bot commented May 2, 2024

The solution for this issue has been 🚀 deployed to production 🚀 in version 1.4.69-2 and is now subject to a 7-day regression period 📆. Here is the list of pull requests that resolve this issue:

If no regressions arise, payment will be issued on 2024-05-09. 🎊

For reference, here are some details about the assignees on this issue:

  • @hungvu193 requires payment (Needs manual offer from BZ)
  • @chrispader does not require payment (Contractor)
  • @eh2077 requires payment (Needs manual offer from BZ)

Copy link

melvin-bot bot commented May 2, 2024

BugZero Checklist: The PR fixing this issue has been merged! The following checklist (instructions) will need to be completed before the issue can be closed:

  • [@hungvu193 / @eh2077] The PR that introduced the bug has been identified. Link to the PR:
  • [@hungvu193 / @eh2077] The offending PR has been commented on, pointing out the bug it caused and why, so the author and reviewers can learn from the mistake. Link to comment:
  • [@hungvu193 / @eh2077] A discussion in #expensify-bugs has been started about whether any other steps should be taken (e.g. updating the PR review checklist) in order to catch this type of bug sooner. Link to discussion:
  • [@hungvu193 / @eh2077] Determine if we should create a regression test for this bug.
  • [@hungvu193 / @eh2077] If we decide to create a regression test for the bug, please propose the regression test steps to ensure the same bug will not reach production again.
  • [@lschurr] Link the GH issue for creating/updating the regression test once above steps have been agreed upon:

@melvin-bot melvin-bot bot added Daily KSv2 and removed Weekly KSv2 labels May 8, 2024
@lschurr
Copy link
Contributor

lschurr commented May 8, 2024

Payment summary:

There were two reviewers on this PR. Both are due $500

@lschurr
Copy link
Contributor

lschurr commented May 9, 2024

All paid!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Awaiting Payment Auto-added when associated PR is deployed to production Bug Something is broken. Auto assigns a BugZero manager. Daily KSv2 External Added to denote the issue can be worked on by a contributor
Projects
No open projects
Development

No branches or pull requests

9 participants