Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Time out SUBSCRIBE query instead of forking. #1783

Merged
merged 7 commits into from
Jun 24, 2024

Conversation

tylerkaraszewski
Copy link
Contributor

@tylerkaraszewski tylerkaraszewski commented Jun 21, 2024

Details

See issue, it's pretty detailed.

Fixed Issues

Fixes https://github.com/Expensify/Expensify/issues/406697

Tests

Testing on commit e146093 (before rolling back the test hacks):

We successfully time out the query:

2024-06-21T19:53:05.538431+00:00 expensidev2004 bedrock10001: xxxxxx (libstuff.cpp:151) SException [sync] [info] Throwing exception with message: 'synchronization query timeout' from sqlitecluster/SQLiteNode.cpp:2154

This kills the peer connection:

2024-06-21T19:53:05.538542+00:00 expensidev2004 bedrock10001: xxxxxx (SQLiteNode.cpp:1785) _onMESSAGE [sync] [warn] {cluster_node_0/LEADING} ->{cluster_node_2} Error processing message 'SUBSCRIBE' (synchronization query timeout), reconnecting.

Follower reconnects, switches to searching:

2024-06-21T19:53:05.538636+00:00 expensidev2004 bedrock10007: xxxxxx (SQLitePeer.cpp:90) postPoll [sync] [hmmm] {cluster_node_0} Lost peer connection after 101ms, reconnecting in 739ms
2024-06-21T19:53:05.538800+00:00 expensidev2004 bedrock10007: xxxxxx (SQLiteNode.cpp:1970) _changeState [sync] [info] {cluster_node_2/SUBSCRIBING} Switching from 'SUBSCRIBING' to 'SEARCHING'

Follower runs through the state machine, succeeds on second try:

2024-06-21T19:53:05.578719+00:00 expensidev2004 bedrock10007: xxxxxx (SQLiteNode.cpp:1970) _changeState [sync] [info] {cluster_node_2/SEARCHING} Switching from 'SEARCHING' to 'SYNCHRONIZING'
2024-06-21T19:53:05.582020+00:00 expensidev2004 bedrock10007: xxxxxx (SQLiteNode.cpp:1970) _changeState [sync] [info] {cluster_node_2/SYNCHRONIZING} Switching from 'SYNCHRONIZING' to 'WAITING'
2024-06-21T19:53:06.273311+00:00 expensidev2004 bedrock10007: xxxxxx (SQLiteNode.cpp:1970) _changeState [sync] [info] {cluster_node_2/WAITING} Switching from 'WAITING' to 'SUBSCRIBING'
2024-06-21T19:53:06.291824+00:00 expensidev2004 bedrock10007: xxxxxx (SQLiteNode.cpp:1970) _changeState [sync] [info] {cluster_node_2/SUBSCRIBING} Switching from 'SUBSCRIBING' to 'FOLLOWING'

Auth built and tested against this as well.


Internal Testing Reminder: when changing bedrock, please compile auth against your new changes

@tylerkaraszewski tylerkaraszewski self-assigned this Jun 21, 2024
@tylerkaraszewski tylerkaraszewski changed the title [WIP] Time out SUBSCRIBE query instead of forking. Time out SUBSCRIBE query instead of forking. Jun 21, 2024

// We set this for all commits because this only gets all commits in response to SUBSCRIBE, which is done synchronously, and blocks the commit thread.
// For asynchronous queries, there's nothing being blocked, so it doesn't much matter how long these take.
// This is really not the correct encapsulation for this, but we can improve that later.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pet peeve, but this is a TODO in a comment. Can we make a GH issue for it instead?

Suggested change
// This is really not the correct encapsulation for this, but we can improve that later.

// We set this for all commits because this only gets all commits in response to SUBSCRIBE, which is done synchronously, and blocks the commit thread.
// For asynchronous queries, there's nothing being blocked, so it doesn't much matter how long these take.
// This is really not the correct encapsulation for this, but we can improve that later.
timeoutLimitUS = 10'000'000;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the cluster gives up on the leader node after 30s, should we make this higher and closer to 30s, say 25?

If there's a reason for 10 and it's not an arbitrary number, let's add a comment

_timeoutLimit = STimeNow() + timeoutLimitUS;
}
int queryResult = SQuery(_db, "getting commits", query, result);
_timeoutLimit = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be calling clearTimeout here?

@tylerkaraszewski
Copy link
Contributor Author

@iwiznia @cead22 updated!

I haven't re-run the tests yet (which are fairly manual). I will do that before merging, but feel free to re-review/approve.

@iwiznia iwiznia changed the title Time out SUBSCRIBE query instead of forking. [HOLD test] Time out SUBSCRIBE query instead of forking. Jun 24, 2024
@tylerkaraszewski
Copy link
Contributor Author

Updated testing on the latest change showing the same behavior as the original testing:


2024-06-24T18:43:54.249843+00:00 expensidev2004 bedrock10001: xxxxxx (libstuff.cpp:151) SException [sync] [info] Throwing exception with message: 'synchronization query timeout' from sqlitecluster/SQLiteNode.cpp:2153
2024-06-24T18:43:54.250186+00:00 expensidev2004 bedrock10001: xxxxxx (SQLiteNode.cpp:1788) _onMESSAGE [sync] [warn] {cluster_node_0/LEADING} ->{cluster_node_2} Error processing message 'SUBSCRIBE' (synchronization query timeout), reconnecting.
2024-06-24T18:43:54.250287+00:00 expensidev2004 bedrock10007: xxxxxx (SQLitePeer.cpp:90) postPoll [sync] [hmmm] {cluster_node_0} Lost peer connection after 3ms, reconnecting in 629ms
2024-06-24T18:43:54.250333+00:00 expensidev2004 bedrock10007: xxxxxx (SQLiteNode.cpp:1973) _changeState [sync] [info] {cluster_node_2/SUBSCRIBING} Switching from 'SUBSCRIBING' to 'SEARCHING'
2024-06-24T18:43:54.250513+00:00 expensidev2004 bedrock10007: xxxxxx (SQLiteNode.cpp:1973) _changeState [sync] [info] {cluster_node_2/SEARCHING} Switching from 'SEARCHING' to 'WAITING'
2024-06-24T18:43:54.728064+00:00 expensidev2004 bedrock10007: xxxxxx (SQLiteNode.cpp:1973) _changeState [sync] [info] {cluster_node_2/WAITING} Switching from 'WAITING' to 'SUBSCRIBING'
2024-06-24T18:43:54.750226+00:00 expensidev2004 bedrock10007: xxxxxx (SQLiteNode.cpp:1973) _changeState [sync] [info] {cluster_node_2/SUBSCRIBING} Switching from 'SUBSCRIBING' to 'FOLLOWING'

@tylerkaraszewski tylerkaraszewski changed the title [HOLD test] Time out SUBSCRIBE query instead of forking. Time out SUBSCRIBE query instead of forking. Jun 24, 2024
@tylerkaraszewski tylerkaraszewski merged commit 2c1102a into main Jun 24, 2024
1 check passed
@tylerkaraszewski tylerkaraszewski deleted the tyler-timeout-subscribe-query branch June 24, 2024 22:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants