Remove the CRE (possibly infinite) loop from the end of state transfer. #2995

yontyon · 2023-04-13T09:08:45Z

Problem Overview
some of the reconfiguration commands may introduce a certain complexity with state transfer: they may break the ability to complete state transfer (and getting the reconfiguration update). A good example of such a command is the scaling command.
The current solution is to have another reconfiguration mechanism within state transfer - based on the client reconfiguration engine.
The rationale behind this approach is that a replica on state transfer behaves identically to a full copy client (read-only replica) and should get the reconfiguration updates with the same async mechanism as every other client.
However, this means that in order to make sure we got the reconfiguration updates via CRE (the async mechanism) is that we have to make a synchronization phase. In the context of CRE, this was done by making sure we get a newer update than the latest one we got in state transfer.
Unfortunately, getting a bft update requires getting replies from 2f + 1 replica (in the regular case), which means that to finish state transfer we must have other 2f + 1 replicas.

After carefully analyzing the problem, it appears we can have the two mechanisms as long as we sync between them.

We have two options:

the replica was able to get the reconfiguration update prior to the end of state transfer. In this case, the replica will get the new configuration and start the state transfer all over again (because state transfer has not been finished). Then, once we actually finish state transfer in the new configuration, we won't execute the scale command again as it was written to the configurations file.
The replica was not able to get the update with CRE and was able to complete state transfer. Then we simply execute the reconfiguration update as any other reconfiguration command. As before, we update the configurations file with the new configuration.

Notes:

Removing the synchronization phase (the loop), means that every new reconfiguration command should be developed w.r.t to the above problem (there is no generic solution)
The proposed change does not handle some (possibly) atomic issues (such as crashing in the middle of state transfer callbacks). These problems should be analyzed and handled in a different PR.

Testing Done
CI

eyalrund · 2023-04-13T09:20:52Z

looks risky. We need to test State transfer scenarios

yontyon · 2023-04-13T11:14:35Z

looks risky. We need to test State transfer scenarios

There are Apollo tests that test that. But scheduling a deterministic test for testing this scenario is not supported at the moment

glevkovich · 2023-04-20T13:58:18Z

The proposed change does not handle some (possibly) atomic issues (such as crashing in the middle of state transfer callbacks). These problems should be analyzed and handled in a different PR.

@yontyon , if you do not handle all above side cases then my undersanding is that it is not a complete solution. If replica do crash, will it be able to continue somehow, or is it a "game over" for the replica? if the 2nd, are you planning to implement the additional PR ASAP?

From your words, some of the persistency tests will fail for sure. I suggest to disable them. replicas are killed there tens to hundreds of times per test, they will fail for sure on CI.

glevkovich · 2023-04-20T14:10:29Z

kvbc/src/st_reconfiguration_sm.cpp

+  // manage to complete state transfer, CRE is halted and we need to execute the command based on the state we gained
+  // during state transfer. In order not to execute the command twice, we do check that this configuration was not
+  // already executed by reading the local configuration list.
+  std::ofstream configurations_file;


nit: for safety and better use of new built-in C++17 features, consider using std::filesystem for the path:
https://en.cppreference.com/w/cpp/filesystem
https://en.cppreference.com/w/cpp/filesystem/path

Example:
https://stackoverflow.com/questions/6297738/how-to-build-a-full-path-string-safely-from-separate-strings

glevkovich · 2023-04-20T14:14:49Z

@yontyon

Are current tests good enough, or do we need to add new tests to check new side cases due to current modifications?

yontyon · 2023-04-24T09:55:19Z

The proposed change does not handle some (possibly) atomic issues (such as crashing in the middle of state transfer callbacks). These problems should be analyzed and handled in a different PR.

@yontyon , if you do not handle all above side cases then my undersanding is that it is not a complete solution. If replica do crash, will it be able to continue somehow, or is it a "game over" for the replica? if the 2nd, are you planning to implement the additional PR ASAP?

From your words, some of the persistency tests will fail for sure. I suggest to disable them. replicas are killed there tens to hundreds of times per test, they will fail for sure on CI.

@glevkovich , I'm referring to this piece of code:

  // Mark ST final completion to "outside world" - After this call any incoming messages are not dropped.
  setExternalCollectingState(false);

  // Now after external state have changed, invoke the 2nd group of registered callbacks
  // TODO - This next line should be converted to an internal message. There is no need for a callback here. ST is done.
  // worst case we will have some few incoming messages dropped, until key-exchange is done.
  // on_fetching_state_change_cb_registry_ should be removed
  auto size = on_fetching_state_change_cb_registry_.size();
  LOG_INFO(
      logger_,
      "Starting to invoke all registered calls (on_fetching_state_change_cb_registry_):" << KVLOG(size, checkpointNum));
  on_fetching_state_change_cb_registry_.invokeAll(checkpointNum);
  LOG_INFO(logger_, "Done invoking all registered calls (on_fetching_state_change_cb_registry_)");

AFAIU, if the replica has crashed during on_fetching_state_change_cb_registry_ then it won't execute again the callbacks (because state transfer has already finished).

In addition, it's rather a general warning:
Once you rely on two different mechanisms and one of them is implemented on the state transfer callback, the developer has to have in mind that its logic might be executed twice (in case the replica has crashed during the callback execution). Until now there was a general solution (the CRE loop), but now it is the developer's responsibility to handle all these edge cases.

yontyon · 2023-04-24T10:02:27Z

@yontyon

Are current tests good enough, or do we need to add new tests to check new side cases due to current modifications?

To test the specific scenario, we need to somehow be able to exactly to crash a replica exactly where the loop was before and before its CRE was able to fetch the update. AFAIN, we don't have this capability at the moment. So good tests (IMO) are simply to see that the relevant tests do not crash (this PR run for 4 times already and all was good)

teoparvanov · 2023-04-24T10:48:31Z

Hi @yontyon, can you please try the following (currently disabled) test with your PR?
https://github.com/vmware/concord-bft/blob/master/tests/apollo/test_skvbc_restart_recovery.py#L495

I'd suggest to run it in a loop, as a confirmation that your changes here address the underlying test instability.

Thanks!

glevkovich · 2023-04-24T14:36:58Z

The proposed change does not handle some (possibly) atomic issues (such as crashing in the middle of state transfer callbacks). These problems should be analyzed and handled in a different PR.
@yontyon , if you do not handle all above side cases then my undersanding is that it is not a complete solution. If replica do crash, will it be able to continue somehow, or is it a "game over" for the replica? if the 2nd, are you planning to implement the additional PR ASAP?
From your words, some of the persistency tests will fail for sure. I suggest to disable them. replicas are killed there tens to hundreds of times per test, they will fail for sure on CI.

@glevkovich , I'm referring to this piece of code:
  // Mark ST final completion to "outside world" - After this call any incoming messages are not dropped.
  setExternalCollectingState(false);

  // Now after external state have changed, invoke the 2nd group of registered callbacks
  // TODO - This next line should be converted to an internal message. There is no need for a callback here. ST is done.
  // worst case we will have some few incoming messages dropped, until key-exchange is done.
  // on_fetching_state_change_cb_registry_ should be removed
  auto size = on_fetching_state_change_cb_registry_.size();
  LOG_INFO(
      logger_,
      "Starting to invoke all registered calls (on_fetching_state_change_cb_registry_):" << KVLOG(size, checkpointNum));
  on_fetching_state_change_cb_registry_.invokeAll(checkpointNum);
  LOG_INFO(logger_, "Done invoking all registered calls (on_fetching_state_change_cb_registry_)");
AFAIU, if the replica has crashed during on_fetching_state_change_cb_registry_ then it won't execute again the callbacks (because state transfer has already finished).

In addition, it's rather a general warning: Once you rely on two different mechanisms and one of them is implemented on the state transfer callback, the developer has to have in mind that its logic might be executed twice (in case the replica has crashed during the callback execution). Until now there was a general solution (the CRE loop), but now it is the developer's responsibility to handle all these edge cases.

@yontyon
Yes, ST is done at this stage. I wonder if we still need this callback here? After all, it is not part of ST (ST is done alrrady).
In current master code i've added a TODO:
// TODO - This next line should be converted to an internal message. There is no need for a callback here. ST is done.

yontyon requested a review from a team as a code owner April 13, 2023 09:08

vmwclabot added the cla-not-required label Apr 13, 2023

yontyon requested a review from WildFireFlum April 13, 2023 09:09

eyalrund requested review from HristoStaykov and glevkovich April 13, 2023 09:18

glevkovich reviewed Apr 20, 2023

View reviewed changes

WildFireFlum approved these changes Apr 27, 2023

View reviewed changes

yontyon added 2 commits May 1, 2023 09:16

remove state transfer cre loop

dc458f4

enable test_stop_primary_during_initiated_view_change

212f5cc

yontyon force-pushed the remove-st-cre-loop branch from 94bc615 to 212f5cc Compare May 1, 2023 09:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove the CRE (possibly infinite) loop from the end of state transfer. #2995

Remove the CRE (possibly infinite) loop from the end of state transfer. #2995

yontyon commented Apr 13, 2023

eyalrund commented Apr 13, 2023

yontyon commented Apr 13, 2023

glevkovich commented Apr 20, 2023 •

edited

Loading

glevkovich Apr 20, 2023

glevkovich commented Apr 20, 2023

yontyon commented Apr 24, 2023 •

edited

Loading

yontyon commented Apr 24, 2023

teoparvanov commented Apr 24, 2023 •

edited

Loading

glevkovich commented Apr 24, 2023

Remove the CRE (possibly infinite) loop from the end of state transfer. #2995

Are you sure you want to change the base?

Remove the CRE (possibly infinite) loop from the end of state transfer. #2995

Conversation

yontyon commented Apr 13, 2023

eyalrund commented Apr 13, 2023

yontyon commented Apr 13, 2023

glevkovich commented Apr 20, 2023 • edited Loading

glevkovich Apr 20, 2023

Choose a reason for hiding this comment

glevkovich commented Apr 20, 2023

yontyon commented Apr 24, 2023 • edited Loading

yontyon commented Apr 24, 2023

teoparvanov commented Apr 24, 2023 • edited Loading

glevkovich commented Apr 24, 2023

glevkovich commented Apr 20, 2023 •

edited

Loading

yontyon commented Apr 24, 2023 •

edited

Loading

teoparvanov commented Apr 24, 2023 •

edited

Loading