Terminate Mirror/Aviso background threads also when server is shutdown
Before this change, the background threads were only destroyed when the server state changed from running -> halted.
This caused an issue: the halted -> shutdown and running -> shutdown transitions could reach the shutdown state with background threads still running.

Re ECFLOW-1986
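
The behaviour described above can be illustrated with a minimal sketch, assuming a single polling worker thread owned by a toy server. `BackgroundWorker` and `ToyServer` are hypothetical names invented for this illustration and are not ecFlow types; the point is only that both `shutdown()` and `halted()` stop the worker before the state changes, and that stopping twice is harmless.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical names throughout; this is not ecFlow's implementation.
enum class State { RUNNING, SHUTDOWN, HALTED };

// A background worker that polls until it is asked to stop.
class BackgroundWorker {
public:
    ~BackgroundWorker() { stop(); }

    void start() {
        if (thread_.joinable()) return;  // already running
        stop_requested_ = false;
        thread_ = std::thread([this] {
            while (!stop_requested_.load()) {
                std::this_thread::sleep_for(std::chrono::milliseconds(1));
            }
        });
    }

    // Idempotent: calling stop() on an already stopped worker does nothing.
    void stop() {
        stop_requested_ = true;
        if (thread_.joinable()) thread_.join();
        assert(!thread_.joinable());
    }

    bool running() const { return thread_.joinable(); }

private:
    std::atomic<bool> stop_requested_{false};
    std::thread thread_;
};

// Toy server: the worker is stopped on entry to SHUTDOWN as well as HALTED.
class ToyServer {
public:
    ToyServer() { worker_.start(); }

    void shutdown() {
        worker_.stop();
        state_ = State::SHUTDOWN;
    }

    void halted() {
        worker_.stop();
        state_ = State::HALTED;
    }

    State state() const { return state_; }
    bool worker_running() const { return worker_.running(); }

private:
    State state_{State::RUNNING};
    BackgroundWorker worker_;
};
```

With this shape, running -> shutdown and halted -> shutdown both end with no worker thread alive, which is the property the commit is after.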
marcosbento committed Nov 14, 2024
1 parent 28327ff commit 9fc6e51
Showing 1 changed file with 10 additions and 3 deletions.
13 changes: 10 additions & 3 deletions libs/server/src/ecflow/server/BaseServer.cpp
@@ -323,8 +323,9 @@ void BaseServer::shutdown() {
/// RUNNING yes yes yes yes
/// SHUTDOWN yes yes no yes
/// HALTED yes no no no
-if (serverEnv_.debug())
+if (serverEnv_.debug()) {
cout << " BaseServer::shutdown. Stop Scheduling new jobs only" << endl;
+}

// Stop server from creating new jobs. Don't stop the checkPtSaver_ since
// the jobs communication with server can still change state. Which we want
@@ -336,6 +337,9 @@ void BaseServer::shutdown() {
// If we go from HALTED --> SHUTDOWN, then check pointing needs to be enabled
checkPtSaver_.start();

+// Stop all Mirror/Aviso attributes (i.e. background threads are stopped)
+ecf::visit_all(*defs_, ShutdownDefs{});

// Will update defs as well to stop job scheduling
set_server_state(SState::SHUTDOWN);
}
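
The `ecf::visit_all(*defs_, ShutdownDefs{})` call added above suggests a visitor applied to every node of the definition tree. The following is only a sketch of that general pattern under stated assumptions: `Node`, `Attribute`, `visit_all`, and `StopBackgroundAttributes` are hypothetical stand-ins written for this illustration, and the real ecFlow signatures and types are not shown in this diff and may differ.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; ecFlow's real definition, node and attribute types differ.
struct Attribute {
    std::string name;
    bool thread_active{true};
    void finish() { thread_active = false; }  // stands in for stopping a background thread
};

struct Node {
    std::string name;
    std::vector<Attribute> attributes;
    std::vector<std::unique_ptr<Node>> children;
};

// Depth-first traversal that applies `visitor` to every node in the tree.
template <typename Visitor>
void visit_all(Node& root, Visitor&& visitor) {
    visitor(root);
    for (auto& child : root.children) {
        visit_all(*child, visitor);
    }
}

// Visitor that stops the background activity of every attribute it meets.
struct StopBackgroundAttributes {
    void operator()(Node& node) const {
        for (auto& attr : node.attributes) attr.finish();
    }
};

// Counts the attributes that are still marked active.
int count_active(Node& root) {
    int active = 0;
    visit_all(root, [&active](Node& n) {
        for (const auto& a : n.attributes) {
            if (a.thread_active) ++active;
        }
    });
    return active;
}

// Usage: build a two-level tree, stop everything, and report what is left active.
int demo_remaining_active() {
    Node root;
    root.name = "suite";
    root.attributes.push_back(Attribute{"mirror", true});

    auto child = std::make_unique<Node>();
    child->name = "task";
    child->attributes.push_back(Attribute{"aviso", true});
    root.children.push_back(std::move(child));

    assert(count_active(root) == 2);
    visit_all(root, StopBackgroundAttributes{});
    return count_active(root);
}
```

Keeping the traversal separate from the action means the same walk can serve any state transition that needs to touch every attribute.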
@@ -345,10 +349,11 @@ void BaseServer::halted() {
/// RUNNING yes yes yes yes
/// SHUTDOWN yes yes no yes
/// HALTED yes no no no
-if (serverEnv_.debug())
+if (serverEnv_.debug()) {
cout << " BaseServer::halted. Stop Scheduling new jobs *and* block task communication. Stop check pointing. "
"Only accept user request"
<< endl;
+}

// Stop server from creating new jobs. i.e Job scheduling.
traverser_.stop();
@@ -360,6 +365,7 @@ void BaseServer::halted() {
// Added after discussion with Axel.
checkPtSaver_.stop();

+// Stop all Mirror/Aviso attributes (i.e. background threads are stopped)
ecf::visit_all(*defs_, ShutdownDefs{});

// Stop the task communication with server. Hence nodes can be stuck
@@ -374,8 +380,9 @@ void BaseServer::restart() {
/// RUNNING yes yes yes yes
/// SHUTDOWN yes yes no yes
/// HALTED yes no no no
-if (serverEnv_.debug())
+if (serverEnv_.debug()) {
std::cout << " BaseServer::restart" << endl;
+}

// The server state *MUST* be set, *before* traverser_.start(), since that can kick off job traversal.
// Job Scheduling can only be done under RUNNING state, hence must be before traverser_.start();
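
The RUNNING/SHUTDOWN/HALTED comment table repeated in each hunk has lost its header row in this view. As a reading aid, here is a sketch that encodes the three rows as data. The column meanings (user requests, task requests, job scheduling, check pointing) are an assumption inferred from the debug messages in the diff, and `ServerState`, `Capabilities`, and both functions are hypothetical names, not ecFlow code.

```cpp
#include <cassert>

// Hypothetical names; the column meanings are inferred from the debug messages in the diff.
enum class ServerState { RUNNING, SHUTDOWN, HALTED };

struct Capabilities {
    bool user_requests;
    bool task_requests;
    bool job_scheduling;
    bool check_pointing;
};

// One row of the table per state, in the same order as the comment block.
inline Capabilities capabilities(ServerState state) {
    switch (state) {
        case ServerState::RUNNING:  return {true, true, true, true};
        case ServerState::SHUTDOWN: return {true, true, false, true};
        case ServerState::HALTED:   return {true, false, false, false};
    }
    assert(false && "unknown server state");
    return {false, false, false, false};
}

// Per the commit message, background threads are stopped on entering SHUTDOWN as well as HALTED.
inline bool background_threads_stopped_on_entry(ServerState state) {
    return state == ServerState::SHUTDOWN || state == ServerState::HALTED;
}
```

Read this way, SHUTDOWN differs from RUNNING only in job scheduling, while HALTED additionally blocks task requests and check pointing; after this commit both non-running states also stop the Mirror/Aviso background threads.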