Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugfix/libwebsocket dtor deadlock #896

Merged
merged 30 commits into from
Dec 13, 2024
Merged

Conversation

AssemblyJohn
Copy link
Contributor

@AssemblyJohn AssemblyJohn commented Dec 4, 2024

Describe your changes

  • Better split code and thread responsibilities
  • Added a generic safe queue to simplify count of conditional variables and mutexes used
  • Cleared the connection_mutex usage in websocket client threads
  • Modified reconnect logic to internal reconnect logic on the websocket thread for better stability
  • The connect in websocket is not sync any more, but only kickstarts the connections attempts

Issue ticket number and link

Core CI: EVerest/everest-core#984

Checklist before requesting a review

@AssemblyJohn AssemblyJohn force-pushed the bugfix/libwebsocket_dtor_deadlock branch 6 times, most recently from 58b5b4f to 5fa2dd4 Compare December 4, 2024 12:47
@AssemblyJohn AssemblyJohn force-pushed the bugfix/libwebsocket_dtor_deadlock branch 2 times, most recently from f712dad to 2c9d45e Compare December 5, 2024 09:42
@AssemblyJohn
Copy link
Contributor Author

The integration tests have been run, the unit tests have been run and received feedback for required behavior for some corner-cases. Everything passes, so I am requesting a review with other possible problems that we might encounter with this code reorganization: @hikinggrass @marcemmers @Pietfried @maaikez

@AssemblyJohn AssemblyJohn force-pushed the bugfix/libwebsocket_dtor_deadlock branch 3 times, most recently from a0f4d44 to 36d08e5 Compare December 9, 2024 08:53
Copy link
Contributor

@marcemmers marcemmers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ran out of time, these are my findings until now. Mostly small ones

include/ocpp/common/safe_queue.hpp Outdated Show resolved Hide resolved
}

/// \brief Notifies a single waiting thread to wake up
inline void notify_waiting_thread() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this one a bit strange. I get why it is being done since the threads might be waiting on this queue, but it feels like we are still mixing some concerns here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any suggestions? Expose the cv?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure, it's probably not an easy thing to do right so it's probably best to continue with this for now.

To elaborate on my concern. There are 2 things we are doing;

  1. We have a queue that we need to protect for multi threaded access, that is done in this class.
  2. We have a thread running that is waiting for input from the queue or input from other places. This is now done half in this class (the cv) and half in the thread part of the websocket. So either we fully move the thread into this class and handle it only in here, or we move the cv out of this one. This has more implications I know, that's why I think its best to leave it as is for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do understand the concerns. However, I've thought at something. The problem is that not always the queue knows when to wake up the waiting thread. When we push a message? When we pop a message? Both? That's why the 'notify_waiting_thread' could be removed and replaced with a C++ template policy enum that specifies the used case.

For example:

enum ESafeQueueWakeupPolicy { WAKE_UP_NEVER, WAKE_UP_ON_POP, WAKE_UP_ON_PUSH, WAKE_UP_ALWAYS };

Of course that adds the complexity of having to know what the queue will do.

For moving the thread in the queue, I've also thought at that but I wouldn't do it in the SafeQueue class, I'd create a QueueWorkerThread that contains both the std::thread and an internal SafeQueue and that receives as a ctor a lambda worked parameter or something along those lines.

However at the moment it seems the queue wakeup policy makes the most sense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not really sure how that would look yet, might be a good option. At the same time, I understood this needs to be merged soon for some kind of release so might be wise to just accept that this is some technical debt and accept it for now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would not be that complicated (~30 mins of work), and it would remove the 'notify_waiting_thread()' function better separating functionality as you've said. That will give us a head-start when we will properly fix the other technical debt, the proper management of the thread that is attached to this queue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've implemented here a queue policy on when to wake up. However I am not sure that it is the best idea. Maybe we should notify the thread always to wake up on an operation?

4ac4485

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont see any drawback with notifying on each queue operation and then we could omit the policy. I would suggest to accept the changes for now.

Of course one could reduce the scope and complexity of the SafeQueue class to the basic queue operations, move the cv out of it and pass e.g. a lambda triggered on every operation (or even based on a passed policy), and we can discuss this afterwards.

But in the current design of the libwebsockets impl the logic that uses the queues requires the notifications and this leads to the fact that the SafeQueue always need the notify mechanism one or the other way. Therefore I'm also fine with the current implementation since the SafeQueue is currently only used here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment I've removed the policy and will just notify the thread on all operations. The old behavior was that the CV is out of the class, however that added a lot of failure points in the libwebsocket class that always required to call that CV notify, since in there we have 3 queues.

Maybe the lambda triggered on each operation could be the best idea, but I'd say to postpone it until we need the SafeQueue in another class that is not the websocket.

include/ocpp/common/safe_queue.hpp Show resolved Hide resolved
lib/ocpp/common/websocket/websocket_libwebsockets.cpp Outdated Show resolved Hide resolved
include/ocpp/common/safe_queue.hpp Outdated Show resolved Hide resolved
lib/ocpp/common/websocket/websocket_libwebsockets.cpp Outdated Show resolved Hide resolved
AssemblyJohn and others added 21 commits December 12, 2024 16:23
Signed-off-by: AssemblyJohn <[email protected]>
Signed-off-by: AssemblyJohn <[email protected]>
Signed-off-by: AssemblyJohn <[email protected]>
…e only need to change the websocket connection options

Signed-off-by: Piet Gömpel <[email protected]>
Signed-off-by: AssemblyJohn <[email protected]>
…ion of what the function does

Signed-off-by: AssemblyJohn <[email protected]>
Signed-off-by: AssemblyJohn <[email protected]>
Signed-off-by: AssemblyJohn <[email protected]>
Signed-off-by: AssemblyJohn <[email protected]>
@AssemblyJohn AssemblyJohn force-pushed the bugfix/libwebsocket_dtor_deadlock branch from 06897bb to 0d7e93e Compare December 12, 2024 14:23
include/ocpp/common/safe_queue.hpp Outdated Show resolved Hide resolved
}

/// \brief Notifies a single waiting thread to wake up
inline void notify_waiting_thread() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont see any drawback with notifying on each queue operation and then we could omit the policy. I would suggest to accept the changes for now.

Of course one could reduce the scope and complexity of the SafeQueue class to the basic queue operations, move the cv out of it and pass e.g. a lambda triggered on every operation (or even based on a passed policy), and we can discuss this afterwards.

But in the current design of the libwebsockets impl the logic that uses the queues requires the notifications and this leads to the fact that the SafeQueue always need the notify mechanism one or the other way. Therefore I'm also fine with the current implementation since the SafeQueue is currently only used here

include/ocpp/common/safe_queue.hpp Outdated Show resolved Hide resolved
@AssemblyJohn AssemblyJohn force-pushed the bugfix/libwebsocket_dtor_deadlock branch from 1252f94 to 16ddbb1 Compare December 13, 2024 07:53
Signed-off-by: AssemblyJohn <[email protected]>
@AssemblyJohn AssemblyJohn force-pushed the bugfix/libwebsocket_dtor_deadlock branch from 16ddbb1 to 6ce05ae Compare December 13, 2024 08:16
@AssemblyJohn AssemblyJohn merged commit 230ae1a into main Dec 13, 2024
7 checks passed
@AssemblyJohn AssemblyJohn deleted the bugfix/libwebsocket_dtor_deadlock branch December 13, 2024 09:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants