
Added code to make pointer vector be thread safe #1325

Closed
wants to merge 11 commits

Conversation

muthusaravanan3186

Added code to make the pointer vector thread safe. Used a mutex to lock every operation done on the pointer vector.

@Dimi1010
Collaborator

Dimi1010 commented Feb 27, 2024

Kinda curious, is there a particular issue that requires the vector to be thread safe?

Also, if it isn't required all the time, the feature can be made opt-in by injecting the std::mutex type as a template parameter; when thread safety is not a requirement, a struct implementing no-op lock/unlock/try_lock methods can be provided instead, saving the mutex overhead.
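(For illustration, a rough sketch of the opt-in idea described above; NoOpMutex and the exact signature are placeholders, not code from this PR:)

// A hypothetical no-op lockable type: every method has an empty body, so the
// "locking" compiles away entirely when thread safety isn't needed.
struct NoOpMutex {
    void lock() {}
    void unlock() {}
    bool try_lock() { return true; }
};

// The lockable type is injected as a template parameter; callers that need
// thread safety instantiate with std::mutex, everyone else keeps the default.
template <typename T, typename Mutex = NoOpMutex>
class PointerVector;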

@muthusaravanan3186
Author

We had a requirement where we needed to read packets simultaneously for some kind of edge processing. Yes, the mutex may be overhead in a non-threaded environment.

Owner

@seladb seladb left a comment

@muthusaravanan3186 I believe we can use std::lock_guard in this case, which is simpler, no?
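(A minimal sketch of the std::lock_guard pattern being suggested, with placeholder names rather than the actual PointerVector code:)

#include <mutex>
#include <vector>

class LockedVectorSketch {
public:
    void pushBack(int* element)
    {
        // The guard locks the mutex here and unlocks it automatically when it
        // goes out of scope (even if push_back throws), so no explicit
        // lock()/unlock() calls are needed.
        std::lock_guard<std::mutex> guard(m_Mutex);
        m_Vector.push_back(element);
    }

private:
    std::vector<int*> m_Vector;
    std::mutex m_Mutex;
};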


codecov bot commented Feb 28, 2024

Codecov Report

Attention: Patch coverage is 70.17544%, with 17 lines in your changes missing coverage. Please review.

Project coverage is 82.35%. Comparing base (c551f40) to head (03989cc).

Files Patch % Lines
Common++/header/PointerVector.h 70.17% 15 Missing and 2 partials ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##              dev    #1325   +/-   ##
=======================================
  Coverage   82.34%   82.35%           
=======================================
  Files         163      163           
  Lines       20937    20975   +38     
  Branches     7906     7910    +4     
=======================================
+ Hits        17241    17273   +32     
- Misses       3025     3030    +5     
- Partials      671      672    +1     
Flag Coverage Δ
alpine317 72.41% <12.50%> (-0.01%) ⬇️
fedora37 72.42% <12.50%> (+0.04%) ⬆️
macos-12 61.42% <67.44%> (+0.01%) ⬆️
macos-13 60.46% <67.44%> (+0.02%) ⬆️
macos-14 60.46% <67.44%> (+0.02%) ⬆️
macos-ventura 61.49% <72.50%> (+0.01%) ⬆️
mingw32 70.28% <12.50%> (-0.03%) ⬇️
mingw64 70.29% <12.50%> (-0.04%) ⬇️
npcap 83.32% <71.73%> (-0.04%) ⬇️
rhel93 72.44% <12.50%> (-0.02%) ⬇️
ubuntu1804 74.74% <30.00%> (-0.01%) ⬇️
ubuntu2004 73.20% <27.27%> (-0.02%) ⬇️
ubuntu2204 72.24% <12.50%> (-0.01%) ⬇️
ubuntu2204-icpx 59.04% <67.44%> (+0.06%) ⬆️
unittest 82.35% <70.17%> (+<0.01%) ⬆️
windows-2019 83.35% <69.38%> (-0.02%) ⬇️
windows-2022 83.34% <71.73%> (-0.04%) ⬇️
winpcap 83.33% <69.38%> (+0.01%) ⬆️
xdp 59.08% <12.50%> (-0.01%) ⬇️
zstd 73.82% <70.73%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown.

@seladb
Owner

seladb commented Feb 28, 2024

> Kinda curious, is there a particular issue that requires the vector to be thread safe?
>
> Also, if it isn't required all the time, the feature can be made opt-in by injecting the std::mutex type as a template parameter; when thread safety is not a requirement, a struct implementing no-op lock/unlock/try_lock methods can be provided instead, saving the mutex overhead.

I think @Dimi1010 has a good point. Maybe we can add an optional parameter in the constructor saying whether or not to use the mutex.

@tigercosmos
Collaborator

> Kinda curious, is there a particular issue that requires the vector to be thread safe?
> Also, if it isn't required all the time, the feature can be made opt-in by injecting the std::mutex type as a template parameter; when thread safety is not a requirement, a struct implementing no-op lock/unlock/try_lock methods can be provided instead, saving the mutex overhead.
>
> I think @Dimi1010 has a good point. Maybe we can add an optional parameter in the constructor saying whether or not to use the mutex.

I think if we want to control whether to use the lock or not, maybe consider using an ifdef USE_LOCK_POINTER_VECTOR preprocessor directive to prevent run-time branch overhead.
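(A sketch of the preprocessor-guard idea; USE_LOCK_POINTER_VECTOR is hypothetical and the class name is a placeholder:)

#include <mutex>
#include <vector>

class GuardedVectorSketch {
public:
    void pushBack(int* element)
    {
#ifdef USE_LOCK_POINTER_VECTOR
        // Compiled in only when the build defines USE_LOCK_POINTER_VECTOR,
        // so builds without the flag pay no locking cost at all.
        std::lock_guard<std::mutex> guard(m_Mutex);
#endif
        m_Vector.push_back(element);
    }

private:
    std::vector<int*> m_Vector;
#ifdef USE_LOCK_POINTER_VECTOR
    std::mutex m_Mutex;
#endif
};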

@seladb
Owner

seladb commented Feb 28, 2024

> Kinda curious, is there a particular issue that requires the vector to be thread safe?
> Also, if it isn't required all the time, the feature can be made opt-in by injecting the std::mutex type as a template parameter; when thread safety is not a requirement, a struct implementing no-op lock/unlock/try_lock methods can be provided instead, saving the mutex overhead.
>
> I think @Dimi1010 has a good point. Maybe we can add an optional parameter in the constructor saying whether or not to use the mutex.
>
> I think if we want to control whether to use the lock or not, maybe consider using an ifdef USE_LOCK_POINTER_VECTOR preprocessor directive to prevent run-time branch overhead.

adding an additional if before locking the mutex shouldn't have a big performance penalty I think 🤔

@Dimi1010
Collaborator

Dimi1010 commented Feb 28, 2024

@seladb @tigercosmos I was thinking more along the lines of how spdlog solved it with their logging sinks. It ends up requiring a template, but you don't really have any runtime overhead for the non-mutex version, and it is not a global all-or-nothing switch like a preprocessor directive would be.

Example from spdlog:
https://github.com/gabime/spdlog/blob/v1.x/include%2Fspdlog%2Fsinks%2Fostream_sink.h#L39-L40
https://github.com/gabime/spdlog/blob/v1.x/include%2Fspdlog%2Fsinks%2Fbase_sink-inl.h#L27

@seladb
Owner

seladb commented Feb 28, 2024

> @seladb @tigercosmos I was thinking more along the lines of how spdlog solved it with their logging sinks. It ends up requiring a template, but you don't really have any runtime overhead for the non-mutex version, and it is not a global all-or-nothing switch like a preprocessor directive would be.
>
> Example from spdlog: https://github.com/gabime/spdlog/blob/v1.x/include%2Fspdlog%2Fsinks%2Fostream_sink.h#L39-L40 https://github.com/gabime/spdlog/blob/v1.x/include%2Fspdlog%2Fsinks%2Fbase_sink-inl.h#L27

I'm not sure implementing a null_mutex has less performance penalty than adding a simple if... 🤔

@Dimi1010
Collaborator

> @seladb @tigercosmos I was thinking more along the lines of how spdlog solved it with their logging sinks. It ends up requiring a template, but you don't really have any runtime overhead for the non-mutex version, and it is not a global all-or-nothing switch like a preprocessor directive would be.
> Example from spdlog: https://github.com/gabime/spdlog/blob/v1.x/include%2Fspdlog%2Fsinks%2Fostream_sink.h#L39-L40 https://github.com/gabime/spdlog/blob/v1.x/include%2Fspdlog%2Fsinks%2Fbase_sink-inl.h#L27
>
> I'm not sure implementing a null_mutex has less performance penalty than adding a simple if... 🤔

Might be, might not. They have null_mutex implemented like this:

struct null_mutex {
    void lock() const {}
    void unlock() const {}
};

whose lock/unlock compile down to just a return instruction back to the caller.

The benefit over the if approach is that when you need a non-thread-safe version, you don't need to construct and store a mutex in the struct at all, and it removes the need for ifs before every lock/unlock operation. It also allows you to actually use the lock_guard, which you can't really use cleanly with an if block deciding whether to lock, because the guard has to keep existing after the if block exits.

I think the potential minor performance cost of a function call over an if statement is worth it from a code readability and maintainability perspective.
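(Putting the pieces together, a rough sketch of how the injected mutex type and std::lock_guard could combine; the names below mirror the discussion and are not the final PR code:)

#include <mutex>
#include <vector>

struct NullMutex {
    void lock() {}
    void unlock() {}
};

template <typename T, typename Mutex = NullMutex>
class PointerVectorSketch {
public:
    void pushBack(T* element)
    {
        // With Mutex = NullMutex the guard's lock()/unlock() are empty inline
        // calls; with Mutex = std::mutex this becomes a real critical section.
        std::lock_guard<Mutex> guard(m_Mutex);
        m_Vector.push_back(element);
    }

private:
    std::vector<T*> m_Vector;
    Mutex m_Mutex;
};

// PointerVectorSketch<int> v1;              // no locking overhead
// PointerVectorSketch<int, std::mutex> v2;  // every operation locked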

@tigercosmos
Collaborator

@Dimi1010's proposal makes sense to me, especially for readability and maintainability.

@muthusaravanan3186
Author

Are we fine with the above implementation of using a null mutex? If so, I can go ahead and make the necessary changes.

@seladb
Owner

seladb commented Feb 28, 2024

> Are we fine with the above implementation of using a null mutex? If so, I can go ahead and make the necessary changes.

Yes, I think we can go with this approach!

Thank you @Dimi1010 for jumping in and proposing it!

Collaborator

@Dimi1010 Dimi1010 left a comment

Left some comments on some cases where the mutex use does not actually keep the structure thread safe.

Honestly, addressing those changes to keep the vector thread safe internally in all cases seems to me to be more of a hassle than just having the multithreaded code that accesses the vector simultaneously keep its own synchronization, rather than relying on the vector implementation to do the sync. A vector class has too many ways to access/modify the underlying data.

VectorIterator begin() { return m_Vector.begin(); }
VectorIterator begin()
{
    std::lock_guard<Mutex> lk(m_Mutex);
Collaborator

I don't think this protects as much as you would think.
Yes, you don't get data races when obtaining the iterator, but vector iterators access and modify the vector's memory directly, so once the safety car is gone (the begin iterator has been obtained), all the threads are free to race again via direct iterator modification.
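(A small single-threaded illustration of the invalidation being described; in a multithreaded setting, the second push_back below could come from another thread at any moment, even if begin() itself takes a lock:)

#include <vector>

int main()
{
    std::vector<int*> vec;
    int a = 1, b = 2;
    vec.push_back(&a);

    auto it = vec.begin();  // imagine this call is protected by a mutex
    (void)it;               // 'it' is valid at this point

    // A later push_back may reallocate the vector's storage...
    vec.push_back(&b);

    // ...after which 'it' is invalidated: dereferencing or advancing it now
    // would be undefined behavior, no matter how begin() was synchronized.
    return 0;
}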

Author

@muthusaravanan3186 muthusaravanan3186 Mar 1, 2024

This is the case in a normal scenario too, even without a multithreading environment: we get an iterator, and if after some time the vector reallocates itself, then the above iterator becomes invalid.

Collaborator

True, though in a non-multithreaded scenario you don't have the option of another thread invalidating your iterator whenever. Iterators are invalidated by specific documented calls, and the sequence of those calls in relation to iterator usage is deterministic.

Author

We are planning to use shared_ptr in another PR.

Collaborator

Perhaps we want to add this line (and all other similar ones) later when you raise another PR.

(Additional review threads on Common++/header/PointerVector.h were marked as resolved.)
@tigercosmos
Collaborator

@muthusaravanan3186 would you like to finish this PR?

@muthusaravanan3186
Author

> @muthusaravanan3186 would you like to finish this PR?

@tigercosmos Our requirement was single producer and single consumer. But based on the review comments, it seems we need to address multiple consumers too. I just want to know what the expectation is here. If we need to change PointerVector to hold shared_ptr references, then it will be a big change. Please let me know what is expected here.

@seladb
Owner

seladb commented Mar 17, 2024

> @muthusaravanan3186 would you like to finish this PR?
>
> @tigercosmos Our requirement was single producer and single consumer. But based on the review comments, it seems we need to address multiple consumers too. I just want to know what the expectation is here. If we need to change PointerVector to hold shared_ptr references, then it will be a big change. Please let me know what is expected here.

@muthusaravanan3186 we can do it in phases: make some of the changes discussed above without using shared pointers, and then in the future open another PR to move to shared pointers. The improvements suggested in this PR will make PointerVector mostly thread safe (even if not 100%). We can mention the edge cases in the doxygen documentation.

@muthusaravanan3186
Author

@seladb can you review now?

@tigercosmos
Collaborator

@muthusaravanan3186 could you resolve the conflicts?

@muthusaravanan3186
Author

@seladb I resolved all the conflicts. But on one of the conflicts I am getting the below message:
Only those with write access to this repository can merge pull requests.

@tigercosmos
Collaborator

tigercosmos commented Mar 27, 2024

> @seladb I resolved all the conflicts. But on one of the conflicts I am getting the below message: Only those with write access to this repository can merge pull requests.

Not sure what you did with the merge, but you can resolve the conflicts in your forked repo.
Basically you can do the following:

  1. set https://github.com/seladb/PcapPlusPlus/ as upstream: git remote add upstream https://github.com/seladb/PcapPlusPlus/
  2. fetch upstream: git fetch upstream dev
  3. git merge upstream/dev
  4. there will be some conflicts shown by git, and you need to resolve them.

@seladb
Owner

seladb commented Mar 27, 2024

@muthusaravanan3186 you can look at this StackOverflow question: https://stackoverflow.com/questions/38949951/how-to-solve-merge-conflicts-across-forks

Please notice that you should resolve conflicts between origin/dev and muthusaravanan3186/dev (not between the master branches of origin and your fork)

@muthusaravanan3186
Author

@seladb Resolved all the conflicts.


/**
* @class PointerVector
* A template class for representing a std::vector of pointers. Once (a pointer to) an element is added to this vector,
* the element responsibility moves to the vector, meaning the PointerVector will free the object once it's removed from the vector
* This class wraps std::vector and adds the capability of freeing objects once they're removed from it
*/
template<typename T>
template<typename T, typename Mutex=pcpp::nullMutex>
Collaborator

Please update the documentation.

Collaborator

Please rename it to NullMutex for consistency in this project. Also, maybe it's better to put NullMutex in the detail namespace, since it is only for internal usage.

namespace detail
{
	struct NullMutex { ... }
} // namespace detail

Author

Changed to NullMutex. But the detail namespace is only present in Pcap++. Since this is in the Common++ folder, I'm hoping to place it here, which seems more reasonable.

Author

@tigercosmos where do we need to update the documentation?

Collaborator

You can update the comment, mentioning the template argument. We use Doxygen to generate the documentation.
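(For example, the Doxygen class comment could gain @tparam lines along these lines; the wording is only a suggestion:)

/**
 * @class PointerVector
 * A template class for representing a std::vector of pointers. ...
 * @tparam T The type pointed to by the vector's elements
 * @tparam Mutex The lockable type used to guard operations on the vector;
 *         defaults to a no-op mutex, so locking is opt-in and only costs
 *         anything when a real mutex type (e.g. std::mutex) is supplied
 */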

Author

done

@@ -70,7 +83,7 @@ namespace pcpp
throw;
}
}
}
}
Collaborator

nit: space

Collaborator

@tigercosmos tigercosmos left a comment

  • Fix the formatting; please ensure that you run pre-commit before submitting the code. You can also check that all CI works on your forked repo (allow CI on the forked repo).
  • Remove all changes that you plan to use in the next PR, e.g. begin(), front(). It doesn't make sense to have WIP code here; please remove the unnecessary changes from this PR, and if you have time to raise another PR, we can bring those changes back in that PR.

@muthusaravanan3186
Author

@tigercosmos There is no WIP PR yet; as discussed, the plan is to merge this PR and then start working on the use of shared_ptr.

@tigercosmos
Collaborator

> @tigercosmos There is no WIP PR yet; as discussed, the plan is to merge this PR and then start working on the use of shared_ptr.

I am okay with that. If you want to add support for shared_ptr, why not add it in this PR?

template<typename T, typename Mutex=pcpp::NullMutex, typename PointerType=std::add_pointer_t<T> >
class PointerVector {

		void pushBack(PointerType element)
		{
			std::lock_guard<Mutex> lk(m_Mutex);
			m_Vector.push_back(element);
		}

		PointerType front()
		{
			std::lock_guard<Mutex> lk(m_Mutex);
			return m_Vector.front();
		}
};

@muthusaravanan3186
Author

As discussed, this was the plan:

> @muthusaravanan3186 we can do it in phases: make some of the changes discussed above without using shared pointers, and then in the future open another PR to move to shared pointers. The improvements suggested in this PR will make PointerVector mostly thread safe (even if not 100%). We can mention the edge cases in the doxygen documentation.

@tigercosmos
Collaborator

> As discussed, this was the plan:
>
> @muthusaravanan3186 we can do it in phases: make some of the changes discussed above without using shared pointers, and then in the future open another PR to move to shared pointers. The improvements suggested in this PR will make PointerVector mostly thread safe (even if not 100%). We can mention the edge cases in the doxygen documentation.

Then I oppose adding unnecessary locks for the future support of the shared pointer, because (1) those locks are useless for T*, and (2) using the shared pointer is not in the plan. Let's focus on handling the case that uses T*, and forget about shared pointers for the moment.

@muthusaravanan3186
Author

@tigercosmos Back to square one: we have a null mutex for the normal case and a mutex for the multithreading case. This was discussed. Since it is a vector of pointers, accessing the pointers themselves is not thread safe. Since changing PointerVector to handle shared pointers is a huge change, we planned to implement it in phases. If we need everything in a single shot, it may take time. Please let me know what the expectation is. If not, I will drop this PR and work on something else. We already modified PcapPlusPlus according to our needs.

@tigercosmos
Collaborator

I cannot make the decision on this; let's wait for @seladb

@seladb
Owner

seladb commented Apr 17, 2024

@muthusaravanan3186 @tigercosmos @Dimi1010 I'm trying to catch up here... I think we need to decide on 2 issues, please correct me if I'm wrong:

  1. The iterator is not thread-safe because after the user receives it the vector can change and invalidate the iterator
  2. The vector holds pointers that are not thread-safe by nature, meaning one thread can hold the pointer and another thread can delete it from the vector which will free the pointer

Which issue(s) are you concerned about and is blocking this PR?

Regarding (1) - do you have any (simple) suggestion on how to solve it?
Regarding (2) - we need to use std::shared_ptr, which is indeed a bigger and breaking change

@Dimi1010
Collaborator

Dimi1010 commented Apr 17, 2024

@seladb @tigercosmos @muthusaravanan3186

Tbh, I don't really see an easy solution to (1) while keeping the current API of the class. The two ways I could see iteration being done both have problems:

  • You either need custom iterator objects that hold a reference to the vector mutex, or some other kind of black magic, and even then mutating through the iterator would require custom calls.
  • You remove the iteration capability from the 'thread-safe' version and add a thread-safe method that makes a 'non-thread-safe' snapshot of the objects in the vector at the time of the call and returns that to the calling thread to do with as it sees fit.

Regarding point (2): using std::unique_ptr and std::shared_ptr would remove the need for manual allocation tracking, so that would be useful, and std::shared_ptr would also remove the problem of dangling pointers after removal from the vector.

With all that said, I am personally of the opinion that the burden of thread safety should not be placed on the vector object itself, but on the program that requires the vector data to be accessed by multiple threads simultaneously. My reasoning is that the vector and raw arrays in general provide way too many ways to access or mutate the data to be easily secured, while the program accessing the data from different threads has far better knowledge of how the data is being used, so it can provide thread-safe access and avoid race conditions.

A simpler solution would be having the different threads make a local copy of the vector contents while under synchronization and then proceed independently using their local copies. This type of solution minimizes the points of contention where a sync lock needs to be acquired between the threads, as the alternative would require a sync lock on every element access or mutation. It also removes the need for the underlying objects held by the vector to be thread-safe.
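(A sketch of the copy-under-lock pattern described above; the names and the int element type are placeholders:)

#include <mutex>
#include <vector>

std::vector<int*> sharedVec;  // shared between threads
std::mutex sharedVecMutex;    // guards sharedVec itself

std::vector<int*> takeSnapshot()
{
    // Hold the lock only long enough to copy the pointer values; each thread
    // then iterates over its own local copy without further contention.
    std::lock_guard<std::mutex> guard(sharedVecMutex);
    return sharedVec;
}

// Note: the copied pointers still refer to the same underlying objects, so
// element lifetime still has to be coordinated between the threads.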

@seladb
Owner

seladb commented Apr 18, 2024

After reading @Dimi1010's response, I started wondering what the use case of a thread-safe pointer vector is if both (1) and (2) are not solved...

In general, a thread-safe container is only needed when one or more threads are changing the vector, usually while other threads are reading data from it:

  • If (1) is not solved then the threads can't use an iterator (in a for loop for example), so the only way to access the vector data is by using at() or front(). While that's possible, it's not very convenient
  • If (2) is not solved then no thread can remove elements from the vector, otherwise other threads may hold invalid pointers. That's a pretty major limitation that narrows down many of the use-cases for using the vector in a multithread environment

@muthusaravanan3186 can you share your use-case for thread-safe pointer vector? It might help us decide if we actually want to implement it or not

@seladb
Owner

seladb commented May 2, 2024

@muthusaravanan3186 @Dimi1010 @tigercosmos let's decide whether we want to keep this PR or close it.
Please let me know what you think

@muthusaravanan3186
Author

@seladb sorry for the late reply.. let's close this PR. Let me know if you have anything to work on; I will start working on that.

@tigercosmos
Collaborator

IMO, the thread-safety issue is interesting. Perhaps we can turn this into an issue and keep tracking it?

@seladb
Owner

seladb commented May 3, 2024

> IMO, the thread-safety issue is interesting. Perhaps we can turn this into an issue and keep tracking it?

@tigercosmos there was an issue for it: #234 but we closed it. We can reopen it if needed, but the real question is whether we want to address it, and if so - how?

@seladb
Owner

seladb commented May 3, 2024

> @seladb sorry for the late reply.. let's close this PR. Let me know if you have anything to work on; I will start working on that.

@muthusaravanan3186 thank you for being willing to help! You can choose any issue from the issues list or suggest an improvement of your own. If you're looking for "quick wins" you can work on these issues:

Or if you want to implement a new protocol:

@Dimi1010
Collaborator

Dimi1010 commented May 3, 2024

I vote to close the PR. I don't believe the vector itself is the correct place for thread safety, as I explained in my earlier reasoning.

@tigercosmos
Collaborator

sure, let's close it.

@tigercosmos tigercosmos closed this May 3, 2024