You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Internally the git2::Repository is held in a mutex, which is unlocked and unwrapped in every method. It's not clear what purpose the Mutex serves in this module.
I suspect that the mutex is added so that the index itself is Sync, meaning it can be used "naked" in the tide::Server state. is that correct?
It might be better to move this into the calling code. I think you're kind of cheating by placing a mutex here.
The Indexer trait only has immutable references in the in the interface, which gives the appearance (as far as Rust's type system is concerned) that types which implement Indexer can safely be used across threads from inside an Arc. But since these objects are actually mutating the filesystem, you can end up with all sorts of race conditions. Neither the CLI or the git2 implementation are actually thread safe. (you might consider explicitly marking the Tree as !Sync!
since there are no guarantees there is only one thread running this code at a time in your server, you could find yourself in a position where the commit contains more (or less!) than 1 change to the index. You'd have to be pretty unlucky to run into this, but the likelihood will scale with the number of users.
The other problem with this (as i'm sure you've discovered!) is having only immutable references in the Indexer trait precludes you from caching any information about your index. You have to do a round-trip to the filesystem for every type of query.
The simplest solution would be to wrap the config::State object in a Mutex. This seems to be a pretty common way to handle state in Tide.
While that won't hurt, this is maybe a little coarse for you since there are other objects in your State struct that are already thread safe (like the database connection pool).
alternatively, you could wrap just the Index object in the State struct. Since an Arc<Mutex> can take a trait object, this is a nice way to make your app slightly more composable. instead of Arc<Mutex<Index>>, you could simply have an Arc<Mutex<dyn Indexer>> (ie you no longer care what concrete type the index object is).
If you want to get really clever you could look at using a RwLock to only lock access on the methods that take &mut self. My gut feeling is this might not work since RwLock<T> is only Sync if T is Sync, and you're back to square 1. A simple mutex is probably fine until the number of concurrent users scales by 3 or 4 orders of magnitude.
Either way, you'll then be able to remove the mutex from the git2 implementation (since enforcing single access to the git2::Repository is only part of the problem anyway!)
Architecturally it's cleaner also, the Index is responsible for accessing and mutating the filesystem only, and it is the responsibility of the caller to ensure only one Index is doing its thing at any one time
The text was updated successfully, but these errors were encountered:
danieleades
changed the title
remove unwraps from git2Index
thread-safe access to IndexApr 17, 2020
Yes, the Mutex in Git2Index was indeed to have it be Sync.
At the time, I thought it would be fine but you made very good points about how that lock is not actually useful and how having it Sync is not accurate. I didn't realize that and that's completely on me.
I am sold on removing the lock from there and using an Arc<Mutex<config::State>> as Tide's state.
Thank you for pointing that out, that is quite a big deal and could easily lead to consistency issues in the registry's state.
no problem! it also lifts the restriction that all the methods in the Indexer trait should take &self. Being able to take &mut self allows implementors of Index to avoid using interior mutability to modify themselves.
there's a lot of
unwrap
s in the git2 index.Internally the
git2::Repository
is held in a mutex, which is unlocked and unwrapped in every method. It's not clear what purpose the Mutex serves in this module.I suspect that the mutex is added so that the index itself is
Sync
, meaning it can be used "naked" in the tide::Server state. is that correct?It might be better to move this into the calling code. I think you're kind of cheating by placing a mutex here.
The
Indexer
trait only has immutable references in the in the interface, which gives the appearance (as far as Rust's type system is concerned) that types which implementIndexer
can safely be used across threads from inside anArc
. But since these objects are actually mutating the filesystem, you can end up with all sorts of race conditions. Neither the CLI or the git2 implementation are actually thread safe. (you might consider explicitly marking the Tree as!Sync
!for example you have this code in
api/publish.rs
-since there are no guarantees there is only one thread running this code at a time in your server, you could find yourself in a position where the commit contains more (or less!) than 1 change to the index. You'd have to be pretty unlucky to run into this, but the likelihood will scale with the number of users.
The other problem with this (as i'm sure you've discovered!) is having only immutable references in the
Indexer
trait precludes you from caching any information about your index. You have to do a round-trip to the filesystem for every type of query.The simplest solution would be to wrap the
config::State
object in aMutex
. This seems to be a pretty common way to handle state in Tide.While that won't hurt, this is maybe a little coarse for you since there are other objects in your
State
struct that are already thread safe (like the database connection pool).alternatively, you could wrap just the
Index
object in theState
struct. Since anArc<Mutex>
can take a trait object, this is a nice way to make your app slightly more composable. instead ofArc<Mutex<Index>>
, you could simply have anArc<Mutex<dyn Indexer>>
(ie you no longer care what concrete type the index object is).If you want to get really clever you could look at using a
RwLock
to only lock access on the methods that take&mut self
. My gut feeling is this might not work sinceRwLock<T>
is onlySync
ifT
isSync
, and you're back to square 1. A simple mutex is probably fine until the number of concurrent users scales by 3 or 4 orders of magnitude.Either way, you'll then be able to remove the mutex from the git2 implementation (since enforcing single access to the
git2::Repository
is only part of the problem anyway!)Architecturally it's cleaner also, the
Index
is responsible for accessing and mutating the filesystem only, and it is the responsibility of the caller to ensure only oneIndex
is doing its thing at any one timeThe text was updated successfully, but these errors were encountered: