thread-safe access to `Index` #51

danieleades · 2020-04-15T09:12:30Z

there's a lot of unwraps in the git2 index.

Internally the git2::Repository is held in a mutex, which is unlocked and unwrapped in every method. It's not clear what purpose the Mutex serves in this module.

I suspect that the mutex is added so that the index itself is Sync, meaning it can be used "naked" in the tide::Server state. is that correct?

It might be better to move this into the calling code. I think you're kind of cheating by placing a mutex here.

The Indexer trait only has immutable references in the in the interface, which gives the appearance (as far as Rust's type system is concerned) that types which implement Indexer can safely be used across threads from inside an Arc. But since these objects are actually mutating the filesystem, you can end up with all sorts of race conditions. Neither the CLI or the git2 implementation are actually thread safe. (you might consider explicitly marking the Tree as !Sync!

for example you have this code in api/publish.rs-

state.index.add_record(crate_desc)?;
state.index.commit_and_push(commit_msg.as_str())?;

since there are no guarantees there is only one thread running this code at a time in your server, you could find yourself in a position where the commit contains more (or less!) than 1 change to the index. You'd have to be pretty unlucky to run into this, but the likelihood will scale with the number of users.

The other problem with this (as i'm sure you've discovered!) is having only immutable references in the Indexer trait precludes you from caching any information about your index. You have to do a round-trip to the filesystem for every type of query.

The simplest solution would be to wrap the config::State object in a Mutex. This seems to be a pretty common way to handle state in Tide.
While that won't hurt, this is maybe a little coarse for you since there are other objects in your State struct that are already thread safe (like the database connection pool).

alternatively, you could wrap just the Index object in the State struct. Since an Arc<Mutex> can take a trait object, this is a nice way to make your app slightly more composable. instead of Arc<Mutex<Index>>, you could simply have an Arc<Mutex<dyn Indexer>> (ie you no longer care what concrete type the index object is).

If you want to get really clever you could look at using a RwLock to only lock access on the methods that take &mut self. My gut feeling is this might not work since RwLock<T> is only Sync if T is Sync, and you're back to square 1. A simple mutex is probably fine until the number of concurrent users scales by 3 or 4 orders of magnitude.

Either way, you'll then be able to remove the mutex from the git2 implementation (since enforcing single access to the git2::Repository is only part of the problem anyway!)

Architecturally it's cleaner also, the Index is responsible for accessing and mutating the filesystem only, and it is the responsibility of the caller to ensure only one Index is doing its thing at any one time

The text was updated successfully, but these errors were encountered:

Hirevo · 2020-04-25T13:01:09Z

Yes, the Mutex in Git2Index was indeed to have it be Sync.
At the time, I thought it would be fine but you made very good points about how that lock is not actually useful and how having it Sync is not accurate. I didn't realize that and that's completely on me.
I am sold on removing the lock from there and using an Arc<Mutex<config::State>> as Tide's state.
Thank you for pointing that out, that is quite a big deal and could easily lead to consistency issues in the registry's state.

danieleades · 2020-04-25T13:06:08Z

no problem! it also lifts the restriction that all the methods in the Indexer trait should take &self. Being able to take &mut self allows implementors of Index to avoid using interior mutability to modify themselves.

danieleades changed the title ~~remove unwraps from git2Index~~ thread-safe access to Index Apr 17, 2020

Hirevo mentioned this issue Apr 25, 2020

use a boxed trait for the Repository #52

Open

Hirevo added the C-bug Category: Bug label Apr 25, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

thread-safe access to `Index` #51

thread-safe access to `Index` #51

danieleades commented Apr 15, 2020 •

edited

Loading

Hirevo commented Apr 25, 2020 •

edited

Loading

danieleades commented Apr 25, 2020

thread-safe access to Index #51

thread-safe access to Index #51

Comments

danieleades commented Apr 15, 2020 • edited Loading

Hirevo commented Apr 25, 2020 • edited Loading

danieleades commented Apr 25, 2020

thread-safe access to `Index` #51

thread-safe access to `Index` #51

danieleades commented Apr 15, 2020 •

edited

Loading

Hirevo commented Apr 25, 2020 •

edited

Loading