Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

If weakrefs are ever used with Series's inner Arc, Series._get_inner_mut() could start panicing #21004

Open
2 tasks done
itamarst opened this issue Jan 30, 2025 · 2 comments
Open
2 tasks done
Labels
bug Something isn't working needs triage Awaiting prioritization by a maintainer rust Related to Rust Polars

Comments

@itamarst
Copy link
Contributor

itamarst commented Jan 30, 2025

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Issue description

Consider the implementation of this function:

    #[doc(hidden)]
    pub fn _get_inner_mut(&mut self) -> &mut dyn SeriesTrait {
        if Arc::weak_count(&self.0) + Arc::strong_count(&self.0) != 1 {
            self.0 = self.0.clone_inner();
        }
        Arc::get_mut(&mut self.0).expect("implementation error")
    }

Arc::weak_count() and Arc::strong_count() are internally doing load(Relaxed).

The expect() can trigger a panic in the following scenario:

  1. The strong refcount is 2, and another thread has the clone.
  2. The other thread increments the weak refcount, and then decrements the strong refcount.
  3. The current thread sees the latter, but not the former; there is no guarantee both will be seen at the same time the way the code is written.
  4. As a result, the code thinks total refcount is 1 and calls Arc::get_mut().
  5. Internally, Arc::get_mut() uses a private Arc::is_unique() method that is correctly doing consistency, and so it figures out total refcount is 2 (1 strong + 1 week), returns None, we panic on expect().

This is more likely to happen on ARM than x86-64 because of weaker memory consistency. And presumably it doesn't happen very often, but can happen.

In practice, I see no code in Polars that actually creates weak refs to the Arc. So this is merely theoretical at the moment. But it does mean there is an optimizations I don't think should be implemented since it would be unsound if someone started using weakrefs somewhere else in the codebase.

Potential solutions

Two Arc::get_mut()

match Arc::get_mut(&mut self.0) {
    Some(r) => r,
    None => {
        self.0 = self.0.clone_inner();
        Arc::get_mut(&mut self.0)  // or in nightly, `unsafe { Arc::get_mut_unchecked(...) }`
    }
}

Unfortunately the borrow checker is overzealous and won't allow this, at least for now.

Refactor Series

There is probably some structure within Series that would fix this, maybe struct Series(pub Option<Arc<...>>)? This is a pretty pervasive change though, and seems kinda ugly given it's a workaround for a borrow checker limitation.

Switch to Arc implementation that lacks weakrefs, e.g. triomphe::Arc

The triomphe create reimplements Arc without weakrefs, and it has a public Arc::is_unique() method. This would also be faster, which is helpful for list operations that use amortized_iter() and therefore call _get_inner_mut() for every single value in a Series.

Insofar as weakrefs aren't used at all for this code path AFAICT, this shouldn't break anything, but it does mean extra dependency and non-backwards compatible API change. In an ideal world the inner Arc wouldn't be public, though...

Installed versions

Git as of Jan 30, 2025.

@itamarst itamarst added bug Something isn't working needs triage Awaiting prioritization by a maintainer rust Related to Rust Polars labels Jan 30, 2025
@itamarst
Copy link
Contributor Author

This isn't an actual current bug, so not worth taking any action unless it's for the performance benefit of triomphe, but... this seems worth documenting somewhere, because if weakrefs ever start getting used and it ever does happen it will hopefully save someone the pain of debugging a seemingly impossible edge case. So "move this as a comment into _get_inner_mut() and close" also seems like a fine resolution.

@orlp
Copy link
Collaborator

orlp commented Jan 30, 2025

I would be happy to see a PR that replaces all our uses of Arc with triomphe::Arc seeing as it's more efficient and we don't need weakrefs almost everywhere (only in the async executor we use it currently).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working needs triage Awaiting prioritization by a maintainer rust Related to Rust Polars
Projects
None yet
Development

No branches or pull requests

2 participants