Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix deadlocks and improve no-timeout get() calls. #17

Merged
merged 6 commits into from
May 10, 2022

Conversation

mattgibbs
Copy link
Contributor

This PR fixes two totally separate issues (sorry).

  1. capv.unsubscribe_channel() and capv.clear_channel() could cause deadlocks if called while a monitor callback is in use, due to interactions with the GIL from callback threads (see https://epics.anl.gov/base/R7-0/4-docs/CAref.html#ca_clear_channel). I fixed this by releasing the GIL before calling ca_clear_channel() or ca_clear_subscription(). I believe this is reasonable, and it works in my testing, but I have to admit that I'm not 100% sure this is always a safe thing to do.
  2. When calling Pv.get(timeout=None), you were almost guaranteed to crash because the value usually won't be available by the time the method returns. I added a check for that, and the method now returns None if the value isn't ready yet. This makes Pv.get(timeout=None) much more useful if you want to get very large numbers of PVs at once: You can call it in a loop, expect to get back none, then collect all the values in a second pass.

This is my totally unofficial contribution to the EPICS 2022 Codeathon :)

@mattgibbs mattgibbs requested a review from ZLLentz May 10, 2022 07:37
@@ -574,6 +582,10 @@ def get(self, count=None, timeout=DEFAULT_TIMEOUT, as_string=False, **kw):
if tmo > 0 and DEBUG != 0:
logprint("got %s\n" % self.value.__str__())

if "value" not in self.data:
# If the timeout was None, there's a chance that the value hasn't arrived yet.
return None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conceptual question: what happens if we have self.data and self.value from a previous get? Will this block of code be bypassed and will the function return a stale value?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Granted, this isn't a reason not to do this- but if it's true, it should be documented

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Also, I think this is how pyca behaves right now if you follow up a successful get with a new timeout=None get, so it's not something in-scope of this PR to modify)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you are right - if you do Pv.get(timeout=None) after Pv.data is already populated (via a previous get(), or due to monitor callbacks), you'll get the most recently seen value, which might be quite old. This isn't new behavior to the PR.

I think this is actually the expected outcome with timeout=None. The only element of surprise is that there's now a chance that you might get None instead of a real value.

Copy link
Member

@ZLLentz ZLLentz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good functionally- I added some minor whitespace-fixing suggestions + and documentation food for thought. I'm also going to ping @mcb64 because he has more experience with this code than I do as a sanity check.

@ZLLentz ZLLentz requested a review from mcb64 May 10, 2022 16:58
@ZLLentz
Copy link
Member

ZLLentz commented May 10, 2022

Also: did you re-run the unit tests? I think we had some on this repo, though our continuous integration chain is definitely busted right now on this one.

mattgibbs and others added 2 commits May 10, 2022 10:07
Co-authored-by: Zachary Lentz <ZLLentz@users.noreply.github.com>
Co-authored-by: Zachary Lentz <ZLLentz@users.noreply.github.com>
@mattgibbs
Copy link
Contributor Author

@ZLLentz I did run the unit tests, they all pass.

@mcb64
Copy link

mcb64 commented May 10, 2022

Yeah, I think this looks good. CA itself has no knowledge of the GIL, so it should be safe to release it here (and restore the state after the CA call). pyca_monitor_handler tries to the GIL before touching python, which is why you are getting a deadlock here. (The danger would be if you release the GIL here and then the code proceeds to call python functions under the assumption that you already have the GIL. This is not the case though.)

@ZLLentz
Copy link
Member

ZLLentz commented May 10, 2022

Excellent, let's merge it then! I'll also put out a new tag with this PR included.

@ZLLentz ZLLentz merged commit eff926b into slaclab:master May 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants