
New faster Queue with capacity >= 0 #473

Merged · 10 commits into python-trio:master · May 5, 2018

Conversation

sorcio (Contributor) commented Mar 17, 2018

Based on ideas and code by @njsmith (#272). Made a PR out of it to discuss API.

  • docs
  • benchmarks

codecov bot commented Mar 17, 2018

Codecov Report

Merging #473 into master will decrease coverage by <.01%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master     #473      +/-   ##
==========================================
- Coverage   99.27%   99.27%   -0.01%     
==========================================
  Files          89       89              
  Lines       10405    10468      +63     
  Branches      721      727       +6     
==========================================
+ Hits        10330    10392      +62     
  Misses         58       58              
- Partials       17       18       +1
| Impacted Files | Coverage Δ |
|---|---|
| trio/_sync.py | 100% <100%> (ø) ⬆️ |
| trio/tests/test_sync.py | 100% <100%> (ø) ⬆️ |
| trio/tests/test_highlevel_open_tcp_listeners.py | 99.29% <0%> (-0.71%) ⬇️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

njsmith (Member) commented Mar 18, 2018

I'm quite hesitant about adding a multi_get API to trio itself right now. #242 discusses how we could do that in a more general way, and also why it's hard to tell whether it's a good idea or not. In the meantime, though, this kind of Queue implementation is definitely better, so we shouldn't block it waiting to make decisions about select-style functionality...

It'd be good to benchmark just to check, and to document what exactly a 0-size buffer actually means for those who have not been initiated into the ways of CSP.

sorcio (Contributor, Author) commented Mar 19, 2018

Benchmarking code: https://gist.github.com/sorcio/71cbad90335808544c7ea4edbdc9264f

With patch:

Benchmark 1 (1 putter task, 1 getter task)
benchmark1(0,); loops: 10, best of 3: 4.732997059938498
benchmark1(1,); loops: 10, best of 3: 4.310345179983415
benchmark1(10,); loops: 10, best of 3: 4.3298765369690955
benchmark1(100,); loops: 10, best of 3: 4.3633756070630625
Benchmark 2 (put_nowait only)
benchmark2(100,); loops: 10, best of 3: 0.01785309298429638
benchmark2(10000,); loops: 10, best of 3: 0.1284000480081886
benchmark2(100000,); loops: 10, best of 3: 1.245691244956106

Without patch (master):

Benchmark 1 (1 putter task, 1 getter task)
benchmark1(1,); loops: 10, best of 3: 8.013807445997372
benchmark1(10,); loops: 10, best of 3: 4.954212918062694
benchmark1(100,); loops: 10, best of 3: 4.89853437396232
Benchmark 2 (put_nowait only)
benchmark2(100,); loops: 10, best of 3: 0.01489449106156826
benchmark2(10000,); loops: 10, best of 3: 0.43872660608030856
benchmark2(100000,); loops: 10, best of 3: 4.151649273000658

I was thinking of writing more microbenchmarks, but I stopped here: I'm not sure how to exercise the right parts, and I don't understand why the current trio.Queue(1) is so slow in the first benchmark.


I can update the patch to remove multi_get/multi_get_nowait altogether and implement the same in get/get_nowait. It might even very slightly improve benchmark results.

The thing I don't understand, though, is whether there is some way to achieve the same result without hazmat. My naive approach would be to start a task for each queue I want to consume/select from and have it put the results into a shared queue. But that's broken, because a task would get() an item before knowing whether it can put it, or whether it is about to be cancelled.
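A sketch of that naive approach, to make the flaw concrete (my names; `select` is a hypothetical helper, not part of trio, and this assumes the trio.Queue API of this era):

```python
import trio

async def select(queues, nursery):
    """Hypothetical fan-in: one forwarding task per source queue."""
    combined = trio.Queue(0)

    async def forward(q):
        while True:
            item = await q.get()      # the item has now left its source queue...
            await combined.put(item)  # ...and is silently lost if this task is
                                      # cancelled while blocked on put()

    for q in queues:
        nursery.start_soon(forward, q)
    return combined
```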

sorcio (Contributor, Author) commented Apr 12, 2018

Same microbenchmarks as before, run on updated master vs the updated PR. Now formatted in a fancy table, because hey, fancy tables. Times in seconds, 10 rounds each. Not comparable with the previous benchmark because this ran on a different machine, but also because Trio got generally slower since I last ran the benchmarks.

benchmark1(n): one task putting 10000 messages in a Queue(n), another task consuming them
benchmark2(n): putting n messages in a Queue(n) with put_nowait()

| Benchmark | master | PR 473 | Δ |
|---|---|---|---|
| benchmark1(0) | — | 8.436079 | (capacity 0 not supported on master) |
| benchmark1(1) | 14.156041 | 7.705907 | 83.70% faster |
| benchmark1(10) | 9.160088 | 7.546915 | 21.38% faster |
| benchmark1(100) | 9.094151 | 7.635676 | 19.10% faster |
| benchmark2(100) | 0.021808 | 0.013714 | 59.02% faster |
| benchmark2(10000) | 0.731397 | 0.191692 | 281.55% faster |
| benchmark2(100000) | 7.131474 | 1.928052 | 269.88% faster |
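For reference, a hypothetical reconstruction of what the two microbenchmarks do, per the descriptions above (the real code is in the linked gist; this assumes the trio.Queue API of this era):

```python
import trio

async def benchmark1(n, messages=10000):
    # One task puts `messages` items into a Queue(n) while another consumes them.
    q = trio.Queue(n)

    async def putter():
        for i in range(messages):
            await q.put(i)

    async def getter():
        for _ in range(messages):
            await q.get()

    async with trio.open_nursery() as nursery:
        nursery.start_soon(putter)
        nursery.start_soon(getter)

async def benchmark2(n):
    # Fill a Queue(n) to capacity using only the synchronous put_nowait().
    q = trio.Queue(n)
    for i in range(n):
        q.put_nowait(i)
```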

sorcio (Contributor, Author) commented Apr 19, 2018

@njsmith what do you think of this? We can leave this with a minimum size of 1, so no documentation change is needed, and if it's alright with you it can go in. Otherwise I'm not entirely sure what kind of doc update would be good :)

njsmith (Member) left a comment

Overall looking pretty good! Two small requests below. Also:

Re: docs: I think it'd be good to add a few sentences somewhere explaining that a Queue(0) means that put blocks until get is called and vice-versa, like a CSP channel. It doesn't have to be long or complicated, but for those who haven't been initiated into the mysteries of CSP, I think it's good to at least give the basic idea and give them a pointer to find out more.
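A minimal sketch of that rendezvous behavior, assuming the trio.Queue API as it existed at the time of this PR:

```python
import trio

async def producer(q):
    print("putting...")
    await q.put("hello")  # with capacity 0, blocks until a get() arrives
    print("put completed")

async def consumer(q):
    await trio.sleep(1)   # the producer stays blocked for this whole second
    print("got:", await q.get())

async def main():
    q = trio.Queue(0)     # capacity 0: put and get must rendezvous
    async with trio.open_nursery() as nursery:
        nursery.start_soon(producer, q)
        nursery.start_soon(consumer, q)

trio.run(main)
```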

And one last thing: can you add a towncrier entry? https://trio.readthedocs.io/en/latest/contributing.html#pull-request-release-notes
Both the support for Queue(0) and the speedup are worth mentioning in the release notes.

trio/_sync.py Outdated
self._put_semaphore = Semaphore(capacity, max_value=capacity)
self._get_semaphore = Semaphore(0, max_value=capacity)
# {task: abort func}
self._get_wait = OrderedDict()
njsmith (Member):

I don't understand why this has the abort fn in it. It looks like all the abort fn does is remove the task from self._get_wait, and the only place we use it is in put, which calls it... immediately after removing the task from self._get_wait, so it's a no-op. And then we have a try/except in the get abort fn just to work around this?

Can we make _get_wait a mapping {task: None}, i.e. basically just an ordered set? Or is there a subtlety I'm missing?
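A sketch of the suggested shape, with hypothetical helper names: an OrderedDict used purely as an ordered set, so put() can wake the longest-waiting getter without storing per-task abort callbacks:

```python
from collections import OrderedDict

# {task: None} -- effectively an ordered set; insertion order gives FIFO wakeup.
_get_wait = OrderedDict()

def add_waiter(task):
    _get_wait[task] = None

def pop_oldest_waiter():
    # popitem(last=False) removes and returns the longest-waiting entry.
    task, _ = _get_wait.popitem(last=False)
    return task
```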

njsmith (Member):

On further thought: I bet this is left-over from my multi_get hack, huh. I think it's unnecessary here now though...

@@ -512,6 +510,29 @@
assert (await q.get()) == 2


async def test_Queue_unbuffered():
njsmith (Member) commented Apr 21, 2018

For a bit more checking of unbuffered Queue, I'd suggest adding a QueueLock1(0) to the lock_factories and lock_factory_names lists in this file.

sorcio (Contributor, Author):

I don't think this works. Attempting to acquire the lock (i.e. await queue.put()) will block indefinitely. I definitely like the idea of testing Queue(0) as used for explicit synchronization though, so I may add a test that relies on rendezvous behaviour.
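A hypothetical test along those lines (not from the PR), relying on the rendezvous guarantee that put() on a Queue(0) cannot complete before a matching get():

```python
import trio
import trio.testing

async def test_queue0_rendezvous():
    q = trio.Queue(0)
    record = []

    async def putter():
        await q.put("x")
        record.append("put done")

    async with trio.open_nursery() as nursery:
        nursery.start_soon(putter)
        await trio.testing.wait_all_tasks_blocked()
        # The putter is parked inside put(): nothing has been delivered yet.
        assert record == []
        assert (await q.get()) == "x"
    # The nursery exit waited for the putter to finish its put().
    assert record == ["put done"]
```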

njsmith (Member):

> Attempting to acquire the lock (i.e. await queue.put()) will block indefinitely

Oh duh, of course you're right.

sorcio changed the title from "[WIP] New Queue with multi_get() and max_size>=0" to "New faster Queue with capacity >= 0" on Apr 24, 2018
sorcio (Contributor, Author) commented Apr 24, 2018

  1. I realized I had changed the default recommendation for capacity in the docstring from 1 to 0. I guess my reasoning was that sometimes it's dangerous to put and move forward when no task is waiting to get (it bit me a couple of times). But I'm not sure that's the safest recommendation for new users.

  2. Maybe I should clean up this branch, rebase, and make a single commit when this is ready to be merged, instead of this mess?

  3. The towncrier entry refers to this PR because there is no issue for unbuffered queues; they were just a happy accident of the new implementation. Maybe I can edit the initial message to add more context.

  4. (what's with test_highlevel_open_tcp_listeners.py coverage?)

Fuyukai (Member) commented Apr 24, 2018

Having a zero-size queue can sound weird unless you realize that a queue handles both data and task scheduling - maybe you should edit the docs to explain why 0-capacity queues would be useful? (To me it doesn't really feel like a queue at that point, more of an event notifier.)

sorcio (Contributor, Author) commented Apr 24, 2018

@Fuyukai hmm... I see your point. Where I come from, I see a queue as a FIFO channel: whatever you put, you can get in order from other tasks, and they provide fan-in and fan-out. That queues have attached buffers is a sort of extra feature that makes it easier to handle put bursts when you can't consume them fast enough. It's very handy, but it's not their fundamental purpose, as I could attach a buffer somewhere else. So, in my mind, I would rather want someone to explain to me why >0-capacity queues would be useful and how to be smart about which buffer size to choose. But of course that's neither the only way to use a queue nor the only way to look at them.

Trio docs say "You can use Queue objects to safely pass objects between tasks", so it's pretty neutral w.r.t. buffers. If you think that we should explain why 0-capacity queues would be useful, I guess that means that, from some point of view, a user might assume that queues are just some kind of data structure (like e.g. a deque) and not a synchronization primitive. So maybe we could adjust the docs to head off that assumption, e.g. by saying that queues "can optionally have a buffer that is always bounded in size". Or we could only worry about this if we re-design things for #497 and maybe call the concept "channel" or something else.

njsmith mentioned this pull request on May 5, 2018
njsmith (Member) commented May 5, 2018

It's still a queue in the sense that you can call put a bunch of times and then get a bunch of times, and all the gets return the values that were put, in FIFO order. We probably should implement the ideas in #497 and rename it to Channel, and of course the docs can always be improved, but I think this is at a good enough place to merge!

njsmith merged commit 7118e16 into python-trio:master on May 5, 2018