
New faster Queue with capacity >= 0 #473

Merged · 10 commits into python-trio:master · May 5, 2018

Conversation

sorcio (Contributor) commented Mar 17, 2018

Based on ideas and code by @njsmith (#272). Made a PR out of it to discuss API.

  • docs
  • benchmarks

codecov bot commented Mar 17, 2018

Codecov Report

Merging #473 into master will decrease coverage by <.01%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master     #473      +/-   ##
==========================================
- Coverage   99.27%   99.27%   -0.01%     
==========================================
  Files          89       89              
  Lines       10405    10468      +63     
  Branches      721      727       +6     
==========================================
+ Hits        10330    10392      +62     
  Misses         58       58              
- Partials       17       18       +1
| Impacted Files | Coverage Δ |
|---|---|
| trio/_sync.py | 100% <100%> (ø) ⬆️ |
| trio/tests/test_sync.py | 100% <100%> (ø) ⬆️ |
| trio/tests/test_highlevel_open_tcp_listeners.py | 99.29% <0%> (-0.71%) ⬇️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

njsmith (Member) commented Mar 18, 2018

I'm quite hesitant about adding a multi_get API to trio itself right now. #242 discusses how we could do that in a more general way, and also why it's hard to tell whether it's a good idea or not. In the meantime, though, this kind of Queue implementation is definitely better, so we shouldn't block it waiting to make decisions about select-style functionality...

It'd be good to benchmark just to check, and to document what exactly a 0-size buffer actually means for those who have not been initiated into the ways of CSP.

sorcio (Contributor, Author) commented Mar 19, 2018

Benchmarking code: https://gist.github.com/sorcio/71cbad90335808544c7ea4edbdc9264f

With patch:

Benchmark 1 (1 putter task, 1 getter task)
benchmark1(0,); loops: 10, best of 3: 4.732997059938498
benchmark1(1,); loops: 10, best of 3: 4.310345179983415
benchmark1(10,); loops: 10, best of 3: 4.3298765369690955
benchmark1(100,); loops: 10, best of 3: 4.3633756070630625
Benchmark 2 (put_nowait only)
benchmark2(100,); loops: 10, best of 3: 0.01785309298429638
benchmark2(10000,); loops: 10, best of 3: 0.1284000480081886
benchmark2(100000,); loops: 10, best of 3: 1.245691244956106

Without patch (master):

Benchmark 1 (1 putter task, 1 getter task)
benchmark1(1,); loops: 10, best of 3: 8.013807445997372
benchmark1(10,); loops: 10, best of 3: 4.954212918062694
benchmark1(100,); loops: 10, best of 3: 4.89853437396232
Benchmark 2 (put_nowait only)
benchmark2(100,); loops: 10, best of 3: 0.01489449106156826
benchmark2(10000,); loops: 10, best of 3: 0.43872660608030856
benchmark2(100000,); loops: 10, best of 3: 4.151649273000658

I was thinking of writing more microbenchmarks, but I stopped here: I'm not sure how to exercise the right parts, and I don't understand why the current trio.Queue(1) is so slow in the first benchmark.


I can update the patch to remove multi_get/multi_get_nowait altogether and implement the same in get/get_nowait. It might even very slightly improve benchmark results.

The thing I don't understand, though, is whether there is some way to achieve the same result without hazmat. My naive approach would be to start a task for each queue I want to consume/select from and have it put the results into a shared queue. But that's broken, because a task would get() an item before knowing whether it can put it, or whether it is about to be cancelled.
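A sketch of that naive approach, to make the flaw concrete (my names; `select` is a hypothetical helper, not part of trio, and this assumes the trio.Queue API of this era):

```python
import trio

async def select(queues, nursery):
    """Hypothetical fan-in: one forwarding task per source queue."""
    combined = trio.Queue(0)

    async def forward(q):
        while True:
            item = await q.get()      # the item has now left its source queue...
            await combined.put(item)  # ...and is silently lost if this task is
                                      # cancelled while blocked on put()

    for q in queues:
        nursery.start_soon(forward, q)
    return combined
```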

sorcio (Contributor, Author) commented Apr 12, 2018

Same microbenchmarks as before, run on updated master vs the updated PR. Now formatted in a fancy table, because hey, fancy tables. Times in seconds, 10 rounds each. Not comparable with the previous benchmark because this ran on a different machine, but also because Trio got generally slower since I last ran the benchmarks.

benchmark1(n): one task putting 10000 messages in a Queue(n), another task consuming them
benchmark2(n): putting n messages in a Queue(n) with put_nowait()

| Benchmark | master | PR 473 | Δ |
|---|---|---|---|
| benchmark1(0) | — | 8.436079 | (capacity 0 not supported on master) |
| benchmark1(1) | 14.156041 | 7.705907 | 83.70% faster |
| benchmark1(10) | 9.160088 | 7.546915 | 21.38% faster |
| benchmark1(100) | 9.094151 | 7.635676 | 19.10% faster |
| benchmark2(100) | 0.021808 | 0.013714 | 59.02% faster |
| benchmark2(10000) | 0.731397 | 0.191692 | 281.55% faster |
| benchmark2(100000) | 7.131474 | 1.928052 | 269.88% faster |
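For reference, a hypothetical reconstruction of what the two microbenchmarks do, per the descriptions above (the real code is in the linked gist; this assumes the trio.Queue API of this era):

```python
import trio

async def benchmark1(n, messages=10000):
    # One task puts `messages` items into a Queue(n) while another consumes them.
    q = trio.Queue(n)

    async def putter():
        for i in range(messages):
            await q.put(i)

    async def getter():
        for _ in range(messages):
            await q.get()

    async with trio.open_nursery() as nursery:
        nursery.start_soon(putter)
        nursery.start_soon(getter)

async def benchmark2(n):
    # Fill a Queue(n) to capacity using only the synchronous put_nowait().
    q = trio.Queue(n)
    for i in range(n):
        q.put_nowait(i)
```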

sorcio (Contributor, Author) commented Apr 19, 2018

@njsmith what do you think of this? We can leave this with a minimum size of 1, so no documentation change is needed, and if it's alright with you it can go in. Otherwise I'm not entirely sure what kind of doc update would be good :)

njsmith (Member) left a comment

Overall looking pretty good! Two small requests below. Also:

Re: docs: I think it'd be good to add a few sentences somewhere explaining that a Queue(0) means that put blocks until get is called and vice-versa, like a CSP channel. It doesn't have to be long or complicated, but for those who haven't been initiated into the mysteries of CSP, I think it's good to at least give the basic idea and give them a pointer to find out more.
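A minimal sketch of that rendezvous behavior, assuming the trio.Queue API as it existed at the time of this PR:

```python
import trio

async def producer(q):
    print("putting...")
    await q.put("hello")  # with capacity 0, blocks until a get() arrives
    print("put completed")

async def consumer(q):
    await trio.sleep(1)   # the producer stays blocked for this whole second
    print("got:", await q.get())

async def main():
    q = trio.Queue(0)     # capacity 0: put and get must rendezvous
    async with trio.open_nursery() as nursery:
        nursery.start_soon(producer, q)
        nursery.start_soon(consumer, q)

trio.run(main)
```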

And one last thing: can you add a towncrier entry? https://trio.readthedocs.io/en/latest/contributing.html#pull-request-release-notes
Both the support for Queue(0) and the speedup are worth mentioning in the release notes.

trio/_sync.py Outdated
self._put_semaphore = Semaphore(capacity, max_value=capacity)
self._get_semaphore = Semaphore(0, max_value=capacity)
# {task: abort func}
self._get_wait = OrderedDict()
njsmith (Member):

I don't understand why this has the abort fn in it. It looks like all the abort fn does is remove the task from self._get_wait, and the only place we use it is in put, which calls it... immediately after removing the task from self._get_wait, so it's a no-op. And then we have a try/except in the get abort fn just to work around this?

Can we make _get_wait a mapping {task: None}, i.e. basically just an ordered set? Or is there a subtlety I'm missing?
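A sketch of the suggested shape, with hypothetical helper names: an OrderedDict used purely as an ordered set, so put() can wake the longest-waiting getter without storing per-task abort callbacks:

```python
from collections import OrderedDict

# {task: None} -- effectively an ordered set; insertion order gives FIFO wakeup.
_get_wait = OrderedDict()

def add_waiter(task):
    _get_wait[task] = None

def pop_oldest_waiter():
    # popitem(last=False) removes and returns the longest-waiting entry.
    task, _ = _get_wait.popitem(last=False)
    return task
```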

njsmith (Member):

On further thought: I bet this is left-over from my multi_get hack, huh. I think it's unnecessary here now though...

@@ -512,6 +510,29 @@
assert (await q.get()) == 2


async def test_Queue_unbuffered():
njsmith (Member) commented Apr 21, 2018

For a bit more checking of unbuffered Queue, I'd suggest adding a QueueLock1(0) to the lock_factories and lock_factory_names lists in this file.

sorcio (Contributor, Author):

I don't think this works. Attempting to acquire the lock (i.e. await queue.put()) will block indefinitely. I definitely like the idea of testing Queue(0) as used for explicit synchronization though, so I may add a test that relies on rendezvous behaviour.
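A hypothetical test along those lines (not from the PR), relying on the rendezvous guarantee that put() on a Queue(0) cannot complete before a matching get():

```python
import trio
import trio.testing

async def test_queue0_rendezvous():
    q = trio.Queue(0)
    record = []

    async def putter():
        await q.put("x")
        record.append("put done")

    async with trio.open_nursery() as nursery:
        nursery.start_soon(putter)
        await trio.testing.wait_all_tasks_blocked()
        # The putter is parked inside put(): nothing has been delivered yet.
        assert record == []
        assert (await q.get()) == "x"
    # The nursery exit waited for the putter to finish its put().
    assert record == ["put done"]
```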

njsmith (Member):

> Attempting to acquire the lock (i.e. await queue.put()) will block indefinitely

Oh duh, of course you're right.

sorcio changed the title from "[WIP] New Queue with multi_get() and max_size>=0" to "New faster Queue with capacity >= 0" on Apr 24, 2018
sorcio (Contributor, Author) commented Apr 24, 2018

  1. I realized I had changed the default recommendation for capacity in the docstring from 1 to 0. I guess my reasoning was that sometimes it's dangerous to put and move forward when no task is waiting to get (it bit me a couple of times). But I'm not sure that's the safest recommendation for new users.

  2. Maybe I should clean up this branch, rebase, and make a single commit when this is ready to be merged, instead of this mess?

  3. The towncrier entry refers to this PR because there is no issue for unbuffered queues; they were just a happy accident of the new implementation. Maybe I can edit the initial message to add more context.

  4. (what's with test_highlevel_open_tcp_listeners.py coverage?)

Fuyukai (Member) commented Apr 24, 2018

Having a zero-size queue can sound weird unless you realize that a queue handles both data and task scheduling - maybe you should edit the docs to explain why 0-capacity queues would be useful? (To me it doesn't really feel like a queue at that point, more of an event notifier.)

sorcio (Contributor, Author) commented Apr 24, 2018

@Fuyukai hmm... I see your point. Where I come from, I see a queue as a FIFO channel: whatever you put, you can get in order from other tasks, and they provide fan-in and fan-out. That queues have attached buffers is a sort of extra feature that makes it easier to handle put bursts when you can't consume them fast enough. It's very handy, but it's not their fundamental purpose, as I could attach a buffer somewhere else. So, in my mind, I would rather want someone to explain to me why >0-capacity queues would be useful and how to be smart about which buffer size to choose. But of course that's neither the only way to use a queue nor the only way to look at them.

Trio docs say "You can use Queue objects to safely pass objects between tasks", so it's pretty neutral w.r.t. buffers. If you think that we should explain why 0-capacity queues would be useful, I guess that means that, from some point of view, a user might assume that queues are just some kind of data structure (like e.g. a deque) and not a synchronization primitive. So maybe we could adjust the docs to head off that assumption, e.g. by saying that queues "can optionally have a buffer that is always bounded in size". Or we could only worry about this if we re-design things for #497 and maybe call the concept "channel" or something else.

njsmith mentioned this pull request on May 5, 2018
njsmith (Member) commented May 5, 2018

It's still a queue in the sense that you can call put a bunch of times and then get a bunch of times, and all the gets return the values that were put, in FIFO order. We probably should implement the ideas in #497 and rename it to Channel, and of course the docs can always be improved, but I think this is at a good enough place to merge!

njsmith merged commit 7118e16 into python-trio:master on May 5, 2018