forked from python/peps
-
Notifications
You must be signed in to change notification settings - Fork 0
/
pep-0255.txt
524 lines (415 loc) · 19.6 KB
/
pep-0255.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
PEP: 255
Title: Simple Generators
Version: $Revision$
Last-Modified: $Date$
Author: [email protected] (Neil Schemenauer),
[email protected] (Tim Peters),
[email protected] (Magnus Lie Hetland)
Discussions-To: [email protected]
Status: Final
Type: Standards Track
Content-Type: text/x-rst
Requires: 234
Created: 18-May-2001
Python-Version: 2.2
Post-History: 14-Jun-2001, 23-Jun-2001
Abstract
========
This PEP introduces the concept of generators to Python, as well as a new
statement used in conjunction with them, the ``yield`` statement.
Motivation
==========
When a producer function has a hard enough job that it requires maintaining
state between values produced, most programming languages offer no pleasant and
efficient solution beyond adding a callback function to the producer's argument
list, to be called with each value produced.
For example, ``tokenize.py`` in the standard library takes this approach: the
caller must pass a *tokeneater* function to ``tokenize()``, called whenever
``tokenize()`` finds the next token. This allows tokenize to be coded in a
natural way, but programs calling tokenize are typically convoluted by the need
to remember between callbacks which token(s) were seen last. The *tokeneater*
function in ``tabnanny.py`` is a good example of that, maintaining a state
machine in global variables, to remember across callbacks what it has already
seen and what it hopes to see next. This was difficult to get working
correctly, and is still difficult for people to understand. Unfortunately,
that's typical of this approach.
An alternative would have been for tokenize to produce an entire parse of the
Python program at once, in a large list. Then tokenize clients could be
written in a natural way, using local variables and local control flow (such as
loops and nested if statements) to keep track of their state. But this isn't
practical: programs can be very large, so no a priori bound can be placed on
the memory needed to materialize the whole parse; and some tokenize clients
only want to see whether something specific appears early in the program (e.g.,
a future statement, or, as is done in IDLE, just the first indented statement),
and then parsing the whole program first is a severe waste of time.
Another alternative would be to make tokenize an iterator [1], delivering the
next token whenever its ``.next()`` method is invoked. This is pleasant for the
caller in the same way a large list of results would be, but without the memory
and "what if I want to get out early?" drawbacks. However, this shifts the
burden on tokenize to remember *its* state between ``.next()`` invocations, and
the reader need only glance at ``tokenize.tokenize_loop()`` to realize what a
horrid chore that would be. Or picture a recursive algorithm for producing the
nodes of a general tree structure: to cast that into an iterator framework
requires removing the recursion manually and maintaining the state of the
traversal by hand.
A fourth option is to run the producer and consumer in separate threads. This
allows both to maintain their states in natural ways, and so is pleasant for
both. Indeed, Demo/threads/Generator.py in the Python source distribution
provides a usable synchronized-communication class for doing that in a general
way. This doesn't work on platforms without threads, though, and is very slow
on platforms that do (compared to what is achievable without threads).
A final option is to use the Stackless [2] [3] variant implementation of Python
instead, which supports lightweight coroutines. This has much the same
programmatic benefits as the thread option, but is much more efficient.
However, Stackless is a controversial rethinking of the Python core, and it may
not be possible for Jython to implement the same semantics. This PEP isn't the
place to debate that, so suffice it to say here that generators provide a
useful subset of Stackless functionality in a way that fits easily into the
current CPython implementation, and is believed to be relatively
straightforward for other Python implementations.
That exhausts the current alternatives. Some other high-level languages
provide pleasant solutions, notably iterators in Sather [4], which were
inspired by iterators in CLU; and generators in Icon [5], a novel language
where every expression *is a generator*. There are differences among these,
but the basic idea is the same: provide a kind of function that can return an
intermediate result ("the next value") to its caller, but maintaining the
function's local state so that the function can be resumed again right where it
left off. A very simple example::
def fib():
a, b = 0, 1
while 1:
yield b
a, b = b, a+b
When ``fib()`` is first invoked, it sets *a* to 0 and *b* to 1, then yields *b*
back to its caller. The caller sees 1. When ``fib`` is resumed, from its
point of view the ``yield`` statement is really the same as, say, a ``print``
statement: ``fib`` continues after the yield with all local state intact. *a*
and *b* then become 1 and 1, and ``fib`` loops back to the ``yield``, yielding
1 to its invoker. And so on. From ``fib``'s point of view it's just
delivering a sequence of results, as if via callback. But from its caller's
point of view, the ``fib`` invocation is an iterable object that can be resumed
at will. As in the thread approach, this allows both sides to be coded in the
most natural ways; but unlike the thread approach, this can be done efficiently
and on all platforms. Indeed, resuming a generator should be no more expensive
than a function call.
The same kind of approach applies to many producer/consumer functions. For
example, ``tokenize.py`` could yield the next token instead of invoking a
callback function with it as argument, and tokenize clients could iterate over
the tokens in a natural way: a Python generator is a kind of Python
iterator [1]_, but of an especially powerful kind.
Specification: Yield
=====================
A new statement is introduced::
yield_stmt: "yield" expression_list
``yield`` is a new keyword, so a ``future`` statement [8]_ is needed to phase
this in: in the initial release, a module desiring to use generators must
include the line::
from __future__ import generators
near the top (see PEP 236 [8]_) for details). Modules using the identifier
``yield`` without a ``future`` statement will trigger warnings. In the
following release, ``yield`` will be a language keyword and the ``future``
statement will no longer be needed.
The ``yield`` statement may only be used inside functions. A function that
contains a ``yield`` statement is called a generator function. A generator
function is an ordinary function object in all respects, but has the new
``CO_GENERATOR`` flag set in the code object's co_flags member.
When a generator function is called, the actual arguments are bound to
function-local formal argument names in the usual way, but no code in the body
of the function is executed. Instead a generator-iterator object is returned;
this conforms to the iterator protocol [6]_, so in particular can be used in
for-loops in a natural way. Note that when the intent is clear from context,
the unqualified name "generator" may be used to refer either to a
generator-function or a generator-iterator.
Each time the ``.next()`` method of a generator-iterator is invoked, the code
in the body of the generator-function is executed until a ``yield`` or
``return`` statement (see below) is encountered, or until the end of the body
is reached.
If a ``yield`` statement is encountered, the state of the function is frozen,
and the value of *expression_list* is returned to ``.next()``'s caller. By
"frozen" we mean that all local state is retained, including the current
bindings of local variables, the instruction pointer, and the internal
evaluation stack: enough information is saved so that the next time
``.next()`` is invoked, the function can proceed exactly as if the ``yield``
statement were just another external call.
Restriction: A ``yield`` statement is not allowed in the ``try`` clause of a
``try/finally`` construct. The difficulty is that there's no guarantee the
generator will ever be resumed, hence no guarantee that the finally block will
ever get executed; that's too much a violation of finally's purpose to bear.
Restriction: A generator cannot be resumed while it is actively running::
>>> def g():
... i = me.next()
... yield i
>>> me = g()
>>> me.next()
Traceback (most recent call last):
...
File "<string>", line 2, in g
ValueError: generator already executing
Specification: Return
======================
A generator function can also contain return statements of the form::
return
Note that an *expression_list* is not allowed on return statements in the body
of a generator (although, of course, they may appear in the bodies of
non-generator functions nested within the generator).
When a return statement is encountered, control proceeds as in any function
return, executing the appropriate ``finally`` clauses (if any exist). Then a
``StopIteration`` exception is raised, signalling that the iterator is
exhausted. A ``StopIteration`` exception is also raised if control flows off
the end of the generator without an explicit return.
Note that return means "I'm done, and have nothing interesting to return", for
both generator functions and non-generator functions.
Note that return isn't always equivalent to raising ``StopIteration``: the
difference lies in how enclosing ``try/except`` constructs are treated. For
example,::
>>> def f1():
... try:
... return
... except:
... yield 1
>>> print list(f1())
[]
because, as in any function, ``return`` simply exits, but::
>>> def f2():
... try:
... raise StopIteration
... except:
... yield 42
>>> print list(f2())
[42]
because ``StopIteration`` is captured by a bare ``except``, as is any
exception.
Specification: Generators and Exception Propagation
====================================================
If an unhandled exception-- including, but not limited to, ``StopIteration``
--is raised by, or passes through, a generator function, then the exception is
passed on to the caller in the usual way, and subsequent attempts to resume the
generator function raise ``StopIteration``. In other words, an unhandled
exception terminates a generator's useful life.
Example (not idiomatic but to illustrate the point)::
>>> def f():
... return 1/0
>>> def g():
... yield f() # the zero division exception propagates
... yield 42 # and we'll never get here
>>> k = g()
>>> k.next()
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "<stdin>", line 2, in g
File "<stdin>", line 2, in f
ZeroDivisionError: integer division or modulo by zero
>>> k.next() # and the generator cannot be resumed
Traceback (most recent call last):
File "<stdin>", line 1, in ?
StopIteration
>>>
Specification: Try/Except/Finally
==================================
As noted earlier, ``yield`` is not allowed in the ``try`` clause of a
``try/finally`` construct. A consequence is that generators should allocate
critical resources with great care. There is no restriction on ``yield``
otherwise appearing in ``finally`` clauses, ``except`` clauses, or in the
``try`` clause of a ``try/except`` construct::
>>> def f():
... try:
... yield 1
... try:
... yield 2
... 1/0
... yield 3 # never get here
... except ZeroDivisionError:
... yield 4
... yield 5
... raise
... except:
... yield 6
... yield 7 # the "raise" above stops this
... except:
... yield 8
... yield 9
... try:
... x = 12
... finally:
... yield 10
... yield 11
>>> print list(f())
[1, 2, 4, 5, 8, 9, 10, 11]
>>>
Example
=======
::
# A binary tree class.
class Tree:
def __init__(self, label, left=None, right=None):
self.label = label
self.left = left
self.right = right
def __repr__(self, level=0, indent=" "):
s = level*indent + `self.label`
if self.left:
s = s + "\n" + self.left.__repr__(level+1, indent)
if self.right:
s = s + "\n" + self.right.__repr__(level+1, indent)
return s
def __iter__(self):
return inorder(self)
# Create a Tree from a list.
def tree(list):
n = len(list)
if n == 0:
return []
i = n / 2
return Tree(list[i], tree(list[:i]), tree(list[i+1:]))
# A recursive generator that generates Tree labels in in-order.
def inorder(t):
if t:
for x in inorder(t.left):
yield x
yield t.label
for x in inorder(t.right):
yield x
# Show it off: create a tree.
t = tree("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
# Print the nodes of the tree in in-order.
for x in t:
print x,
print
# A non-recursive generator.
def inorder(node):
stack = []
while node:
while node.left:
stack.append(node)
node = node.left
yield node.label
while not node.right:
try:
node = stack.pop()
except IndexError:
return
yield node.label
node = node.right
# Exercise the non-recursive generator.
for x in t:
print x,
print
Both output blocks display::
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Q & A
=====
Why not a new keyword instead of reusing ``def``?
-------------------------------------------------
See BDFL Pronouncements section below.
Why a new keyword for ``yield``? Why not a builtin function instead?
---------------------------------------------------------------------
Control flow is much better expressed via keyword in Python, and yield is a
control construct. It's also believed that efficient implementation in Jython
requires that the compiler be able to determine potential suspension points at
compile-time, and a new keyword makes that easy. The CPython reference
implementation also exploits it heavily, to detect which functions *are*
generator-functions (although a new keyword in place of ``def`` would solve
that for CPython -- but people asking the "why a new keyword?" question don't
want any new keyword).
Then why not some other special syntax without a new keyword?
-------------------------------------------------------------
For example, one of these instead of ``yield 3``::
return 3 and continue
return and continue 3
return generating 3
continue return 3
return >> , 3
from generator return 3
return >> 3
return << 3
>> 3
<< 3
* 3
Did I miss one <wink>? Out of hundreds of messages, I counted three
suggesting such an alternative, and extracted the above from them. It would be
nice not to need a new keyword, but nicer to make ``yield`` very clear -- I
don't want to have to *deduce* that a yield is occurring from making sense of a
previously senseless sequence of keywords or operators. Still, if this
attracts enough interest, proponents should settle on a single consensus
suggestion, and Guido will Pronounce on it.
Why allow ``return`` at all? Why not force termination to be spelled ``raise StopIteration``?
----------------------------------------------------------------------------------------------
The mechanics of ``StopIteration`` are low-level details, much like the
mechanics of ``IndexError`` in Python 2.1: the implementation needs to do
*something* well-defined under the covers, and Python exposes these mechanisms
for advanced users. That's not an argument for forcing everyone to work at
that level, though. ``return`` means "I'm done" in any kind of function, and
that's easy to explain and to use. Note that ``return`` isn't always equivalent
to ``raise StopIteration`` in try/except construct, either (see the
"Specification: Return" section).
Then why not allow an expression on ``return`` too?
---------------------------------------------------
Perhaps we will someday. In Icon, ``return expr`` means both "I'm done", and
"but I have one final useful value to return too, and this is it". At the
start, and in the absence of compelling uses for ``return expr``, it's simply
cleaner to use ``yield`` exclusively for delivering values.
BDFL Pronouncements
===================
Issue
-----
Introduce another new keyword (say, ``gen`` or ``generator``) in place
of ``def``, or otherwise alter the syntax, to distinguish generator-functions
from non-generator functions.
Con
---
In practice (how you think about them), generators *are* functions, but
with the twist that they're resumable. The mechanics of how they're set up
is a comparatively minor technical issue, and introducing a new keyword would
unhelpfully overemphasize the mechanics of how generators get started (a vital
but tiny part of a generator's life).
Pro
---
In reality (how you think about them), generator-functions are actually
factory functions that produce generator-iterators as if by magic. In this
respect they're radically different from non-generator functions, acting more
like a constructor than a function, so reusing ``def`` is at best confusing.
A ``yield`` statement buried in the body is not enough warning that the
semantics are so different.
BDFL
----
``def`` it stays. No argument on either side is totally convincing, so I
have consulted my language designer's intuition. It tells me that the syntax
proposed in the PEP is exactly right - not too hot, not too cold. But, like
the Oracle at Delphi in Greek mythology, it doesn't tell me why, so I don't
have a rebuttal for the arguments against the PEP syntax. The best I can come
up with (apart from agreeing with the rebuttals ... already made) is "FUD".
If this had been part of the language from day one, I very much doubt it would
have made Andrew Kuchling's "Python Warts" page.
Reference Implementation
========================
The current implementation, in a preliminary state (no docs, but well tested
and solid), is part of Python's CVS development tree [9]_. Using this requires
that you build Python from source.
This was derived from an earlier patch by Neil Schemenauer [7]_.
Footnotes and References
========================
.. [1] PEP 234, Iterators, Yee, Van Rossum
http://www.python.org/dev/peps/pep-0234/
.. [2] http://www.stackless.com/
.. [3] PEP 219, Stackless Python, McMillan
http://www.python.org/dev/peps/pep-0219/
.. [4] "Iteration Abstraction in Sather"
Murer, Omohundro, Stoutamire and Szyperski
http://www.icsi.berkeley.edu/~sather/Publications/toplas.html
.. [5] http://www.cs.arizona.edu/icon/
.. [6] The concept of iterators is described in PEP 234. See [1] above.
.. [7] http://python.ca/nas/python/generator.diff
.. [8] PEP 236, Back to the __future__, Peters
http://www.python.org/dev/peps/pep-0236/
.. [9] To experiment with this implementation, check out Python from CVS
according to the instructions at http://sf.net/cvs/?group_id=5470
Note that the std test ``Lib/test/test_generators.py`` contains many
examples, including all those in this PEP.
Copyright
=========
This document has been placed in the public domain.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
End: