You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This is a concept I have finally understood after writing the try_fold implementation for rust-lang/rust#116651, where looping through .next() is ~3x slower than iterating using .try_fold().
The difference is (at least in part) because .next() has to check if the buffer needs to be refilled and if an error has occurred for every single byte read, while the .try_fold() approach can do it once per chunk, then just iterate over the chunk without any additional checks.
The same is said to be true about iterating over a BTreeMap or BTreeSet, where .try_fold() iteration is faster because it can skip additional checks while looping over each chunk, although I have not tested that myself.
Internal iteration also allows better compiler optimizations in iterator chains, see rust-lang/rust#101814 (comment). While this particular issue is fixed, it is noted that external iteration is a lot less efficient than internal iteration in general. Many iterator adapters such as .for_each() are implemented in terms of .try_fold() so that they could benefit from internal iteration where available.
The bottom line is:
for x in iter can be less efficient than iter.for_each() or other adapters such as .all(), .any(), .fold() etc. but only in tight loops where per-element overhead is noticeable
Iterator adapters should implement .try_fold() manually by delegating to the .try_fold() of the underlying iterator where possible. (A default implementation of .try_fold() exists, but just calls .next() in a loop, losing the benefits of internal iteration). Implementing .try_fold() can be a way to create faster custom iterators too, like I've done with .bytes(). However, it is not possible to implement your own .try_fold() on stable channel because the Try trait is unstable 😿
The text was updated successfully, but these errors were encountered:
This is a concept I have finally understood after writing the
try_fold
implementation for rust-lang/rust#116651, where looping through.next()
is ~3x slower than iterating using.try_fold()
.The difference is (at least in part) because
.next()
has to check if the buffer needs to be refilled and if an error has occurred for every single byte read, while the.try_fold()
approach can do it once per chunk, then just iterate over the chunk without any additional checks.The same is said to be true about iterating over a
BTreeMap
orBTreeSet
, where.try_fold()
iteration is faster because it can skip additional checks while looping over each chunk, although I have not tested that myself.Internal iteration also allows better compiler optimizations in iterator chains, see rust-lang/rust#101814 (comment). While this particular issue is fixed, it is noted that external iteration is a lot less efficient than internal iteration in general. Many iterator adapters such as
.for_each()
are implemented in terms of.try_fold()
so that they could benefit from internal iteration where available.The bottom line is:
for x in iter
can be less efficient thaniter.for_each()
or other adapters such as.all()
,.any()
,.fold()
etc. but only in tight loops where per-element overhead is noticeable.try_fold()
manually by delegating to the.try_fold()
of the underlying iterator where possible. (A default implementation of.try_fold()
exists, but just calls.next()
in a loop, losing the benefits of internal iteration). Implementing.try_fold()
can be a way to create faster custom iterators too, like I've done with.bytes()
. However, it is not possible to implement your own.try_fold()
on stable channel because theTry
trait is unstable 😿The text was updated successfully, but these errors were encountered: