This package is a staging ground for a major rewrite/overhaul of Transducers.jl. The focus currently is on only moving code here that is actually understood (Transducers.jl has a lot of complex code that none of the maintainers actually understand).
Once this package is ready, it will be turned into a PR to Transducers.jl, but for now it is starting from an empty git repo. Most of the code here is tweaked or straight up copied from Transducers.jl.
Please see https://www.youtube.com/watch?v=OFw1Cu220eA for a simple overview of what transducers are, the current state of the Transducers ecosystem, and what we'd like to see happen in the near future with Transducers.jl
julia> fold(+, Filter(iseven) ⨟ Map(sin), 1:1000; executor=ThreadEx(n=8))
0.5539363521120523
julia> 1:1000 |> Filter(iseven) |> Map(sin) |> fold(+; executor=ThreadEx(n=8))
0.5539363521120523
foldxl
,foldxt
,foldxd
, etc. have been replaced withfold
. Choosing threads, SIMD, Distributed, or other backends (no other ones yet supported) is done with anexecutor
argument, e.g.fold(+, Map(sin), v; executor=ThreadsEx(;n=8))
.- See docstrings for
SequentialEx
,SIMDEx
,ChunkedEx
,ThreadedEx
andDistributedEx
. - Executors support nesting. For example,
ThreadsEx
andDistributedEx
holds an inner executor. The idea here is that you might want to say "first split up the reduction across distributed processes, then split those sub-reductions up onto different threads on those processes, and then do SIMD reductions for the sub-sub-reductions"
- See docstrings for
- Multithreading is more performant and type inferrable
- however, early termination is less mature than upstream
- The implementation of
__foldl__
(now__fold__
) is significantly simpler, and often more performant. We have afoldstyle
trait for opting into certain classes of fold behaviour.- currently only
RecursiveFold
forTuple
/NamedTuple
andIterateFold
for everything else. Traits might not be necessary here, I originally had a third trait for things which should use linear indexing but that's no longer needed, so perhaps this can just be a regular dispatch.
- currently only
- Don't yet support completion of stateful transducers
- Don't yet have a
collect
/tcollect
equivalent - Currently only supporting a very small subset of
Transducer
s from the original library (currently we haveMap
,Filter
,Cat
, andTerminateIf
). - Iterating an
Eduction
is currently not supported.
Please open new issues if you have design questions or ideas of your own.
A Transducer
is a protocol for transforming a reducing function. fold(+, Filter(iseven) ⨟ Map(sin), v)
essentially says "add up all the elements of v
, but before adding them, we discard the non-even numbers and we
apply the sin
function to each element.
Rather than using the (iteration protocol)[https://docs.julialang.org/en/v1/manual/interfaces/#man-interface-iteration]
to transform v
into an iterator of only even numbers where sin
has been applied, Transducers work by transforming +
into a new reducing operator, rf = (Filter(iseven) ⨟ Map(sin))'(+)
which is equivalent to
rf = (x, y) -> iseven(y) ? x + sin(y) : x
. Thus, writing fold(+, Filter(iseven) ⨟ Map(sin), v)
generates code
equivalent to
acc = init
for x in v
if iseven(x)
acc = acc + sin(x)
else
acc = acc
end
end
which is more efficient than an equivalent Iterator
based approach.
The fundamental idea behind this design is to disentangle 'what you want to do' (the transducer) from 'how you do it' (the executor) and 'what type of container your data came from' (the iterator).
The actual stacktrace for this using TransducersNext.jl would look something like
fold(+, Filter(iseven) ⨟ Map(sin), v)
and then call
# inside fold(+, Filter(iseven) ⨟ Map(sin), v)
rf = (Filter(iseven) ⨟ Map(sin))'(+)
init = DefaultInit() # can be set as a kwarg
exec = SequentialEx() # can be set as a kwarg
state = start(rf, init) # this initializes any setup that might need to be done for `rf` before the loop
result = __fold__(rf, state, v, exec) # the main workhorse
if result isa DefaultInit
error(EmptyResultError(rf)) # tell the user that they reduced over an empty collection
end
result
the call to __fold__
will become
# inside fold(+, Filter(iseven) ⨟ Map(sin), v)
## inside __fold__(rf, state, v, SequentialEx())
@unroll 8 for x in v
state = @next(rf, state, x)
end
state
where @unroll 8
says "manually peel out the first 8 iterations" (this is here to help with type stability), and @next
is a shortcut for writing
# inside fold(+, Filter(iseven) ⨟ Map(sin), v)
## inside __fold__(rf, state, v, SequentialEx())
### inside @next(rf, state, x)
val = next(rf, state, x) # next usually just does `rf(state, x)`
if val isa Finished # this is for early termination
return val # break out of the `for` loop
else
val
end
and finally, next(rf, state, x)
will do
# inside fold(+, Filter(iseven) ⨟ Map(sin), v)
## inside __fold__(rf, state, v, SequentialEx())
### inside @next(rf, state, x)
#### inside next(rf, state, x)
##### inside (Filter(iseven) ⨟ Map(sin))'(+)(state, x)
iseven(x) ? state + sin(x) : state