Confusion regarding .~ #722

Open
torfjelde opened this issue Nov 27, 2024 · 4 comments
Comments

@torfjelde
Member

Ref: TuringLang/Turing.jl#2390 (comment)

MWE:

julia> @model demo(x) = x .~ Normal()
demo (generic function with 2 methods)

julia> demo(missing)()
ERROR: MethodError: no method matching dot_assume(::Random.TaskLocalRNG, ::SampleFromPrior, ::Normal{…}, ::VarName{…}, ::Missing, ::UntypedVarInfo{…})
[...]

julia> demo([missing])()
ERROR: MethodError: no method matching loglikelihood(::Normal{Float64}, ::Vector{Union{Missing, Float64}})
[...]

Even if you use condition, you still run into a similar issue when conditioning on individual components, e.g.

julia> @model function demo()
           x = Vector{Float64}(undef, 2)
           x .~ Normal()
       end
demo (generic function with 4 methods)

julia> (demo() | (@varname(x[1]) => 1.0,))()  # not conditioned
2-element Vector{Float64}:
 1.6841936984140546
 0.3645020534551503
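
For comparison, here is a sketch of the same model written with an explicit loop, where each component is a separate variable and per-component conditioning should go through the ordinary ~ pipeline (untested; output omitted):

julia> @model function demo_loop()
           x = Vector{Float64}(undef, 2)
           for i in eachindex(x)
               x[i] ~ Normal()
           end
           return x
       end

julia> (demo_loop() | (@varname(x[1]) => 1.0,))()  # x[1] should now be conditioned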

The reason we do this is performance: we want .~ to be faster than iterating over the broadcast expression and calling tilde_assume!! or tilde_observe!! element by element. However, this is very confusing for users, and understandably so.

Sooo the question is how to fix this. A few immediate thoughts:

  1. Do we need .~? Does it just make things confusing?
  2. Should we just convert it into a loop over the broadcasted expression and drop the perf benefits?
@mhauru
Member

mhauru commented Nov 27, 2024

How significant are the perf benefits? Could we benchmark this?

Either of the proposed two solutions would also simplify the tilde pipeline code a lot, improving maintainability. It's a source of a lot of corner cases and code complexity.

@mhauru
Member

mhauru commented Nov 27, 2024

We could ask on Slack/Discourse whether our users are making heavy use of .~, to gauge how much effort we should put into keeping it around and performant.

@yebai
Member

yebai commented Nov 29, 2024

  1. Do we need .~? Does it just make things confusing?
  2. Should we just convert it into a loop over the broadcasted expression and drop the perf benefits?

I lean towards option 2 or something similar if it simplifies the implementation. The tilde pipeline is becoming a bit too (deep and) complex to reason about.

@mhauru
Member

mhauru commented Jan 28, 2025

I tried to make a simple benchmark to see the effect of .~. The results came out quite underwhelming:

julia> module MWE

       using Turing
       import ReverseDiff

       adbackend = AutoReverseDiff()
       alg = NUTS(; adtype=adbackend)
       n_samples = 20
       y = randn(1000)

       @model function test_tilde(y, ::Type{T}=Float64) where {T}
           num_values = length(y)
           x1 = Vector{T}(undef, num_values)
           x1 .~ Normal()
           x2 = Vector{T}(undef, num_values)
           x2 .~ Gamma()
           m = sum(x1 .* x2)
           y .~ Normal(m)
       end

       m_tilde = test_tilde(y)
       @time sample(m_tilde, alg, n_samples)

       @model function test_loop(y, ::Type{T}=Float64) where {T}
           num_values = length(y)
           x1 = Vector{T}(undef, num_values)
           for i in eachindex(x1)
               x1[i] ~ Normal()
           end
           x2 = Vector{T}(undef, num_values)
           for i in eachindex(x2)
               x2[i] ~ Gamma()
           end
           m = sum(x1 .* x2)
           for i in eachindex(y)
               y[i] ~ Normal(m)
           end
       end

       m_loop = test_loop(y)
       @time sample(m_loop, alg, n_samples)

       end
┌ Info: Found initial step size
└   ϵ = 0.000390625
Sampling 100%|█████████████████████████████████████████████████████████████████████████████████| Time: 0:03:03
186.599898 seconds (3.79 G allocations: 166.483 GiB, 9.57% gc time, 1.02% compilation time)
┌ Info: Found initial step size
└   ϵ = 0.000390625
Sampling 100%|█████████████████████████████████████████████████████████████████████████████████| Time: 0:03:33
216.282827 seconds (4.17 G allocations: 181.365 GiB, 8.79% gc time, 0.91% compilation time)
Main.MWE

Is there some case where it makes a more dramatic difference? If so, can someone write an example?

Unless it makes a very substantial difference in some common use case, I'm in favour of keeping the syntax for backwards compatibility, but turning every .~ into a loop over eachindex(lhs) at macro time and internally dealing with only ~. I haven't thought through how that would work for more complex broadcasting than the LHS being a vector and RHS being a scalar.
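
As a rough sketch, assuming only the simple case of a vector LHS and scalar-distribution RHS, the macro-time rewrite could look like (hypothetical expansion, not the current @model output):

# Hypothetical expansion of `x .~ Normal()` inside @model:
for i in eachindex(x)
    x[i] ~ Normal()   # each element goes through the usual single-variable tilde pipeline
end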
