The logpdf from ExponentialFamily.jl is slower than from Distributions.jl #146
This is expected behaviour. Different parametrizations have different speed characteristics when evaluating logpdf. The idea is to use logpdf_optimized:

```julia
julia> optimised = logpdf_optimized(golden_normal); # should return MvNormalMeanPrecision

julia> @benchmark logpdf($golden_normal, $samples)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  229.023 μs … 838.760 μs  ┊ GC (min … max): 0.00% … 65.07%
 Time  (median):     232.834 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   242.893 μs ±  42.271 μs  ┊ GC (mean ± σ):  1.04% ± 4.62%
 Memory estimate: 211.44 KiB, allocs estimate: 3003.

julia> @benchmark logpdf($optimised, $samples)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  147.888 μs … 1.122 ms  ┊ GC (min … max): 0.00% … 83.80%
 Time  (median):     152.977 μs             ┊ GC (median):    0.00%
 Time  (mean ± σ):   158.866 μs ± 43.102 μs  ┊ GC (mean ± σ):  1.91% ± 5.80%
 Memory estimate: 289.19 KiB, allocs estimate: 2001.
```

That being said, there is still a lot of room for improvement. We can keep this issue open for future reference.
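For context on why the parametrization matters, here is a minimal pure-Julia sketch (not the ExponentialFamily.jl API) of the exponential-family decomposition logpdf(x) = η ⋅ T(x) − A(η) + log h(x) for a univariate Normal, checked against the closed-form density:

```julia
# Minimal sketch (pure Julia, univariate Normal): the exponential-family
# decomposition logpdf(x) = η ⋅ T(x) - A(η) + log h(x).
μ, s2 = 1.5, 2.0                                 # mean and variance
η = (μ / s2, -1 / (2 * s2))                      # natural parameters
A = -η[1]^2 / (4 * η[2]) - log(-2 * η[2]) / 2    # log-partition A(η)
logh(x) = -log(2π) / 2                           # log base measure
ef_logpdf(x) = η[1] * x + η[2] * x^2 - A + logh(x)

# Closed-form Normal log-density for comparison:
ref_logpdf(x) = -(x - μ)^2 / (2 * s2) - log(2π * s2) / 2
```

A(η) depends only on η, not on x, which is why evaluating it per sample is pure overhead.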
Yes, you are absolutely correct; different parametrizations exhibit different speed characteristics when evaluating logpdf. However, I don't understand why we can't adopt a seemingly simple approach like the following: if we want to evaluate a container of samples for the same distribution, we can reuse the log-partition (which appears to be the primary cost). For instance, something like this:

```julia
using ExponentialFamily
using StableRNGs
using LinearAlgebra
using Distributions
using BenchmarkTools

rng = StableRNG(42)
n = 4
golden_μ = randn(rng, n)
L = randn(rng, n, n)
golden_Σ = L * L' + Matrix{Float64}(I, n, n)
golden_normal = Distributions.MvNormal(golden_μ, golden_Σ);
ef_mv_normal = ExponentialFamily.MvNormalMeanCovariance(golden_μ, golden_Σ)
ef = convert(ExponentialFamily.ExponentialFamilyDistribution, ef_mv_normal)
samples = rand(golden_normal, 1000);

@benchmark logpdf($golden_normal, $samples)
@benchmark logpdf($ef_mv_normal, $samples)

hybridlinearize(matrix::AbstractMatrix) = eachcol(matrix)
@benchmark map((x) -> logpdf($(ef), x), $(hybridlinearize(samples)))

function _logpdf(ef, x::AbstractMatrix)
    stats_part = map(
        (s) -> begin
            stats = ExponentialFamily.sufficientstatistics(ef, s)
            sum(ExponentialFamily.pack_parameters(stats) .* ExponentialFamily.getnaturalparameters(ef))
        end,
        hybridlinearize(x)
    )
    basemeasure_part = map(
        (x) -> begin
            log(ExponentialFamily.basemeasure(ef, x))
        end,
        hybridlinearize(x)
    )
    # evaluated once for the whole batch of samples
    logpartition_part = logpartition(ef)
    return stats_part .+ basemeasure_part .- logpartition_part
end

@benchmark _logpdf($(ef), $(samples))
```

For example, for normals it's efficient enough:

```julia
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  126.709 μs … 1.958 ms  ┊ GC (min … max): 0.00% … 91.33%
 Time  (median):     130.125 μs             ┊ GC (median):    0.00%
 Time  (mean ± σ):   155.554 μs ± 187.075 μs ┊ GC (mean ± σ): 14.23% ± 10.78%
 Memory estimate: 649.19 KiB, allocs estimate: 3005.
```
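The hoisting idea can also be sketched in isolation (pure Julia, univariate Normal; `NaturalNormal` and the helper names are illustrative, not the ExponentialFamily.jl API): since A(η) depends only on the distribution, it can be evaluated once per batch instead of once per sample.

```julia
# Illustrative sketch, not the ExponentialFamily.jl API: cache the
# log-partition across a batch of samples (univariate Normal).
struct NaturalNormal
    η1::Float64
    η2::Float64   # must be negative
end
logpartition(d::NaturalNormal) = -d.η1^2 / (4 * d.η2) - log(-2 * d.η2) / 2
logbasemeasure(x) = -log(2π) / 2

# Naive: recomputes A(η) inside the loop, once per sample.
logpdf_naive(d::NaturalNormal, xs) =
    [d.η1 * x + d.η2 * x^2 - logpartition(d) + logbasemeasure(x) for x in xs]

# Batched: A(η) hoisted out of the loop, evaluated once.
function logpdf_batched(d::NaturalNormal, xs)
    A = logpartition(d)
    return [d.η1 * x + d.η2 * x^2 - A + logbasemeasure(x) for x in xs]
end
```

Both return identical values; the batched version just avoids the repeated log-partition work.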
I think you are right, we definitely can (and should) do it. Would you mind opening a PR for that?
Yes, I can 👍
Can this be closed?
Yes, I think so; we have a separate issue for MvNormalWishart (#148). For all other distributions the approach works.
I have the following example for evaluating the multivariate normal log probability density function (logpdf):
The results:
Clearly, the slowdown for ExponentialFamily.ExponentialFamilyDistribution comes from re-evaluating the log-partition for each sample. Shouldn't this be avoided? Even plain ExponentialFamily.MvNormalMeanCovariance is considerably slower, though.
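Part of the remaining MvNormalMeanCovariance gap presumably comes from per-sample work that depends only on the distribution (factorizing Σ and its log-determinant). A hypothetical pure-Julia sketch of hoisting that work out of the per-sample loop, using only LinearAlgebra:

```julia
using LinearAlgebra

# Hypothetical sketch: batch MvNormal logpdf that factorizes Σ once per
# batch instead of paying factorization + logdet for every sample.
# Samples are the columns of X.
function mvlogpdf_batch(μ::AbstractVector, Σ::AbstractMatrix, X::AbstractMatrix)
    C = cholesky(Symmetric(Σ))   # done once for the whole batch
    logdetΣ = logdet(C)
    k = length(μ)
    return [begin
                z = C.L \ (view(X, :, j) .- μ)   # whitened residual
                -(k * log(2π) + logdetΣ + dot(z, z)) / 2
            end for j in axes(X, 2)]
end
```

For the identity covariance this reduces to the familiar −(k·log 2π + ‖x − μ‖²)/2, which makes it easy to sanity-check.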