Use hashes instead of gensym id in PolyVar #41

Open · wants to merge 2 commits into master

Conversation

saschatimme (Contributor)

This addresses #20, prompted by the recent observation that in distributed code (using Distributed) gensym no longer guarantees unique variable ids.
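
To make the failure mode concrete, here is a minimal sketch (not part of the PR; the worker count and printed symbols are illustrative only) of how the per-process gensym counter can hand out colliding ids across workers:

using Distributed
addprocs(2)

# gensym keeps a per-process counter, so independent worker processes can
# return the same "unique" symbol for unrelated variables.
syms = [remotecall_fetch(() -> gensym("x"), w) for w in workers()]
println(syms)                 # e.g. Symbol("##x#265") on both workers
println(syms[1] == syms[2])   # can be true: the ids collide across processes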

I used the following script to benchmark:

using DynamicPolynomials
using BenchmarkTools

@polyvar x[1:3] y[1:10]

# Raise a small linear form to a high power.
l_x = sum(x)
n = 10
println("sum(x[1:3])^10:")
display(@benchmark $l_x^$n)

# Raise a larger linear form to a lower power (few samples, since it is slow).
l_y = sum(y)
n = 5
println("\nsum(y[1:10])^5:")
display(@benchmark $l_y^$n samples=5 evals=1)

# Evaluate a fixed polynomial at a random point via substitution.
f = l_y^4
p = randn(10)
val = y => p
println("\n(sum(y[1:10])^4)(p):")
display(@benchmark $f($val))

The results for one run of the current master are:

sum(x[1:3])^10:
BenchmarkTools.Trial: 
  memory estimate:  81.72 KiB
  allocs estimate:  608
  --------------
  minimum time:     180.079 μs (0.00% GC)
  median time:      182.609 μs (0.00% GC)
  mean time:        196.687 μs (4.36% GC)
  maximum time:     42.322 ms (99.39% GC)
  --------------
  samples:          10000
  evals/sample:     1
sum(y[1:10])^5:
BenchmarkTools.Trial: 
  memory estimate:  1.90 MiB
  allocs estimate:  10356
  --------------
  minimum time:     8.076 ms (0.00% GC)
  median time:      8.187 ms (0.00% GC)
  mean time:        8.190 ms (0.00% GC)
  maximum time:     8.342 ms (0.00% GC)
  --------------
  samples:          5
  evals/sample:     1
(sum(y[1:10])^4)(p):
BenchmarkTools.Trial: 
  memory estimate:  208 bytes
  allocs estimate:  3
  --------------
  minimum time:     64.481 μs (0.00% GC)
  median time:      65.386 μs (0.00% GC)
  mean time:        66.445 μs (0.00% GC)
  maximum time:     162.281 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     

and for this PR:

sum(x[1:3])^10:
BenchmarkTools.Trial: 
  memory estimate:  81.72 KiB
  allocs estimate:  608
  --------------
  minimum time:     179.539 μs (0.00% GC)
  median time:      181.885 μs (0.00% GC)
  mean time:        194.135 μs (4.32% GC)
  maximum time:     41.636 ms (99.41% GC)
  --------------
  samples:          10000
  evals/sample:     1
sum(y[1:10])^5:
BenchmarkTools.Trial: 
  memory estimate:  1.90 MiB
  allocs estimate:  10356
  --------------
  minimum time:     8.035 ms (0.00% GC)
  median time:      8.104 ms (0.00% GC)
  mean time:        8.192 ms (0.00% GC)
  maximum time:     8.386 ms (0.00% GC)
  --------------
  samples:          5
  evals/sample:     1
(sum(y[1:10])^4)(p):
BenchmarkTools.Trial: 
  memory estimate:  368 bytes
  allocs estimate:  13
  --------------
  minimum time:     62.466 μs (0.00% GC)
  median time:      63.662 μs (0.00% GC)
  mean time:        66.810 μs (0.00% GC)
  maximum time:     268.350 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

But I have to note that these benchmarks are quite noisy; the same benchmark a couple of minutes earlier (again on the PR branch) gave:

sum(x[1:3])^10:
BenchmarkTools.Trial: 
  memory estimate:  81.72 KiB
  allocs estimate:  608
  --------------
  minimum time:     180.279 μs (0.00% GC)
  median time:      183.395 μs (0.00% GC)
  mean time:        214.434 μs (4.30% GC)
  maximum time:     47.185 ms (99.26% GC)
  --------------
  samples:          10000
  evals/sample:     1
sum(y[1:10])^5:
BenchmarkTools.Trial: 
  memory estimate:  1.90 MiB
  allocs estimate:  10356
  --------------
  minimum time:     8.716 ms (0.00% GC)
  median time:      9.088 ms (0.00% GC)
  mean time:        9.029 ms (0.00% GC)
  maximum time:     9.309 ms (0.00% GC)
  --------------
  samples:          5
  evals/sample:     1
(sum(y[1:10])^4)(p):
BenchmarkTools.Trial: 
  memory estimate:  368 bytes
  allocs estimate:  13
  --------------
  minimum time:     69.175 μs (0.00% GC)
  median time:      71.150 μs (0.00% GC)
  mean time:        85.484 μs (0.00% GC)
  maximum time:     1.067 ms (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

My conclusion is that this does not introduce any significant performance impact, and that we spend most of our time in other parts of the code anyway.

The change would have the benefit of being more consistent with how variables are handled in the rest of the MultivariatePolynomials ecosystem. I also have to admit that the accidental redeclaration of variables has caused me quite some head scratching on a couple of occasions.
Finally, this addresses the problems with parallel code.

@saschatimme (Contributor, Author)

Uh, I completely ignored the ordering problem. Right now people implicitly create a variable order. Just replacing this with alphabetical order is probably not nice; it would also be very breaking and probably not desirable.
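
For reference, a small hedged sketch of the implicit order I mean (the exact ordering behaviour depends on the DynamicPolynomials version, so treat the comments as illustrative):

using DynamicPolynomials

@polyvar z    # declared first
@polyvar a    # declared second

# The relative order of `a` and `z` is currently determined by the ids handed
# out at declaration time, not by the variable names, so sorted variables and
# printed monomials depend on declaration order.
@show sort([a, z])
@show a * z

# Deriving ids from name hashes would instead tie the order to the names
# (e.g. alphabetically) rather than to when they were declared, which is the
# breaking part.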

@chriscoey

Any recent thoughts on this?

@saschatimme (Contributor, Author)

I am starting to dislike the use of the gensym() id more and more. It causes problems with multiple workers and also makes it practically infeasible to store variables on disk (since after loading them you easily get collisions between ids).
But I also think the only ways out of this are very breaking... So it is not clear to me what (or even whether) something should be done.
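
As a concrete, hedged illustration of the on-disk problem (the file name and the session split are purely illustrative):

using DynamicPolynomials, Serialization

@polyvar x
serialize("x.jls", x)          # store the variable for a later session

# --- later, in a fresh Julia session ---
# using DynamicPolynomials, Serialization
# @polyvar y                   # the new session restarts the gensym counter,
#                              # so `y` can receive the same internal id as
#                              # the stored `x`
# x_old = deserialize("x.jls")
# x_old == y                   # may wrongly evaluate to true (id collision)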

@chriscoey

If the current way of doing things is not feasible long-term, it is better to rip off the band-aid sooner rather than later.

@blegat reopened this on May 9, 2024
@blegat (Member) commented on May 9, 2024

We could make this work now with the variable order field.
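
A minimal sketch of how that could fit together, with made-up names (this is not the actual DynamicPolynomials API): identity becomes a hash of the name, which is stable across workers and sessions, while an explicit order field preserves a user-controlled variable order:

# Hypothetical sketch only; type and field names are invented for illustration.
struct HashedVar
    name::String
    id::UInt     # hash(name): reproducible across processes and sessions
    order::Int   # explicit position, replacing the implicit gensym order
end

HashedVar(name::AbstractString, order::Integer) =
    HashedVar(String(name), hash(name), order)

# Identity no longer depends on which process or session created the variable,
Base.:(==)(a::HashedVar, b::HashedVar) = a.id == b.id
# while comparisons follow the explicit order field instead of the hash.
Base.isless(a::HashedVar, b::HashedVar) = isless(a.order, b.order)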
