Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bad performance on recursive tuple/list deconstruction #394

Open
vsbogd opened this issue Aug 11, 2023 · 5 comments
Open

Bad performance on recursive tuple/list deconstruction #394

vsbogd opened this issue Aug 11, 2023 · 5 comments
Assignees
Labels
bug Something isn't working enhancement New feature or request optimization Opportunity to optimize performance

Comments

@vsbogd
Copy link
Collaborator

vsbogd commented Aug 11, 2023

This issue is to investigate the perfomance in case when list or tuple is recursively deconstructed.

According to @patham9 measurements:

(= (TupleCount $tuple) (if (== $tuple ()) 0 (+ 1 (TupleCount (cdr-atom $tuple)))))
!(TupleCount (1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30))

takes 2.8 seconds

(= (ListCount Nil) 0)
(= (ListCount (Cons $head $tail)) (+ 1 (ListCount $tail)))
!(ListCount (Cons 1 (Cons 2 (Cons 3 (Cons 4 (Cons 5 (Cons 6 (Cons 7 (Cons 8 (Cons 9 (Cons 10 (Cons 11 (Cons 12 (Cons 13 (Cons 14 (Cons 15 (Cons 16 (Cons 17 (Cons 18 (Cons 19 (Cons 20 (Cons 21 (Cons 22 (Cons 23 (Cons 24 (Cons 25 (Cons 26 (Cons 27 (Cons 28 (Cons 29 (Cons 30 Nil)))))))))))))))))))))))))))))))

takes 3 seconds

(= (BuildList $w $c) (if (> $c 0) (BuildList (Cons 1 $w) (- $c 1)) $w))
!(BuildList Nil 30)

and this 4.8 seconds for me.
Same with tuples 3.8 seconds:

(= (BuildTuple $w $c) (if (> $c 0) (BuildTuple (1 $w) (- $c 1)) $w))
!(BuildTuple Nil 30)
@vsbogd vsbogd added bug Something isn't working enhancement New feature or request labels Aug 11, 2023
@patham9
Copy link

patham9 commented Aug 22, 2023

Workaround for one of the probably most critical cases, counting of elements in a collapse: https://gist.github.com/patham9/aa3e93bb369e3fc1660cb473e7e74b55

@vsbogd vsbogd self-assigned this Aug 22, 2023
@luketpeterson
Copy link
Contributor

This isn’t fundamental to the issue at hand, and you may already be doing this, but…

Building hyperon with cmake -DCMAKE_BUILD_TYPE=Release ../ as opposed to just cmake ../ provides roughly a 50% boost to this test on my setup.

@vsbogd
Copy link
Collaborator Author

vsbogd commented Aug 24, 2023

I have looked at the speed of this example in minimal MeTTa interpreter and there the recursive function call is transformed to the deeply nested atom which contains the interpretation plan. The number of nested levels is a linear function from recursion depth. When interpreter evaluates this atom it unwinds the nested atoms on each step to extract the leaf atom and evaluate it. Thus the overall number of atoms to be deconstructed/constructed during evaluation is a sum of arithmetic progression and it grows quadratically when number of nested levels grows linearly. I believe something similar happens in Rust interpreter with nested plans but I am not sure.

@vsbogd
Copy link
Collaborator Author

vsbogd commented Aug 24, 2023

I don't think we need to fix it in Rust interpreter but we should fix it in minimal MeTTa interpreter. One way to do it is to eliminate construction/deconstruction of atoms and instead replace the leaf atom during evaluation. Unfortunately it is doesn't work for the case when leaf atom is evaluated into few alternatives. In such case we need to copy the overall plan which is costly.

Also we could change the representation of the plan to simplify getting currently evaluated atom and reuse the further plan instead of copying it when atom is evaluated into list of alternatives.

@vsbogd vsbogd added the optimization Opportunity to optimize performance label Aug 16, 2024
@vsbogd
Copy link
Collaborator Author

vsbogd commented Oct 11, 2024

Performance measurements on current main (1c4bc52) on AMD Ryzen 7 5800U. Both Rust and Python versions was build in release mode. 30 is a original test from issue description, 120 the same test with 4 times more data. One exception is a ListCount test which runs even slower than in description thus I used lists with 8 and 16 elements to measure performance. The reason is that ListCount performance significantly degraded at some point.

Test 30 (Rust) 120 (Rust) 30 (Python) 120 (Python)
TupleCount 0.067 0.520 0.311 4.729
ListCount 0.025 (8 items) 2.942 (16 items) 0.072 (8 items) 4.297 (16 items)
BuildList 0.120 2.695 0.474 14.128
BuildTuple 0.082 1.415 0.336 9.126

I additionally tested Rust version on 69a698f which is state of the code before #745 is merged. Here ListCount was tested on 30 and 120 items in list which corresponds to x1 and x4 data of the test in the description. It can be seen that it doesn't demonstrate any performance degradation comparing with measurements in issue description. Thus we can conclude that new type checking logic introduced in #745 significantly affected performance of ListCount function.

Test 30 (Rust) 120 (Rust)
TupleCount 0.056 0.446
ListCount 0.078 2.098
BuildList 0.104 2.360
BuildTuple 0.076 1.374

Overall these measures show that performance is better comparing to the time when the issue was raised except ListCount which suffers from new type-checking logic implementation (#745). At the same time it can be seen that time grows non linearly with the grow of the number of iterations which is still the issue. Also it can be seen that Python is approximately 5 times slower than Rust implementation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working enhancement New feature or request optimization Opportunity to optimize performance
Projects
None yet
Development

No branches or pull requests

3 participants