
Use standard C++ benchmarking library for atomspace microbenchmarks #6

Open
vsbogd opened this issue Apr 26, 2018 · 4 comments

@vsbogd
Contributor

vsbogd commented Apr 26, 2018

To follow up on the @linas comment here: rather than reinventing the wheel, I would propose using an existing C++ benchmarking library.

Requirements for such a library:

  • ability to store run results
  • ability to compare results across runs
  • reporting of information about the system on which the test is run
  • anti-optimization tricks (to keep the compiler from eliding the measured code)

Google Benchmark (https://github.com/google/benchmark) seems to be a good candidate. Unfortunately there are no ready-to-install packages for these benchmarking libraries, so installing one would be an additional manual step in the build procedure.
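For illustration, a minimal atomspace microbenchmark written against Google Benchmark could look roughly like the sketch below. The AtomSpace header path and the add_node(Type, name) signature are my assumptions and may need adjusting to the current API:

```cpp
// A minimal sketch, assuming the usual opencog AtomSpace API
// (header path and add_node signature may differ between versions).
#include <string>

#include <benchmark/benchmark.h>
#include <opencog/atomspace/AtomSpace.h>

using namespace opencog;

static void BM_AddNode(benchmark::State& state)
{
    AtomSpace as;
    size_t i = 0;

    for (auto _ : state)
    {
        // Use a fresh name each iteration so an atom is actually created
        // rather than merely looked up.
        Handle h = as.add_node(CONCEPT_NODE, "node-" + std::to_string(i++));
        // Anti-optimization trick: keep the compiler from discarding the result.
        benchmark::DoNotOptimize(h);
    }
}
BENCHMARK(BM_AddNode);

BENCHMARK_MAIN();
```

As far as I can tell, the console reporter prints basic system information (CPU count, clock rate, cache sizes) before the results, runs can be saved with --benchmark_out=<file> --benchmark_out_format=json, and the tools/ directory in the repository has scripts for comparing two such result files, which would cover the keep/compare/system-info requirements above.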

Other well-known libraries:

Some reviews and comparisons can be found here (full articles are here and here).

@linas
Member

linas commented Apr 27, 2018

Note that the current benchmark measures C++, python and scheme performance. The C++ side is straightforward; the python and scheme sides, not so much. For scheme, there are three distinct bottlenecks:

  1. How fast can you move from C++ into guile, do nothing (a no-op), and return to C++? Last I measured, this was about 15K to 20K per second for guile, and about 20K to 25K per second for cython/python.

  2. Once inside guile, how fast can you do something, e.g. create atoms in a loop, with that loop written in scheme (or python)? Last I measured, this was reasonably fast; no complaints.

  3. How does 2) perform when the scheme code is interpreted, memoized, or compiled? All three have different performance profiles. When the guile interpreter runs, nothing is computed in advance; everything is interpreted "on the fly". When memoization is turned on, guile caches certain intermediate results for faster re-use. When compiling is turned on, the scheme code is compiled into byte-code, and that byte-code is then executed.

Historical experience is that compiling often loses: the amount of time it takes to compile (ConceptNode "foo") into bytecode far exceeds the savings of a few cycles over the interpreted version (because, hey, both the compiled and the interpreted paths immediately call into C++ code, which is where 98% of the CPU time goes).

Thus, reporting C++ performance is not bad, but the proper use and measurement of the C++/guile and C++/python interfaces is ... tricky.
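For concreteness, here is a rough sketch of how bottleneck 1) — the C++ to guile and back round trip — could be timed with Google Benchmark. The SchemeEval header, constructor, and eval() signature are my assumptions about the guile bindings and may need adjusting; the (+ 2 2) expression is just a trivial stand-in for "do nothing":

```cpp
// A minimal sketch, assuming the opencog SchemeEval bindings
// (header path and constructor may differ between versions).
#include <string>

#include <benchmark/benchmark.h>
#include <opencog/atomspace/AtomSpace.h>
#include <opencog/guile/SchemeEval.h>

using namespace opencog;

static void BM_GuileRoundTrip(benchmark::State& state)
{
    AtomSpace as;
    SchemeEval evaluator(&as);

    for (auto _ : state)
    {
        // Enter guile, evaluate a trivial expression, return to C++.
        std::string result = evaluator.eval("(+ 2 2)");
        benchmark::DoNotOptimize(result);
    }
    // Report round trips per second alongside the time per iteration.
    state.SetItemsProcessed(state.iterations());
}
BENCHMARK(BM_GuileRoundTrip)->Unit(benchmark::kMicrosecond);

BENCHMARK_MAIN();
```

An analogous benchmark wrapping the cython entry point could produce the corresponding C++/python round-trip number.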

@linas
Member

linas commented Apr 27, 2018

Note also: it is not entirely obvious that ripping out the existing benchmark code and replacing it with something else results in a win. I mean, starting and stopping a timer and printing the result is just ... not that hard.

The biggest problem is that the existing benchmark code is just ... messy. There's a bunch of crap done to set up the atomspace and populate it with atoms. What, exactly, is a "reasonable" or "realistic" set of atoms to stick in there? How does performance vary as a function of atomspace size?

Other parts of the messiness have to do with the difficulty of measuring the C++/guile interfaces. It's not at all clear to me that just using a different microbenchmarking tool will solve any of these problems...

That said, I don't really care if or how the benchmarks are redesigned, as long as they work and are accurate (and we get a chance to do before-and-after measurements, to verify that any new results are in agreement with the old results).
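One piece of this — how performance varies with atomspace size — is something Google Benchmark can express directly by parameterizing a benchmark over a range of sizes. A rough sketch, with the AtomSpace calls again being my assumptions about the API:

```cpp
// A sketch of parameterizing a microbenchmark by atomspace size.
// The AtomSpace header path and add_node signature are assumptions.
#include <string>

#include <benchmark/benchmark.h>
#include <opencog/atomspace/AtomSpace.h>

using namespace opencog;

static void BM_AddNodeAtSize(benchmark::State& state)
{
    AtomSpace as;

    // Pre-populate the atomspace with state.range(0) nodes; this setup
    // runs outside the timing loop and is not measured.
    for (int64_t n = 0; n < state.range(0); n++)
        as.add_node(CONCEPT_NODE, "filler-" + std::to_string(n));

    size_t i = 0;
    for (auto _ : state)
    {
        Handle h = as.add_node(CONCEPT_NODE, "fresh-" + std::to_string(i++));
        benchmark::DoNotOptimize(h);
    }
}
// Repeat the measurement at atomspace sizes from 1K up to 1M atoms.
BENCHMARK(BM_AddNodeAtSize)->Range(1 << 10, 1 << 20);

BENCHMARK_MAIN();
```

This does not answer what a "realistic" set of atoms is, but it at least makes atomspace size an explicit axis in the report rather than something baked into the setup code.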

@linas
Member

linas commented Apr 27, 2018

Note also: the current benchmark fails to control for dynamic range. For example, we can call getArity() more than a million times a second, while a pattern search runs about 100 times a second. The current benchmark wants to time how long it takes to do each of these N times. Clearly, a single value of N cannot be used to measure both. This is one of the messy issues.
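This is, incidentally, one point where a library helps: Google Benchmark chooses the iteration count per benchmark, running each one until the measured time is stable, so a nanosecond-scale call and a pattern search that takes milliseconds each get an appropriate N automatically. A sketch using hypothetical stand-ins (fast_op, slow_pattern_search) in place of getArity() and the pattern matcher:

```cpp
// A sketch showing adaptive iteration counts. fast_op() and
// slow_pattern_search() are hypothetical stand-ins for getArity()
// and the pattern matcher.
#include <chrono>
#include <thread>

#include <benchmark/benchmark.h>

static int fast_op() { return 42; }            // nanosecond-scale work

static void slow_pattern_search()              // millisecond-scale work
{
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
}

static void BM_FastOp(benchmark::State& state)
{
    // The library will run this loop for millions of iterations.
    for (auto _ : state)
        benchmark::DoNotOptimize(fast_op());
}
BENCHMARK(BM_FastOp);

static void BM_PatternSearch(benchmark::State& state)
{
    // The library will run this loop for only a handful of iterations.
    for (auto _ : state)
        slow_pattern_search();
}
BENCHMARK(BM_PatternSearch)->Unit(benchmark::kMillisecond);

BENCHMARK_MAIN();
```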

@vsbogd
Contributor Author

vsbogd commented Apr 27, 2018

I tried to check the python benchmarking, but it seems to be broken; I raised a separate issue for it: #9
