-
Notifications
You must be signed in to change notification settings - Fork 104
Performance Considerations
Chun Kit LAM edited this page Aug 22, 2023
·
2 revisions
Manifold records the underlying CSG operations and evaluates them lazily. This has several consequences:
- CSG operations are instantaneous, because they don't really perform mesh operations. Benchmarks should also include the time required to run
GetMesh
or other operations that triggers evaluation.
const { sphere, union } = Manifold
const a = sphere(10, 128)
const b = sphere(10, 128).translate([10, 0, 0])
const start = performance.now()
const result = a.add(b)
const endAdd = performance.now()
result.getMesh()
const end = performance.now()
console.log(endAdd - start) // 0
console.log(end - endAdd) // 14
- For best performance, avoid calling functions that will trigger CSG evaluation unless it is really needed, e.g.
GetMesh
,NumVerts
. Manifold can perform optimization on a sequence of CSG operations, including reordering and parallelization. - Try to reuse existing results. In general, CSG operations are much slower than linear transformations (translation, rotation and scaling).
const { cylinder, sphere, union } = Manifold
function foo(deg) {
const r = 5
return sphere(1, 128).translate([r*Math.sin(deg / 180 * Math.PI), -r*Math.cos(deg/180*Math.PI), 0])
.add(sphere(1, 128).translate([-r*Math.sin(deg / 180 * Math.PI), r*Math.cos(deg/180*Math.PI), 0]))
.add(cylinder(2*r, 0.2, 0.2, 128, true).rotate([90, 0, deg]))
}
const foo_cached = foo(0)
function foo_faster(deg) {
return foo_cached.rotate([0, 0, deg])
}
let results = []
for (let i = 0; i < 180; i += 30)
results.push(foo_faster(i)) // foo_faster will be faster than foo
const result = union(results)
- Parallelization is costly, as sending messages between threads can be expensive. We only want to perform parallelization when there is a significant amount of work. In general, we use
autoPolicy(size)
to determine the policy: parallel when the size is large (the threshold is tunable), and sequential when the size is small. However, there is a caveat: this is only meant for simple tasks like copying data, sorting, prefix sum, where each operation takes only a few clock cycles. When the operations are significantly more complicated, you should run some benchmarks and determine the appropriate threshold (not a precise threshold, but a rough cutoff, as this can be machine and workload dependent). - Parallelization can be non-trivial. Parallel implementation of some algorithms may perform additional work to enable parallelization of the costly part. For example, our collider first count the number of collisions, allocate space for each query, and then fill the results. This means that we are actually finding the collisions twice. Benchmark should be performed to check if the overhead is worth it, and when should we switch to a faster sequential implementation.
- Contentions are costly, try to write in a way that avoids contention, for example by using several mutex and hash function.