Hand-unroll the SIMD dot product loop #380

Merged 5 commits on Dec 31, 2024

blambov (Contributor) commented Dec 30, 2024

This removes some data dependencies and permits much greater internal parallelism in the code.

On my machine this reduces the single-threaded dot product time for 1024 floats by one third, from 111 ns to 74 ns.
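
To illustrate the idea, here is a minimal scalar sketch, not the SIMD code from this PR. It assumes the unrolling works by keeping several independent accumulators, which is the usual way to break the add-to-add dependency chain in a reduction loop. The class and method names (`UnrolledDot`, `dotSimple`, `dotUnrolled`) are hypothetical.

```java
// Hypothetical scalar illustration of hand-unrolling; not the code from this PR.
final class UnrolledDot {
    // Baseline: a single accumulator, so every add depends on the previous add.
    static float dotSimple(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Unrolled by four with independent accumulators: the four multiply-adds in
    // one iteration do not depend on each other, so the CPU can overlap them.
    static float dotUnrolled(float[] a, float[] b) {
        float s0 = 0f, s1 = 0f, s2 = 0f, s3 = 0f;
        int bound = a.length - (a.length % 4);
        int i = 0;
        for (; i < bound; i += 4) {
            s0 += a[i] * b[i];
            s1 += a[i + 1] * b[i + 1];
            s2 += a[i + 2] * b[i + 2];
            s3 += a[i + 3] * b[i + 3];
        }
        // Combine the partial sums, then handle the leftover tail elements.
        float sum = (s0 + s1) + (s2 + s3);
        for (; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        float[] a = new float[1027];
        float[] b = new float[1027];
        for (int i = 0; i < a.length; i++) {
            a[i] = (i % 7) - 3;
            b[i] = (i % 5) - 2;
        }
        System.out.println(dotSimple(a, b) + " vs " + dotUnrolled(a, b));
    }
}
```

Splitting the sum across accumulators changes the order of floating-point additions, so results can differ from the single-accumulator loop in the last few bits for general inputs.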

jbellis (Owner) commented Dec 31, 2024

In Bench and the SimilarityBench it's mostly a wash for me, but I'm happy to commit based on your results.

Bench results (omitting warmup on colbert-1M). The difference in build and search times is within the normal run-to-run variance.

Before:

ada002-1M: 983064 base and 9753 query vectors created, dimensions 1536
Build (full res) M=32 ef=100 in 136.62s with avg degree 31.99 and 0.65 short edges
Wrote [INLINE_VECTORS] in 8.90s
Uncompressed vectors
Using CachingGraphIndex(graph=OnDiskGraphIndex(size=983064, entryPoint=91031, features=INLINE_VECTORS)):
 Query top 99/99 recall 0.7468 in 2.08s after 36,863,426 nodes visited

After:

ada002-1M: 983064 base and 9753 query vectors created, dimensions 1536
Build (full res) M=32 ef=100 in 137.31s with avg degree 31.99 and 0.65 short edges
Wrote [INLINE_VECTORS] in 8.45s
Uncompressed vectors
Using CachingGraphIndex(graph=OnDiskGraphIndex(size=983064, entryPoint=91031, features=INLINE_VECTORS)):
 Query top 99/99 recall 0.7475 in 2.05s after 36,608,860 nodes visited

SimilarityBench:

Before:

Iteration   1: 132534573.353 ops/s
Iteration   2: 131339572.110 ops/s
Iteration   3: 137510039.044 ops/s

Iteration   1: 50.220 ns/op
Iteration   2: 50.490 ns/op
Iteration   3: 50.324 ns/op

After:

Iteration   1: 140738259.475 ops/s
Iteration   2: 124628756.103 ops/s
Iteration   3: 126568242.550 ops/s

Iteration   1: 53.262 ns/op
Iteration   2: 53.074 ns/op
Iteration   3: 53.379 ns/op
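
To put a number on "mostly a wash", the listed iterations can be averaged and compared. This is a small sketch using only the figures above; the `BenchDelta` class and its methods are hypothetical helpers, not part of the benchmark suite.

```java
// Hypothetical helper for comparing the SimilarityBench iterations listed above.
final class BenchDelta {
    static double mean(double[] xs) {
        double sum = 0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    // Change in the mean, expressed as a fraction of the "before" mean.
    static double relativeChange(double[] before, double[] after) {
        double m = mean(before);
        return (mean(after) - m) / m;
    }

    public static void main(String[] args) {
        double[] opsBefore = {132534573.353, 131339572.110, 137510039.044};
        double[] opsAfter = {140738259.475, 124628756.103, 126568242.550};
        double[] nsBefore = {50.220, 50.490, 50.324};
        double[] nsAfter = {53.262, 53.074, 53.379};
        System.out.printf("ops/s change: %+.1f%%%n", 100 * relativeChange(opsBefore, opsAfter));
        System.out.printf("ns/op change: %+.1f%%%n", 100 * relativeChange(nsBefore, nsAfter));
    }
}
```

With three iterations per run this is only a rough comparison: the mean ops/s figure drops by roughly 2%, which is smaller than the spread between individual iterations, while the mean ns/op figure rises by roughly 6%.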

jbellis merged commit 7b78c9e into jbellis:main on Dec 31, 2024
6 checks passed