From dc175fccd53d34eda5c224aeb7d529e3b7260f94 Mon Sep 17 00:00:00 2001
From: Kim Walisch
Date: Mon, 5 Dec 2016 15:08:15 +0100
Subject: [PATCH] Add "How it works" section

---
 README.md | 29 +++++++++++++++++++++++++----
 1 file changed, 25 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index b8120c3..9fc6f62 100644
--- a/README.md
+++ b/README.md
@@ -6,7 +6,9 @@ libpopcnt
 
 ```libpopcnt.h``` is a header only C/C++ library for counting the
 number of 1 bits (bit population count) in an array as quickly as
-possible using specialized CPU instructions e.g. POPCNT, AVX2.
+possible using specialized CPU instructions e.g.
+[POPCNT](https://en.wikipedia.org/wiki/SSE4#POPCNT_and_LZCNT),
+[AVX2](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions).
 ```libpopcnt.h``` has been tested successfully using the GCC,
 Clang and MSVC compilers.
 
@@ -14,6 +16,25 @@ The algorithms used in ```libpopcnt.h``` are described in the paper
 [Faster Population Counts using AVX2 Instructions](https://arxiv.org/abs/1611.07612)
 by Daniel Lemire, Nathan Kurz and Wojciech Mula (23 Nov 2016).
 
+How it works
+============
+```libpopcnt.h``` uses a combination of 3 different bit population
+count algorithms based on the CPU architecture and the input array
+size:
+
+* For array sizes < 1 kilobyte an unrolled ```POPCNT``` algorithm
+is used.
+* For array sizes ≥ 1 kilobyte an ```AVX2``` algorithm is used if
+the CPU supports AVX2.
+* For CPUs without ```POPCNT``` instruction a portable
+integer popcount algorithm is used.
+
+The GitHub repository
+[WojciechMula/sse-popcount](https://github.com/WojciechMula/sse-popcount/tree/master/results)
+contains extensive benchmarks for the 3 algorithms used in
+```libpopcnt.h```. The algorithms are named
+```builtin-popcnt-unrolled```, ```avx2-harley-seal```, ```harley-seal```.
+
 C/C++ API
 =========
 ```C++
@@ -28,13 +49,13 @@ uint64_t popcnt(const void* data, uint64_t size);
 uint64_t popcnt_u64(uint64_t x);
 ```
 
-How to use it
-=============
+How to compile
+==============
 At compile time you need to specify if your compiler supports the
 ```POPCNT``` & ```AVX2``` instructions.
 
 ```bash
-# How to compile on x86 & x86_64 CPUs
+# How to compile on x86_64 CPUs
 gcc -mpopcnt -DHAVE_POPCNT -mavx2 -DHAVE_AVX2 program.c
 
 # How to compile using Microsoft Visual C++