Add two implementations split10 and split11 #6

Open
wants to merge 3 commits into
base: master

rep-movsd

split10

This is a simple implementation that uses std::find().

It uses a for loop with two variables:
itStart - iterator to the start of a word
itDelim - iterator to the next delimiter

We store the result of find() for the next delimiter in itDelim, take the
half-open range [itStart, itDelim) as a token, and emplace it as a string
into the result vector.

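This very simple five-liner shows that good C++ code can beat C code without
any trouble: in the results below split10 takes 3.7 seconds, against about
8 seconds for splitc1 to splitc3.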
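
The loop described above might look roughly like the following. This is a minimal sketch, not the code from this PR: the function name `split_find`, the default space delimiter, and the choice to keep empty tokens are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper (not the PR's code): split `line` on `delim`,
// copying each token into the result vector. Empty tokens are kept.
std::vector<std::string> split_find(const std::string& line, char delim = ' ')
{
    std::vector<std::string> tokens;
    for (auto itStart = line.begin(), itDelim = line.begin(); ; itStart = itDelim + 1) {
        // Find the next delimiter, or end() if there is none.
        itDelim = std::find(itStart, line.end(), delim);
        // Construct a std::string in place from the range [itStart, itDelim).
        tokens.emplace_back(itStart, itDelim);
        if (itDelim == line.end()) break;  // last token consumed
    }
    return tokens;
}
```

Calling `split_find("a b c")` yields the three tokens `a`, `b` and `c`. Every token is copied into its own `std::string`, which is the cost the StringRef variant avoids.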


split11

Almost the same as split10, but it uses a StringRef class just like the
one in split6, and a char pointer instead of iterators.
In the results below this makes it about 60% faster than split10
(3.7 seconds down to 2.3 seconds).

That gets pretty close to the subparser version: only about 15% slower on my
system.
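
As an illustration of the pointer-based variant, here is a minimal sketch. The `StringRef` shown is a hypothetical stand-in (the class in split6 may look different), and `split_ref` is an assumed name; the point is only that tokens are recorded as pointer and length pairs instead of being copied.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical non-owning view of a character range; an assumption,
// not necessarily the StringRef used in split6.
struct StringRef {
    const char* ptr;
    std::size_t len;
    std::string str() const { return std::string(ptr, len); }
};

// Fills `out` with views into `line`. No characters are copied, so
// `line` must outlive the views stored in `out`.
void split_ref(const std::string& line, std::vector<StringRef>& out, char delim = ' ')
{
    out.clear();
    const char* start = line.data();
    const char* const end = start + line.size();
    for (;;) {
        // std::find works on raw pointers just as it does on iterators.
        const char* d = std::find(start, end, delim);
        out.push_back(StringRef{start, static_cast<std::size_t>(d - start)});
        if (d == end) break;
        start = d + 1;  // skip past the delimiter
    }
}
```

Passing the output vector by reference lets a caller reuse its capacity across lines, which is one plausible reason such a version avoids allocations in the inner loop.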


Benchmark results (trimmed output for 80 columns)

$ ./run_all.bash
=== System info
Arch rolling
Linux 4.5.5-1-ck x86_64 GNU/Linux
Intel(R) Core(TM) CPU X 920 @ 2.00GHz
g++ (GCC) 6.1.1 20160501
Python 3.5.1
=== End System info

./split.py         Python: 38.9 seconds.  Crunch Speed: 514293.2
./split5.py        Python: 41.0 seconds.  Crunch Speed: 488168.6
./split1           C++   : 8.7 seconds.  Crunch speed: 2288344.6
./split2           C++   : 20.9 seconds.  Crunch speed: 958164.2
./split6           C++   : 3.6 seconds.  Crunch speed: 5603155.4
./split7           C++   : 2.6 seconds.  Crunch speed: 7750547.7
./split8           C++   : 31.0 seconds.  Crunch speed: 644411.0
./split9           C++   : 21.1 seconds.  Crunch speed: 949104.7
./split10          C++   : 3.7 seconds.  Crunch speed: 5387448.0
./split11          C++   : 2.3 seconds.  Crunch speed: 8703679.3
./split_subparser  C++   : 2.0 seconds.  Crunch speed: 9956735.4
./splitc1          C++   : 7.9 seconds.  Crunch speed: 2519434.9
./splitc2          C++   : 8.0 seconds.  Crunch speed: 2484935.2
./splitc3          C++   : 8.0 seconds.  Crunch speed: 2515293.7
$

rep-movsd added 3 commits May 29, 2016 16:19
Same as split11, except that it uses memchr() to find the delimiter.
This is now as fast as or faster than split_subparser.
Compiling with SIMD options such as -msse2 or -march=native might make it even faster.
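
For reference, a memchr()-based loop could be sketched as below. This is an assumption about the shape of the change, not the commit's actual code: `StringRef` and `split_memchr` are hypothetical names, and the only difference from a std::find() loop is the search call.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical non-owning view, as assumed for the split11-style sketch.
struct StringRef {
    const char* ptr;
    std::size_t len;
    std::string str() const { return std::string(ptr, len); }
};

// Same splitting loop, but the delimiter search is done by std::memchr,
// which C libraries commonly implement with optimized routines.
void split_memchr(const std::string& line, std::vector<StringRef>& out, char delim = ' ')
{
    out.clear();
    const char* start = line.data();
    const char* const end = start + line.size();
    for (;;) {
        // memchr returns a null pointer when the byte is not found.
        const void* hit = std::memchr(start, delim, static_cast<std::size_t>(end - start));
        const char* d = hit ? static_cast<const char*>(hit) : end;
        out.push_back(StringRef{start, static_cast<std::size_t>(d - start)});
        if (d == end) break;
        start = d + 1;  // continue after the delimiter
    }
}
```

Whether this beats std::find() depends on the compiler and C library in use, so the timings are worth re-measuring on each system.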