
Benchmarking Cassandra and other NoSQL databases with YCSB

Tzach Livyatan edited this page Jul 16, 2014 · 12 revisions

YCSB is a popular benchmark tool for NoSQL databases. It has ready-made adapters for various NoSQL databases such as Cassandra, MongoDB, Redis, and others.

YCSB comes with six out-of-the-box workloads, each testing a different common use case:

  • Workload A: Update heavy workload. This workload has a 50/50 mix of reads and writes. Application example: a session store recording recent actions.
  • Workload B: Read mostly workload. This workload has a 95/5 read/write mix. Application example: photo tagging; adding a tag is an update, but most operations read tags.
  • Workload C: Read only. This workload is 100% reads. Application example: a user profile cache, where profiles are constructed elsewhere (e.g., Hadoop).
  • Workload D: Read latest workload. In this workload, new records are inserted, and the most recently inserted records are the most popular. Application example: user status updates; people want to read the latest.
  • Workload E: Short ranges. In this workload, short ranges of records are queried instead of individual records. Application example: threaded conversations, where each scan is for the posts in a given thread (assumed to be clustered by thread id).
  • Workload F: Read-modify-write
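The mix of operations for each workload is defined by properties in its file under workloads/. A minimal sketch for inspecting that mix, assuming the core workload property names end in "proportion" (for example readproportion and updateproportion); show_mix is a hypothetical helper name, not part of YCSB:

```shell
# Sketch: list the operation proportions defined in a YCSB workload file.
# show_mix is a hypothetical helper; it only reads the file, it does not run YCSB.
show_mix() {
    # Keep only lines such as "readproportion=0.5"; comments and other properties are skipped.
    grep -E '^[a-z]+proportion=' "$1"
}
```

For example, show_mix workloads/workloada prints the proportions configured for Workload A.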

The following are instructions for using YCSB to test Cassandra on OSv.

Prerequisite

You will need both cassandra-cli and YCSB installed to test Cassandra.

cassandra-cli

sudo yum install cassandra12

If the repository is not available, edit the repository file:

sudo vi /etc/yum.repos.d/datastax.repo

and insert the following:

[datastax]
name = DataStax Repo for Apache Cassandra
baseurl = http://rpm.datastax.com/community
enabled = 1
gpgcheck = 0

Then run sudo yum install cassandra12 again.

YCSB

install

Test Bed

Running Cassandra

To run Cassandra on OSv:

capstan run cloudius/osv-cassandra -n bridge

Configure Cassandra

YCSB has no function to configure the prerequisites of each NoSQL database, so the keyspace and column family must be created manually.

Create a setup-ycsb.cql file like this:

create keyspace usertable with placement_strategy = 'SimpleStrategy' and strategy_options = {replication_factor:1};
use usertable;
create column family data;

Execute:

cassandra-cli -h 192.168.122.89 -f setup-ycsb.cql

Use the IP of your Cassandra server; 192.168.122.89 is the default IP for KVM with a network bridge.

Benchmark

Each benchmark has two phases:

  1. Upload the test data to the database
  2. Execute transactions

Loading test data

./bin/ycsb load cassandra-10 -p hosts="192.168.122.89" -P workloads/workloada -s > workloada_load_res.txt

Running the benchmark

./bin/ycsb run cassandra-10 -threads 10 -p hosts="192.168.122.89" -P workloads/workloada -s > workloada_run_res.txt

The results will be available at workloada_run_res.txt:

$ head workloada_run_res.txt
YCSB Client 0.1
Command line: -db com.yahoo.ycsb.db.CassandraClient10 -p hosts=192.168.122.89 -P workloads/workloadc -s -threads 10 -target 100 -t
[OVERALL], RunTime(ms), 10139.0
[OVERALL], Throughput(ops/sec), 98.62905611993293
[READ], Operations, 1000
[READ], AverageLatency(us), 1003.36
[READ], MinLatency(us), 412
[READ], MaxLatency(us), 25990
[READ], 95thPercentileLatency(ms), 1
[READ], 99thPercentileLatency(ms), 3
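To compare runs, it can help to pull only the headline numbers out of a result file. The following is a minimal sketch, assuming the "[SECTION], Metric, value" line format shown in the output above; ycsb_summary is a hypothetical helper name, not part of YCSB:

```shell
# Sketch: print overall throughput and per-operation average latency
# from a YCSB result file. Assumes lines of the form "[SECTION], Metric, value".
ycsb_summary() {
    awk -F', ' '
        $1 == "[OVERALL]" && $2 == "Throughput(ops/sec)" {
            printf "throughput_ops_sec=%.2f\n", $3
        }
        $2 == "AverageLatency(us)" {
            # Turn "[READ]" into "read" by dropping the brackets and lowercasing.
            op = tolower(substr($1, 2, length($1) - 2))
            printf "%s_avg_latency_us=%.2f\n", op, $3
        }
    ' "$1"
}
```

Usage: ycsb_summary workloada_run_res.txt. Lines that do not match either pattern, such as the client banner and the command line, are ignored.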

Longer test

To run a longer test, you can override the workload file properties:

./bin/ycsb load cassandra-10 -p hosts="192.168.122.89" -threads 128 -p fieldcount=20 -p operationcount=900000 -p recordcount=15000000 -p requestdistribution=zipfian -P workloads/workloada -s > workloada_load_res.txt

./bin/ycsb run cassandra-10 -p hosts="192.168.122.89" -threads 128 -p fieldcount=20 -p operationcount=900000 -p recordcount=15000000 -p requestdistribution=zipfian -P workloads/workloada -s  > workloada_run_res.txt

The parameters above are based on the DataStax benchmark.
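The load and run commands for the longer test repeat the same overrides, and the two phases should normally agree on values such as recordcount and fieldcount. A minimal sketch that keeps the overrides in one place; YCSB_BIN and run_long_test are hypothetical names, and the flags are copied from the commands above:

```shell
# Sketch: run the load and run phases with one shared set of property overrides.
# YCSB_BIN and run_long_test are hypothetical names; the flags mirror the commands above.
YCSB_BIN="${YCSB_BIN:-./bin/ycsb}"

run_long_test() {
    host="$1"
    workload="$2"
    # Replace the function's arguments with the shared overrides used by both phases.
    set -- -p hosts="$host" -threads 128 \
        -p fieldcount=20 -p operationcount=900000 -p recordcount=15000000 \
        -p requestdistribution=zipfian -P "workloads/$workload" -s

    # Run phase starts only if the load phase succeeded.
    "$YCSB_BIN" load cassandra-10 "$@" > "${workload}_load_res.txt" &&
    "$YCSB_BIN" run cassandra-10 "$@" > "${workload}_run_res.txt"
}
```

Usage: run_long_test 192.168.122.89 workloada, which writes workloada_load_res.txt and workloada_run_res.txt to the current directory.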

Troubleshooting

  • If you see the following errors:
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.

You are missing a few JARs. Download the latest SLF4J release and copy slf4j-simple-1.7.7.jar and slf4j-api-1.7.7.jar to the YCSB directory.

  • Running Cassandra on Linux: when installing and running Cassandra on Linux, make sure to update /etc/cassandra/default.conf/cassandra.yaml, changing rpc_address: localhost to rpc_address: 0.0.0.0

Without this change, Cassandra will not be reachable from remote hosts.
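The rpc_address change can also be scripted. A minimal sketch, assuming rpc_address appears as a single top-level line in the file; set_rpc_address is a hypothetical helper name:

```shell
# Sketch: set rpc_address to 0.0.0.0 in a cassandra.yaml, keeping a .bak copy
# of the original file. set_rpc_address is a hypothetical helper name.
set_rpc_address() {
    sed -i.bak 's/^rpc_address:.*/rpc_address: 0.0.0.0/' "$1"
}
```

Usage: set_rpc_address /etc/cassandra/default.conf/cassandra.yaml, run as a user with write permission to that file. Cassandra has to be restarted for the new address to take effect.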

More resources:
