This repository contains scripts to test various TPC-CH queries across AsterixDB, MongoDB, and Couchbase. The goal of this repository is to evaluate how well each system utilizes its multi-valued indexes in real-world analytical queries.

All experiments were executed on AWS EC2 instances. Each node was of type c5.xlarge (4 CPUs @ 3.4GHz, 8GB of RAM) with 3000 IOPS SSDs attached, running Ubuntu Server 20.04 LTS. We record the client response time here, so each experiment was executed locally on the database server (to minimize client-server communication latencies).

A modified TPC-CH schema was used here, one that (a) more naturally represents orderlines as documents nested within their orders, and (b) has orderline dates that are uniformly distributed across 7 years. The scale used for the TPC-CH generator was `numWarehouses=500`.
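For reference, an Orders document under this modified schema looks roughly like the following. The values are illustrative, and only the fields referenced by the index definitions later in this guide are shown.

{
  "o_w_id": 1,
  "o_d_id": 5,
  "o_id": 3001,
  "o_orderline": [
    { "ol_i_id": 77, "ol_delivery_d": "2016-05-14" },
    { "ol_i_id": 312, "ol_delivery_d": "2018-11-02" }
  ]
}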
- Install `python3.8` and `python3-pip`.
- Clone this repository onto the server node. Ensure you have the correct requirements.
# Clone the repository.
git clone https://github.com/glennga/aconitum.git
# Install the requirements (a virtual environment also works).
cd aconitum
python3 -m pip install -r requirements.txt
- Ensure that your `python3` searches the `aconitum` repository for modules. Add the following line to your `.bashrc` file.
export PYTHONPATH=$PYTHONPATH:/home/ubuntu/aconitum
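To verify that the module path is picked up, the following should exit without an import error:

python3 -c "import aconitum"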
- Ensure that AsterixDB is installed (with Java 11) and configured on the node that will run the experiments. The `cc.conf` file used is as follows:
[nc/asterix_nc1]
txn.log.dir=txnlog
iodevices=iodevice
core.dump.dir=coredump
nc.api.port=19004
address=localhost
[nc]
command=asterixnc
[cc]
address=localhost
[common]
log.dir=logs/
log.level=INFO
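With this configuration in place, the cluster can be started. The exact commands depend on your installation; the following is a sketch assuming the standard AsterixDB binary distribution, run from the distribution's root directory.

# Start the NC service first, then the CC with the configuration above.
bin/asterixncservice > ncservice.log 2>&1 &
bin/asterixcc -config-file cc.conf > cc.log 2>&1 &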
- Create the dataverse required for this experiment.
DROP DATAVERSE TPC_CH IF EXISTS;
CREATE DATAVERSE TPC_CH;
USE TPC_CH;
CREATE TYPE CustomerType AS { c_w_id: bigint, c_d_id: bigint, c_id: bigint };
CREATE TYPE NationType AS { n_nationkey: bigint };
CREATE TYPE OrdersType AS { o_w_id: bigint, o_d_id: bigint, o_id: bigint };
CREATE TYPE StockType AS { s_w_id: bigint, s_i_id: bigint };
CREATE TYPE ItemType AS { i_id: bigint };
CREATE TYPE RegionType AS { r_regionkey: bigint };
CREATE TYPE SupplierType AS { su_suppkey: bigint };
CREATE DATASET Customer (CustomerType) PRIMARY KEY c_w_id, c_d_id, c_id;
CREATE DATASET Nation (NationType) PRIMARY KEY n_nationkey;
CREATE DATASET Orders (OrdersType) PRIMARY KEY o_w_id, o_d_id, o_id;
CREATE DATASET Stock (StockType) PRIMARY KEY s_w_id, s_i_id;
CREATE DATASET Item (ItemType) PRIMARY KEY i_id;
CREATE DATASET Region (RegionType) PRIMARY KEY r_regionkey;
CREATE DATASET Supplier (SupplierType) PRIMARY KEY su_suppkey;
- Load each dataset in the dataverse. Adjust the path accordingly.
LOAD DATASET TPC_CH.Customer USING localfs (
("path"="localhost:///home/ubuntu/aconitum/resources/tpc_ch/customer.json"),
("format"="json")
);
LOAD DATASET TPC_CH.Nation USING localfs (
("path"="localhost:///home/ubuntu/aconitum/resources/tpc_ch/nation.json"),
("format"="json")
);
LOAD DATASET TPC_CH.Orders USING localfs (
("path"="localhost:///home/ubuntu/aconitum/resources/tpc_ch/orders.json"),
("format"="json")
);
LOAD DATASET TPC_CH.Stock USING localfs (
("path"="localhost:///home/ubuntu/aconitum/resources/tpc_ch/stock.json"),
("format"="json")
);
LOAD DATASET TPC_CH.Item USING localfs (
("path"="localhost:///home/ubuntu/aconitum/resources/tpc_ch/item.json"),
("format"="json")
);
LOAD DATASET TPC_CH.Region USING localfs (
("path"="localhost:///home/ubuntu/aconitum/resources/tpc_ch/region.json"),
("format"="json")
);
LOAD DATASET TPC_CH.Supplier USING localfs (
("path"="localhost:///home/ubuntu/aconitum/resources/tpc_ch/supplier.json"),
("format"="json")
);
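As an optional sanity check, the loaded record counts can be compared against the generated files. For example:

USE TPC_CH;
SELECT VALUE COUNT(*) FROM Orders;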
- Build our secondary indexes. This is a separate step to speed up the previous loading phase.
USE TPC_CH;
CREATE INDEX orderlineDelivDateIdx
ON Orders (
UNNEST o_orderline
SELECT ol_delivery_d : string
) EXCLUDE UNKNOWN KEY;
CREATE INDEX orderlineItemIdx
ON Orders (
UNNEST o_orderline
SELECT ol_i_id : bigint
) EXCLUDE UNKNOWN KEY;
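As an illustration of the kind of predicate these multi-valued indexes are intended to serve, the following query unnests `o_orderline` and filters on the delivery date. The date range here is a placeholder, not one of the benchmark parameters.

USE TPC_CH;
SELECT COUNT(*) AS order_count
FROM Orders O, O.o_orderline OL
WHERE OL.ol_delivery_d >= "2016-01-01"
AND OL.ol_delivery_d < "2016-02-01";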
- Execute the benchmark query suite for AsterixDB.
python3 aconitum/_asterixdb.py
- Analyze the results! The results will be stored in the `out` folder under `results.json` as JSONL documents.
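Each line of `results.json` is a standalone JSON document, so the file can be consumed with any JSONL-aware tool. A minimal Python sketch (the exact fields recorded depend on the benchmark scripts):

import json

# Read one JSON document per line from the results file.
with open('out/results.json') as results_file:
    results = [json.loads(line) for line in results_file if line.strip()]

print(len(results), 'result records loaded.')
print(sorted(results[0].keys()))  # Inspect the fields of the first record.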
- Ensure that Couchbase 7.0 is installed and configured on the node that will run the experiments. Docs on the install can be found here. Use the deb package instructions to install the latest release. The Data service should be allocated about 1GB of memory (1024 MB). The only other service enabled is the Index service, which should be allocated the remainder of the available memory (3GB, or 3072 MB). The `Max Parallelism` parameter should be set to the number of cores on your system (in our case, 4).
- Create the bucket that will hold all of your data. This bucket should utilize the entire cluster's Data-service memory quota (1024 MB in this case). Change the bucket's ejection policy from Value-only to Full. (A `couchbase-cli` sketch of both of these steps is given below.)
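The cluster and bucket setup above can also be scripted with `couchbase-cli`; a sketch is given below. The Query service is included because the benchmark issues N1QL queries, and the `Max Parallelism` setting is still applied through the web console as described above.

# Initialize the cluster: 1024 MB for the Data service, 3072 MB for the Index service.
/opt/couchbase/bin/couchbase-cli cluster-init \
  --cluster localhost --cluster-username admin --cluster-password password \
  --cluster-ramsize 1024 --cluster-index-ramsize 3072 \
  --services data,index,query

# Create the bucket with the Full ejection policy.
/opt/couchbase/bin/couchbase-cli bucket-create \
  --cluster localhost --username admin --password password \
  --bucket aconitum --bucket-type couchbase \
  --bucket-ramsize 1024 --bucket-eviction-policy fullEviction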
- Create the collections. The default scope of the bucket houses each collection.
CREATE COLLECTION aconitum._default.Customer;
CREATE COLLECTION aconitum._default.Nation;
CREATE COLLECTION aconitum._default.Orders;
CREATE COLLECTION aconitum._default.Stock;
CREATE COLLECTION aconitum._default.Item;
CREATE COLLECTION aconitum._default.Region;
CREATE COLLECTION aconitum._default.Supplier;
- Load each collection in the bucket. Adjust the path accordingly.
for c in customer nation orders stock item region supplier; do
/opt/couchbase/bin/cbimport json \
--cluster localhost --username "admin" --password "password" \
--bucket "aconitum" --scope-collection-exp _default.${c^} \
--dataset file:///home/ubuntu/aconitum/resources/tpc_ch/$c.json \
--format lines --generate-key key::#UUID#
done
- Create an external UDF to convert strings to codepoints.
curl -v -X POST \
http://localhost:8093/evaluator/v1/libraries/codepoint-js \
-u admin:password \
-H 'content-type: application/json' \
-d 'function stringToCodepoint (inputString) { return inputString.codePointAt(0); }'
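To confirm that the library was registered, the evaluator endpoint can be queried (this assumes the same admin credentials and the `/evaluator/v1/libraries` listing endpoint of the Query service):

curl -u admin:password http://localhost:8093/evaluator/v1/libraries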
- Load the external function, then build the indexes associated with each collection. In our case, we were unable to build all of the indexes without (a) first upgrading the node type, (b) increasing the memory allocated to the indexer, (c) running the statements below, and (d) downgrading the node back to its original size.
CREATE FUNCTION stringToCodepoint(inputString)
LANGUAGE JAVASCRIPT AS "stringToCodepoint" AT "codepoint-js";
CREATE INDEX customerPrimaryKeyIdx
ON aconitum._default.Customer ( c_w_id, c_d_id, c_id );
CREATE INDEX nationPrimaryKeyIdx
ON aconitum._default.Nation ( n_nationkey );
CREATE INDEX ordersPrimaryKeyIdx
ON aconitum._default.Orders ( o_w_id, o_d_id, o_id );
CREATE INDEX stockPrimaryKeyIdx
ON aconitum._default.Stock ( s_w_id, s_i_id );
CREATE INDEX itemPrimaryKeyIdx
ON aconitum._default.Item ( i_id );
CREATE INDEX regionPrimaryKeyIdx
ON aconitum._default.Region ( r_regionkey );
CREATE INDEX supplierPrimaryKeyIdx
ON aconitum._default.Supplier ( su_suppkey );
CREATE INDEX orderlineDelivDateIdx
ON aconitum._default.Orders (
DISTINCT ARRAY OL.ol_delivery_d
FOR OL
IN o_orderline
END
);
CREATE INDEX orderlineItemIdx
ON aconitum._default.Orders (
DISTINCT ARRAY OL.ol_i_id
FOR OL
IN o_orderline
END
);
CREATE PRIMARY INDEX customersPrimaryIdx
ON aconitum._default.Customer;
CREATE PRIMARY INDEX nationPrimaryIdx
ON aconitum._default.Nation;
CREATE PRIMARY INDEX ordersPrimaryIdx
ON aconitum._default.Orders;
CREATE PRIMARY INDEX stockPrimaryIdx
ON aconitum._default.Stock;
CREATE PRIMARY INDEX itemPrimaryIdx
ON aconitum._default.Item;
CREATE PRIMARY INDEX regionPrimaryIdx
ON aconitum._default.Region;
CREATE PRIMARY INDEX supplierPrimaryIdx
ON aconitum._default.Supplier;
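Analogous to the AsterixDB example above, the following query shows the type of existential predicate over `o_orderline` that the array indexes are intended to serve (the date range is a placeholder):

SELECT COUNT(*) AS order_count
FROM aconitum._default.Orders O
WHERE ANY OL IN O.o_orderline SATISFIES
  OL.ol_delivery_d >= "2016-01-01" AND OL.ol_delivery_d < "2016-02-01"
END;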
- Execute the benchmark query suite for Couchbase.
python3 aconitum/_couchbase.py
- Analyze the results! The results will be stored in the `out` folder under `results.json` as JSONL documents.
- Ensure that MongoDB is installed and configured on the node that will run the experiments. Docs on the install can be found here. Enable access control and create an administrative user.
use admin
db.createUser ({
user: "admin",
pwd: "password",
roles: [ "root" ]
})
- Create the database and collections required for this experiment.
use aconitum
db.createCollection ( "Customer" )
db.createCollection ( "Nation" )
db.createCollection ( "Orders" )
db.createCollection ( "Stock" )
db.createCollection ( "Item" )
db.createCollection ( "Region" )
db.createCollection ( "Supplier" )
- Load each collection in the database. Adjust the path accordingly.
for c in customer nation orders stock item region supplier; do
mongoimport \
--authenticationDatabase admin \
--db aconitum \
--collection ${c^} \
--host localhost:27017 \
--username admin \
--password password \
--drop \
/home/ubuntu/aconitum/resources/tpc_ch/$c.json
done
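A quick way to verify the import is to count the documents in each collection from the mongo shell, e.g.:

use aconitum
db.Orders.countDocuments({})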
- Create the indexes for this experiment.
use aconitum
db.Customer.createIndex (
{ "c_w_id": 1, "c_d_id": 1, "c_id": 1 },
{ name: "customerPrimaryKeyIdx" }
)
db.Nation.createIndex (
{ "n_nationkey": 1 },
{ name: "nationPrimaryKeyIdx" }
)
db.Orders.createIndex (
{ "o_w_id": 1, "o_d_id": 1, "o_id": 1 },
{ name: "ordersPrimaryKeyIdx" }
)
db.Stock.createIndex (
{ "s_w_id": 1, "s_i_id": 1 },
{ name: "stockPrimaryKeyIdx" }
)
db.Item.createIndex (
{ "i_id": 1 },
{ name: "itemPrimaryKeyIdx" }
)
db.Region.createIndex (
{ "r_regionkey": 1 },
{ name: "regionPrimaryKeyIdx" }
)
db.Supplier.createIndex (
{ "su_suppkey": 1 },
{ name: "supplierPrimaryKeyIdx" }
)
db.Orders.createIndex (
{ "o_orderline.ol_delivery_d": 1 },
{ name: "orderlineDelivDateIdx" }
)
db.Orders.createIndex (
{ "o_orderline.ol_i_id": 1 },
{ name: "orderlineItemIdx" }
)
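As an illustration, the following query filters on the nested `ol_delivery_d` field and reports whether the multikey index `orderlineDelivDateIdx` was used (the date range is a placeholder):

use aconitum
db.Orders.find(
  { "o_orderline.ol_delivery_d": { $gte: "2016-01-01", $lt: "2016-02-01" } }
).explain("executionStats")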
- Execute the benchmark query suite for MongoDB.
python3 aconitum/_mongodb.py
- Analyze the results! The results will be stored in the `out` folder under `results.json` as single-line JSON documents.