Squall Local Configs
We explain the contents of the config file INSTALL_DIR/dip/SQLtoQueryPlanPlugin/confs/0.1G_hyracks_serial:
DIP_DISTRIBUTED false
DIP_QUERY_NAME hyracks
DIP_TOPOLOGY_NAME_PREFIX teamX
DIP_DATA_ROOT /path/to/tpch/on/local/machine/
DIP_SQL_ROOT ../dip/SQLtoQueryPlanPlugin/SQLqueries/
# DIP_DB_SIZE is in GBs
DIP_DB_SIZE 0.1
DIP_MAX_SRC_PAR 1
# below are unlikely to change
DIP_EXTENSION .tbl
DIP_READ_SPLIT_DELIMITER \|
DIP_GLOBAL_ADD_DELIMITER |
DIP_GLOBAL_SPLIT_DELIMITER \|
DIP_ACK_EVERY_TUPLE true
DIP_KILL_AT_THE_END true
To distinguish Squall parameters from Storm parameters, we prefix Squall parameters with DIP, short for Distributed Incremental Processing. DIP_DISTRIBUTED must be false to execute the query plan in Local mode. DIP_QUERY_NAME must correspond to a query from INSTALL_DIR/dip/SQLtoQueryPlanPlugin/SQLqueries/. In this case, DIP_QUERY_NAME = hyracks corresponds to the SQL query in INSTALL_DIR/dip/SQLtoQueryPlanPlugin/SQLqueries/hyracks.sql. The topology name is built by concatenating DIP_TOPOLOGY_NAME_PREFIX and DIP_TOPOLOGY_NAME. DIP_TOPOLOGY_NAME_PREFIX only serves to distinguish different users, so it can remain empty.
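As an illustration of how these parameters relate, here is a minimal Python sketch (not Squall's own code) that reads a config in the format shown above and derives the SQL file that DIP_QUERY_NAME points to. The helper names parse_config and sql_file_for are hypothetical, and the sketch assumes that each line is a key followed by whitespace and a value, and that lines starting with # are comments.

```python
import posixpath

# A shortened copy of the sample config shown above.
SAMPLE_CONFIG = """\
DIP_DISTRIBUTED false
DIP_QUERY_NAME hyracks
DIP_TOPOLOGY_NAME_PREFIX teamX
DIP_SQL_ROOT ../dip/SQLtoQueryPlanPlugin/SQLqueries/
# DIP_DB_SIZE is in GBs
DIP_DB_SIZE 0.1
"""


def parse_config(text):
    """Parse 'KEY VALUE' lines into a dict, skipping blank lines and # comments.

    Assumption: key and value are separated by whitespace; a key with no
    value (for example an empty DIP_TOPOLOGY_NAME_PREFIX) maps to "".
    """
    conf = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        conf[parts[0]] = parts[1] if len(parts) > 1 else ""
    return conf


def sql_file_for(conf):
    """Build the path of the SQL file: DIP_SQL_ROOT + DIP_QUERY_NAME + '.sql'."""
    return posixpath.join(conf["DIP_SQL_ROOT"], conf["DIP_QUERY_NAME"] + ".sql")


conf = parse_config(SAMPLE_CONFIG)
print(sql_file_for(conf))
```

With the sample values, this prints ../dip/SQLtoQueryPlanPlugin/SQLqueries/hyracks.sql.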
The database path is built by concatenating DIP_DATA_ROOT, DIP_DB_SIZE, and the string G. DIP_DB_SIZE is a separate parameter because our optimizer uses it when allocating parallelism for Storm components. The only way you can control parallelism is via DIP_MAX_SRC_PAR. For small relations (fewer than 100 tuples) the parallelism is 1, and for all others the parallelism is set to DIP_MAX_SRC_PAR. The parallelism for Bolts is set automatically, taking into account the position of a component in the query plan, so that there is no bottleneck while using the minimal number of nodes.
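The two rules above (path concatenation and parallelism for relations) can be sketched in Python as follows. This is only an illustration of the description, not Squall's implementation; the function names and the tuple counts in the usage lines are hypothetical.

```python
def database_path(data_root, db_size):
    """Concatenate DIP_DATA_ROOT, DIP_DB_SIZE and the string 'G'.

    db_size is kept as a string so it appears exactly as written in the config.
    """
    return f"{data_root}{db_size}G"


def relation_parallelism(num_tuples, max_src_par, small_threshold=100):
    """Relations with fewer than 100 tuples get parallelism 1,
    all others get DIP_MAX_SRC_PAR."""
    return 1 if num_tuples < small_threshold else max_src_par


print(database_path("/path/to/tpch/on/local/machine/", "0.1"))
print(relation_parallelism(25, 4))       # hypothetical small relation
print(relation_parallelism(150000, 4))   # hypothetical large relation
```

For the sample config this yields the path /path/to/tpch/on/local/machine/0.1G. Since plain concatenation is used, DIP_DATA_ROOT needs its trailing slash.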
Due to main memory constraints, you cannot run an arbitrarily large database with small component parallelism. For information on detecting this behavior, please consult Squall query plans vs Storm topologies, section How to know we run out of memory?. You control this through the DIP_MAX_SRC_PAR parameter: the larger the parameter, the bigger the database that can be processed.
DIP_SQL_ROOT is the path to the SQL queries on your local machine; the sample config above uses a relative path. DIP_ACK_EVERY_TUPLE determines how we ensure that the processing is done, so that the final result and the full execution time can be acquired. If the parameter is set to true, we ack each and every tuple. If the parameter is set to false, each Spout sends a special message as its last tuple. For more information about the implications of this parameter, please consult Squall query plans vs Storm topologies, section To ack or not to ack?.
Now we explain the parameters you most likely will not need to change. DIP_EXTENSION refers to the file extension of your database files. In our case, the database files were named customer.tbl, orders.tbl, etc. DIP_READ_SPLIT_DELIMITER is a regular expression used for delimiting columns of a tuple in a database file. DIP_GLOBAL_ADD_DELIMITER and DIP_GLOBAL_SPLIT_DELIMITER are used internally in Squall for serializing and deserializing tuples between different components. DIP_KILL_AT_THE_END ensures your topology is killed after the final result is written to a file. If you set this to false, your topology will execute forever, consuming resources that could be used by other topologies executing at the same time.
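To see why the split delimiters are written as \| while the add delimiter is a plain |, consider the following Python sketch. It is an analogy under stated assumptions rather than Squall's code: we assume DIP_GLOBAL_SPLIT_DELIMITER is treated as a regular expression like DIP_READ_SPLIT_DELIMITER (it has the same value), the function names are hypothetical, and the input row is made up. In a regular expression an unescaped | means alternation, so it has to be escaped to match a literal pipe.

```python
import re

READ_SPLIT_DELIMITER = r"\|"    # DIP_READ_SPLIT_DELIMITER: regex, pipe escaped
GLOBAL_ADD_DELIMITER = "|"      # DIP_GLOBAL_ADD_DELIMITER: literal string
GLOBAL_SPLIT_DELIMITER = r"\|"  # DIP_GLOBAL_SPLIT_DELIMITER: assumed to be a regex


def read_tuple(line):
    """Split one line of a database file into its columns."""
    return re.split(READ_SPLIT_DELIMITER, line.rstrip("\n"))


def serialize(fields):
    """Join the fields into one string to pass between components."""
    return GLOBAL_ADD_DELIMITER.join(fields)


def deserialize(message):
    """Recover the fields from a serialized tuple."""
    return re.split(GLOBAL_SPLIT_DELIMITER, message)


line = "1|Customer#000000001|BUILDING\n"  # hypothetical row, not real data
fields = read_tuple(line)
message = serialize(fields)
print(fields)
print(message)
print(deserialize(message) == fields)
```

The add and split delimiters have to describe the same character, otherwise a tuple would not survive the round trip between components.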
Thus, to change the database size, modify the DIP_DB_SIZE parameter, and to change the query, change DIP_QUERY_NAME. You can find more examples of config files in INSTALL_DIR/dip/SQLtoQueryPlanPlugin/confs/, but for Local Mode only those ending with _serial are applicable. You can also write config files from scratch, but make sure you put them in INSTALL_DIR/dip/SQLtoQueryPlanPlugin/confs/. To run a config file MY_CONFIG from this directory, run:
cd $INSTALL_DIR/bin
./pluginSQLLocalRun.sh MY_CONFIG
Keep in mind that you have to set the DIP_DATA_ROOT parameter in each config file you want to use.
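Before launching a run, the requirements stated on this page can be checked with a small script. The following Python sketch is a hypothetical helper, not part of Squall; it takes already-parsed parameters as a dict and only checks what this page states: DIP_DISTRIBUTED must be false for Local mode, and DIP_DATA_ROOT and DIP_QUERY_NAME must be set. Treating the placeholder path from the sample config as "not set" is our own assumption.

```python
PLACEHOLDER_DATA_ROOT = "/path/to/tpch/on/local/machine/"  # value from the sample config


def local_mode_problems(conf):
    """Return a list of problems found in the config; an empty list means
    the basic Local mode checks pass."""
    problems = []
    if conf.get("DIP_DISTRIBUTED") != "false":
        problems.append("DIP_DISTRIBUTED must be false for Local mode")
    data_root = conf.get("DIP_DATA_ROOT", "")
    if not data_root or data_root == PLACEHOLDER_DATA_ROOT:
        problems.append("DIP_DATA_ROOT must point to your local database")
    if not conf.get("DIP_QUERY_NAME"):
        problems.append("DIP_QUERY_NAME must name a query")
    return problems


# Hypothetical config values for illustration.
good = {
    "DIP_DISTRIBUTED": "false",
    "DIP_DATA_ROOT": "/home/me/tpch/",
    "DIP_QUERY_NAME": "hyracks",
}
bad = {"DIP_DISTRIBUTED": "true", "DIP_DATA_ROOT": PLACEHOLDER_DATA_ROOT}

print(local_mode_problems(good))
for problem in local_mode_problems(bad):
    print(problem)
```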