Quick Start: Cluster Mode

Aleksandar Vitorovic edited this page Mar 27, 2016 · 38 revisions

This section is a tutorial for running Squall/Storm on a cluster. It assumes that Squall already works in Local Mode, that you have set up a Storm cluster, and that the storm client is working, which in turn requires a correctly configured storm.yaml.

It is recommended that you first try this with a [virtual cluster](https://github.com/epfldata/squall/wiki/Setting-up-a-virtual-cluster-using-Wirbelsturm).

Here is a list of steps to follow:

  1. Inside squall-$VERSION/test/squall/confs/cluster/1G_hyracks, set DIP_DATA_ROOT so that it reflects the path to your TPC-H databases on the cluster. You can also set DIP_TOPOLOGY_NAME_PREFIX, but this is optional; it distinguishes between different users who may be running the same query on the cluster at the same time. To run Squall with a different query and/or a different database size, please consult Squall Cluster configs.

  2. squall_cluster.sh submits squall-core/target/squall-0.2.0.jar using the storm client. You only need to pass a Squall configuration as an argument:

  $ cd bin
  $ ./squall_cluster.sh ../test/squall/confs/cluster/1G_hyracks
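
Before submitting, it can help to confirm that the configuration actually defines the keys from step 1. The following is a minimal sketch, not something shipped with Squall: the function name `check_squall_conf` is hypothetical, and it assumes that each setting sits on its own line, starting with the key name followed by whitespace and a value.

```shell
# Hypothetical helper, not part of Squall.
# Assumption: each setting is a single line of the form "KEY value".

check_squall_conf() {
  conf="$1"
  if [ ! -f "$conf" ]; then
    echo "config not found: $conf" >&2
    return 1
  fi
  # DIP_DATA_ROOT must be set (step 1); fail if no line assigns it a value.
  if ! grep -Eq '^[[:space:]]*DIP_DATA_ROOT[[:space:]]+[^[:space:]]' "$conf"; then
    echo "DIP_DATA_ROOT is not set in $conf" >&2
    return 1
  fi
  # DIP_TOPOLOGY_NAME_PREFIX is optional, so a missing key is only reported.
  if ! grep -Eq '^[[:space:]]*DIP_TOPOLOGY_NAME_PREFIX[[:space:]]+[^[:space:]]' "$conf"; then
    echo "note: DIP_TOPOLOGY_NAME_PREFIX is not set (optional)" >&2
  fi
  return 0
}

# Example use from the bin directory:
#   check_squall_conf ../test/squall/confs/cluster/1G_hyracks \
#     && ./squall_cluster.sh ../test/squall/confs/cluster/1G_hyracks
```

The check only looks for the presence of the keys; it does not verify that the path exists on the cluster nodes.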

If you want to recompile the code before submitting it, please consult Recompilation.

It is also possible to get the Storm topology directly from the QueryBuilder or to submit the plan by using the SquallContext helper functions.

This time you will not obtain the result on the command line; instead, you will get your command prompt back as soon as the topology is submitted to the cluster. You can monitor the execution of your topology at http://STORM_UI_SERVER:8080, which shows, among other things, the active topologies and the number of tuples sent between Spouts and Bolts.
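
If you prefer the command line to the browser, the same host can usually be queried as well. The sketch below assumes that your Storm version exposes the UI REST API under `/api/v1/` (older releases serve only the HTML pages); the helper name `storm_summary_url` is hypothetical and `STORM_UI_SERVER` remains a placeholder for your UI host.

```shell
# Hypothetical helper; not part of Squall or Storm.
# Assumption: the Storm UI serves its REST API under /api/v1/.

storm_summary_url() {
  host="$1"
  port="${2:-8080}"   # 8080 is the port used in the URL above
  printf 'http://%s:%s/api/v1/topology/summary\n' "$host" "$port"
}

# Example use (returns a JSON summary of the topologies known to the UI):
#   curl -s "$(storm_summary_url STORM_UI_SERVER)"
```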

If something doesn't work as expected, check the troubleshooting section or open an issue.

Your topology will be automatically killed after the final result is produced, in order to free the resources for other topologies. You can also kill it explicitly:

    storm kill myTopologyName
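
Because the topology disappears once the final result is produced, a script can wait for it and fall back to an explicit kill. This is a rough sketch under stated assumptions: the function `wait_or_kill` and its default timeout and polling interval are hypothetical, and it assumes that a running topology's name appears in the output of `storm list`.

```shell
# Hypothetical helper; not part of Squall or Storm.
# Assumption: `storm list` prints the name of every running topology.
# Note: grep -F matches substrings, so pass a name that is unique on the cluster.

wait_or_kill() {
  name="$1"
  timeout_secs="${2:-600}"
  interval_secs="${3:-10}"
  waited=0
  # Keep polling while the topology is still listed.
  while storm list 2>/dev/null | grep -Fq "$name"; do
    if [ "$waited" -ge "$timeout_secs" ]; then
      echo "timeout after ${waited}s, killing $name" >&2
      storm kill "$name"
      return 1
    fi
    sleep "$interval_secs"
    waited=$((waited + interval_secs))
  done
  echo "$name is no longer listed"
}

# Example use:
#   wait_or_kill myTopologyName 1800 30
```

A return status of 0 means the topology went away on its own; 1 means the timeout was reached and `storm kill` was issued.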