
Installing SolrCloud


Pre-requisites

Java - we use the latest Oracle Java 7 JDK

http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html
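
A quick way to confirm which JDK is in use (a trivial check, assuming java is on the PATH):

```bash
# Confirm the active Java runtime; for the Oracle JDK 7 the output
# should begin with something like: java version "1.7.0_xx"
java -version
```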

We use Jetty, which comes with Solr. In the past we've used Apache Tomcat, mainly because many of our other services (wayback service, access service) use Tomcat. However, Tomcat adds another service layer, which means another potential source of problems. As Jetty comes with Solr and is pre-configured, it's easier to just use Jetty; if anything, in our experience it's a little quicker than Tomcat.

Individual CPU core and disk per shard

For us, the large size of our dataset means that I/O speed is a major performance factor. Dedicating a CPU core and a hard disk per shard is a simple way to get an efficient setup. Alternatively, storage can be provided by a Storage Area Network (SAN), which can be both fast and resilient, or even by SSDs (Solid State Drives).

Memory is a critical issue

The amount of RAM available per Solr shard, and the amount left over for the operating system, is critical. We generally choose a 5GB Xmx (and Xms) per shard, so 24 shards = 120GB of RAM for the JVMs alone. Even at 256GB RAM our servers run with ~0GB free (because the OS uses the rest as disk I/O cache). The choice of DirectoryFactory - we use solr.MMapDirectoryFactory - has a significant impact on the amount of memory available; see http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html for detailed information.
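
To make that arithmetic concrete, here is a minimal sketch of the heap settings involved (the variable name is only illustrative; the values are the ones quoted above, and the remaining RAM is deliberately left to the OS page cache, which MMapDirectoryFactory relies on):

```bash
# Per-shard JVM heap, fixed at 5GB (setting Xms equal to Xmx avoids heap resizing).
# 24 shards x 5GB = 120GB of heap; everything else stays free for the operating
# system's disk I/O cache, which backs MMapDirectoryFactory.
SOLR_HEAP_OPTS="-Xms5g -Xmx5g"   # illustrative variable name
```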

Also, as recently noted on the Solr mailing list (details at http://lucene.apache.org/solr/resources.html#community), on *nix systems it is best not to use Transparent Huge Pages (THP). Information about how to disable Huge Pages can be found at http://oracle-base.com/articles/linux/configuring-huge-pages-for-oracle-on-linux-64.php. It should also be noted that some services, including tuned and ktune, enable Huge Pages, so take care that THP is not re-enabled after it has been disabled via the boot configuration.
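
For reference, on RHEL-style systems THP can be turned off at runtime as below (a sketch only; see the article linked above for the boot-time configuration and your distribution's specifics):

```bash
# Disable Transparent Huge Pages for the running system (run as root).
# Add transparent_hugepage=never to the kernel boot line to make this persistent,
# and check that tuned/ktune profiles do not switch it back on.
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
```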

Solr Installation

The installation of Solr is fairly simple:

  • Download a suitable distribution from http://lucene.apache.org/solr/mirrors-solr-latest-redir.html and uncompress it to your installation location of choice. As the British Library uses RHEL we install in /opt/ - /opt in RHEL is untouched by service updates, so this directory is safe from accidental misconfiguration.
  • To simplify the installation process, rename the /opt/solr-(version) directory to just /opt/solr/ (both steps are sketched below).
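
A minimal sketch of those two steps (the Solr version shown is only an example; substitute whichever release you downloaded):

```bash
# Unpack the downloaded release into /opt/ and give it a version-neutral name
cd /opt
tar xzf /path/to/solr-4.10.3.tgz   # example version only
mv solr-4.10.3 solr
```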

Server changes

  • Create the solr user if it does not already exist
  • Due to our large data volumes, we raise the process and open-file limits (both hard and soft) for the solr user (hard processes 6144, soft processes 4096, hard files 65536, soft files 49152)
  • Open appropriate firewall ports
  • Create a solr pid directory, writable by the solr user (a sketch of these server changes follows)
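
A hedged sketch of those changes on RHEL (the limit values are the ones quoted above; the firewall port and pid path are examples, not requirements):

```bash
# Create the solr user if it does not already exist
id solr &>/dev/null || useradd -r solr

# Raise process and open-file limits (hard and soft) for the solr user
cat >> /etc/security/limits.conf <<'EOF'
solr  hard  nproc   6144
solr  soft  nproc   4096
solr  hard  nofile  65536
solr  soft  nofile  49152
EOF

# Open the Solr/Jetty port in the firewall (example port)
iptables -I INPUT -p tcp --dport 8983 -j ACCEPT

# Create a pid directory writable by the solr user
mkdir -p /var/run/solr
chown solr:solr /var/run/solr
```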

Zookeepers

We use three zookeeper VMs dedicated to each Solr index. Over time we've learnt that our network topology can have some effect on the zookeeper settings, but this is very network specific - it's worth reducing the default settings to their lowest workable values and observing the effect, which can help with recognising network-caused issues in the future.
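
The settings in question are the timing values in each zookeeper's zoo.cfg; ZooKeeper's shipped defaults are shown below purely as a starting point for that kind of tuning:

```properties
# zoo.cfg timing settings (ZooKeeper's shipped defaults)
# tickTime: the basic time unit, in milliseconds
tickTime=2000
# initLimit: ticks a follower may take to connect and sync with the leader
initLimit=10
# syncLimit: ticks a follower may lag behind the leader
syncLimit=5
```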

Solr Configuration

As we use RHEL, our services follow the typical RHEL /etc/sysconfig/ and /etc/init.d/ deployment approach (though this is entirely up to you and your environment).

Inside /etc/sysconfig/solr various environment variables are set:

  • node name
  • solr collection name
  • number of shards
  • Jetty settings: home, logs, pid, user, port
  • Java START_OPTS
  • zookeeper hosts (name:port,name:port, etc.)
  • zookeeper timeout

The only value that changes per shard is the node name. It is also preferable not to use the same storage device for the Solr data and for the service log, to eliminate I/O contention. An example sysconfig file is sketched below.
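
The variable names in this sketch are our own placeholders rather than anything Solr or Jetty mandates; hostnames, ports and paths will differ in your environment:

```bash
# /etc/sysconfig/solr -- example environment for one node (all values are placeholders)
NODE_NAME=solrnode01                      # the only value that changes per shard
SOLR_COLLECTION=collection1
NUM_SHARDS=24
JETTY_HOME=/opt/solr-node01/example
JETTY_LOGS=/var/log/solr                  # on a different device from the index data
JETTY_PID=/var/run/solr/solr-node01.pid
JETTY_USER=solr
JETTY_PORT=8983
JAVA_START_OPTS="-Xms5g -Xmx5g"
ZK_HOSTS="zk01:2181,zk02:2181,zk03:2181"
ZK_CLIENT_TIMEOUT=15000
```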

/opt/solr/example/solr/collection/conf/schema.xml - add the fields your own data requires. We strongly suggest removing all unnecessary fields.
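
For example, a trimmed-down field list might contain entries like these (field names and types are illustrative only; keep just what your data needs):

```xml
<!-- Illustrative schema.xml field definitions -->
<field name="id"      type="string"       indexed="true" stored="true" required="true"/>
<field name="title"   type="text_general" indexed="true" stored="true"/>
<field name="content" type="text_general" indexed="true" stored="false"/>
```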

/opt/solr/example/solr/collection/conf/solrconfig.xml:

  • configure data directory
  • configure directoryFactory
  • configure autoCommit and openSearcher (sketched below)
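
A sketch of the relevant solrconfig.xml elements (the dataDir path and the commit interval are examples only; the directoryFactory setting matches the MMapDirectoryFactory recommendation above):

```xml
<!-- Illustrative solrconfig.xml settings -->
<dataDir>/data/solr-node01</dataDir>

<directoryFactory name="DirectoryFactory" class="solr.MMapDirectoryFactory"/>

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>600000</maxTime>           <!-- hard commit every 10 minutes (example) -->
    <openSearcher>false</openSearcher>  <!-- do not open a new searcher on hard commit -->
  </autoCommit>
</updateHandler>
```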

Duplicate Node

A significant benefit of installing into /opt/solr/ and making the configuration amendments there is that this installation can then be used as the master copy from which the actual Solr nodes (one per shard) are duplicated.
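
For instance (a sketch; the per-node directory naming is our own placeholder convention):

```bash
# Duplicate the configured master installation once per shard/node
for n in $(seq -w 1 24); do
  cp -a /opt/solr /opt/solr-node$n
done
```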

Generally the details that need amending per node are:

  • node name
  • port
  • log directory
  • zookeeper settings
  • For the first service: -DzkRun
  • For the rest: -DzkHost=localhost:9983
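
As an illustration of those last two settings, using the stock Solr 4.x example start command (a sketch; ports, paths and the shard count are placeholders, and the embedded zookeeper started by -DzkRun listens on the Solr port plus 1000):

```bash
# First node: runs the embedded zookeeper (-DzkRun), here on port 9983
cd /opt/solr-node01/example
java -Xms5g -Xmx5g -Djetty.port=8983 -DnumShards=24 -DzkRun -jar start.jar

# Every other node: point at the first node's embedded zookeeper
cd /opt/solr-node02/example
java -Xms5g -Xmx5g -Djetty.port=8984 -DzkHost=localhost:9983 -jar start.jar
```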