Installing SolrCloud
Java is a prerequisite; the JDK can be downloaded from http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html
We use Jetty, which comes with Solr. In the past we've used Apache Tomcat, mainly because many of our other services (the wayback service, the access service) run on Tomcat. However, Tomcat adds another service layer, and therefore another potential source of problems. As Jetty ships with Solr and is pre-configured, it's easier to just use Jetty - and, if anything, in our experience it's a bit quicker than Tomcat.
For us, the large size of our dataset means that I/O speed is a major performance factor. Dedicating a CPU core and a hard disk per shard is a simple way to get an efficient setup. Alternatively, storage can be provided by a Storage Area Network (SAN), which can be both fast and resilient, or even by SSDs (Solid State Drives).
The amount of RAM available per Solr shard, and left over for the operating system, is critical. We generally choose a 5GB Xmx (and Xms) per shard, so 24 shards = 120GB of heap. Even at 256GB RAM our servers run with ~0GB free, because the OS uses the rest for the disk I/O cache. The choice of 'DirectoryFactory' (we use solr.MMapDirectoryFactory) also has a significant impact on how memory is used - see http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html for detailed information.
Also, as recently discussed on the Solr mailing list (details at http://lucene.apache.org/solr/resources.html#community), on *nix systems it is best not to use Transparent Huge Pages (THP). Information about how to disable Huge Pages can be found at http://oracle-base.com/articles/linux/configuring-huge-pages-for-oracle-on-linux-64.php. Note that some services, including tuned and ktune, enable Huge Pages, so take care that THP is not re-enabled after you have disabled it via the boot configuration.
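For example, on a RHEL-style system THP can be checked and switched off along these lines (the sysfs path varies between kernel versions, e.g. RHEL 6 uses /sys/kernel/mm/redhat_transparent_hugepage/ instead):

```bash
# Check the current THP setting.
cat /sys/kernel/mm/transparent_hugepage/enabled

# Disable THP for the running system (run as root).
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
```

To make the change survive a reboot, add transparent_hugepage=never to the kernel boot line, and check that tuned/ktune profiles do not switch it back on.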
The installation of Solr is fairly simple:
- Download a suitable distribution from http://lucene.apache.org/solr/mirrors-solr-latest-redir.html and uncompress it to your installation location of choice. As the British Library uses RHEL we install in /opt/ - /opt in RHEL is untouched by service updates, so this directory is safe from accidental misconfiguration.
- To simplify the installation process, rename the /opt/solr-(version) directory to just /opt/solr/
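As a sketch, assuming a Solr 4.x release and the /opt layout above (the version number is only a placeholder - take the current release from the mirror page):

```bash
# Download and unpack a Solr release into /opt (version is a placeholder).
cd /opt
wget http://archive.apache.org/dist/lucene/solr/4.10.4/solr-4.10.4.tgz
tar xzf solr-4.10.4.tgz

# Rename the versioned directory to a stable path, and (once the solr user
# exists - see "Server changes" below) hand it over to that user.
mv solr-4.10.4 solr
chown -R solr:solr /opt/solr
```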
Server changes
- Create a solr user if one does not already exist
- Due to our large data volumes, we raise the process and file limits (both hard and soft) for the solr user (ph 6144, ps 4096, fh 65536, fs 49152)
- Open the appropriate firewall ports
- Create a solr pid directory, writable by the solr user
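A sketch of these steps on RHEL, assuming "ph/ps" and "fh/fs" above mean process and file limits (hard and soft), that Jetty listens on Solr's default port 8983, and that /var/run/solr is the chosen pid directory:

```bash
# Create the solr user if it does not already exist.
id solr || useradd --system --home /opt/solr solr

# Raise process and file limits (hard and soft) for the solr user.
cat > /etc/security/limits.d/solr.conf <<'EOF'
solr hard nproc  6144
solr soft nproc  4096
solr hard nofile 65536
solr soft nofile 49152
EOF

# Open the Solr/Jetty port (8983 here) in the firewall.
iptables -I INPUT -p tcp --dport 8983 -j ACCEPT
service iptables save

# Create a pid directory writable by the solr user.
mkdir -p /var/run/solr
chown solr:solr /var/run/solr
```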
We use three ZooKeeper VMs dedicated to each Solr index. Over time we've learnt that our network topology can have some effect on the ZooKeeper settings, but this is very network-specific - it's worth reducing the default settings down to their lowest values and observing the effect, which can also make it easier to recognise network-caused issues in future.
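By way of illustration, the main timing knobs live in zoo.cfg; the path, hostnames and values below are placeholders rather than our settings:

```bash
# Example zoo.cfg for a three-node ensemble (values are placeholders).
cat > /etc/zookeeper/zoo.cfg <<'EOF'
# tickTime is the basic time unit (ms); session timeouts are multiples of it.
tickTime=2000
# initLimit and syncLimit are expressed in ticks.
initLimit=5
syncLimit=2
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=zk1.example.org:2888:3888
server.2=zk2.example.org:2888:3888
server.3=zk3.example.org:2888:3888
EOF
```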
As we use RHEL our services follow the typical RHEL /etc/sysconfig/ and /etc/init.d/ deployment approach (though this is entirely up to you/your environment.)
Inside /etc/sysconfig/solr various environment variables are set (a sketch of this file is given below the list):
- node name
- solr collection name
- number of shards
- Jetty settings: home, logs, pid, user, port
- Java START_OPTS
- zookeeper name:port,name:port etc
- zookeeper timeout

The only value that changes per shard is the node name. It is also preferable not to use the same storage device for the Solr data and for the service log, to avoid I/O contention.
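A hypothetical sketch of such a sysconfig file follows; the variable names are illustrative, not the ones we actually use, and the values are placeholders:

```bash
# /etc/sysconfig/solr - illustrative names and values only.
SOLR_NODE_NAME=node01                 # the only value that changes per shard
SOLR_COLLECTION=collection1
NUM_SHARDS=24

JETTY_HOME=/opt/solr-node01/example
JETTY_LOGS=/logs/solr/node01          # keep logs off the Solr data disk
JETTY_PID=/var/run/solr/node01.pid
JETTY_USER=solr
JETTY_PORT=8983

JAVA_START_OPTS="-Xms5g -Xmx5g"

ZK_HOSTS=zk1:2181,zk2:2181,zk3:2181
ZK_CLIENT_TIMEOUT=15000
```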
In /opt/solr/example/solr/collection/conf/schema.xml:
- add the fields particular to your data. We strongly suggest removing all unnecessary fields.
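For illustration, field definitions in a Solr 4.x schema.xml look like the fragment below; the field names and types here are examples, not our actual schema:

```xml
<!-- Illustrative fields only; keep just the fields you actually need. -->
<fields>
  <field name="id"         type="string"       indexed="true" stored="true" required="true"/>
  <field name="content"    type="text_general" indexed="true" stored="false"/>
  <field name="crawl_date" type="tdate"        indexed="true" stored="true"/>
</fields>
<uniqueKey>id</uniqueKey>
```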
In /opt/solr/example/solr/collection/conf/solrconfig.xml:
- configure data directory
- configure directoryFactory
- configure autoCommit and openSearcher
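The fragments below sketch what those three amendments look like in a Solr 4.x solrconfig.xml; the paths and timings are placeholders rather than recommended values:

```xml
<!-- Data directory: ideally a dedicated disk per shard (path is an example). -->
<dataDir>${solr.data.dir:/data1/solr/node01}</dataDir>

<!-- DirectoryFactory: MMapDirectory, as discussed in the memory notes above. -->
<directoryFactory name="DirectoryFactory"
                  class="${solr.directoryFactory:solr.MMapDirectoryFactory}"/>

<!-- Commit regularly, but do not open a new searcher on every hard commit. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>100000</maxDocs>
    <maxTime>600000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>
```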
A significant benefit of installing into /opt/solr/ and making the configuration amendments there is that this installation can then be used as a master copy, from which the actual Solr nodes (as many as the number of shards requires) can be cloned.
Generally the details that need amending per node are (a start-up sketch follows this list):
- node name
- port
- log directory
- zookeeper settings:
  - for the first service: -DzkRun
  - for the rest: -DzkHost=localhost:9983
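In production these values come from /etc/sysconfig/solr and the init scripts, but the flags themselves can be sketched with a foreground start of the Solr 4.x example (paths, ports and node count below are placeholders):

```bash
# Clone the master install once per node on this host (names are examples).
for i in 01 02 03 04; do
  cp -a /opt/solr /opt/solr-node$i
done

# First node: also runs the embedded ZooKeeper (-DzkRun).
cd /opt/solr-node01/example
java -Xms5g -Xmx5g -DzkRun -DnumShards=24 \
     -Djetty.port=8983 -jar start.jar

# Remaining nodes: point at that ZooKeeper, which listens on the Solr port
# plus 1000 (here 8983 + 1000 = 9983).
cd /opt/solr-node02/example
java -Xms5g -Xmx5g -DzkHost=localhost:9983 \
     -Djetty.port=8984 -jar start.jar
```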