Elasticsearch Configuration

In this post, we will be talking about how to make Elasticsearch more stable and performant.

Elasticsearch is a distributed RESTful search engine built for the cloud. For more information, please follow this link.

Before we start, here is how the test results compare before and after tuning:

Test Results

Data size: 60-80 KB

Metric	Before Tuning	After Tuning
Successful calls	5000	5000
Total time	10.94 s	4.73 s
Average	1.92 s	0.76 s
Fastest	0.17 s	0.09 s
Slowest	4.95 s	2.74 s
RPS	450-500	1000-1100
Status codes
Code 200	4676	5000
Code 429	16	0
Code 503	307	0

Now we can start by tuning the OS-level settings mentioned in the ES documentation:

Configuring OS

Brief

First things first, let’s get the OS (Ubuntu 14.04) ready. Elasticsearch requires only Java (1.7 or newer). Newer ES versions may require a higher version of Java.

Virtual memory is typically consumed by processes, file system caches, and the kernel. Virtual memory utilization depends on a number of factors, which can be affected by the following parameters.

vm.swappiness

ES recommends setting this value to 1. According to Red Hat, a low swappiness value is also recommended for database workloads; for example, Red Hat's recommended swappiness value for Oracle databases is 10. For further reading, see Tuning Virtual Memory.

Why do we set this value to 1 instead of 0?

Setting swappiness to 0 more aggressively avoids swapping out, which increases the risk of OOM killing under strong memory and I/O pressure.

net.core.somaxconn

The maximum number of pending connections that an application can request to have queued on a listening socket.

vm.max_map_count

This setting limits the number of VMAs (Virtual Memory Areas) that a single process can own. When a process reaches the limit, an out-of-memory error is thrown.

fs.file-max

Sets the maximum number of file-handles that the Linux kernel will allocate.

elasticsearch    soft     nofile             65535
elasticsearch    hard     nofile             65535

Sets the limit on open file descriptors for a specific user.

elasticsearch    soft     memlock          unlimited
elasticsearch    hard     memlock          unlimited

Before making this change, I got the error "Unable to lock JVM memory (ENOMEM). This can result in part of the JVM being swapped out. Increase RLIMIT_MEMLOCK (limit)." Thanks to mrzard, I got rid of the problem by setting memlock to unlimited.

For reference, the soft limit is the value that is actually enforced; a user can raise it up to the hard limit, but the system will not allow a user to exceed the hard limit. We keep things strict by setting both to the same value.

session required pam_limits.so

The pam_limits PAM module sets limits on the system resources that can be obtained in a user-session.

bootstrap.mlockall: true

Tries to lock the process address space into RAM, preventing any Elasticsearch memory from being swapped out. With this setting, the JVM locks its memory so that the OS cannot swap it to disk. It is a performance optimization.

indices.fielddata.cache.size: 30%

By default, the field data cache is unbounded, which could make your JVM heap explode. To avoid nasty surprises, we limit it to 30% of the heap. (This affects search performance.)

indices.cache.filter.size: 30%

Even though filters are relatively small, they can take up large portions of the JVM heap if you have a lot of data and numerous different filters. So we limit this cache to 30% as well.

http.compression: true

Enables compression of responses when the client supports it (signalled with the Accept-Encoding header). Defaults to false. We use it with gzip, and it made a huge impact on performance.
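One way to confirm that compression is really in effect is to request gzip explicitly and look at the response headers. The sketch below assumes a node listening on localhost:9200; the helper name response_is_gzipped is made up for illustration.

```sh
# Hypothetical helper: reads HTTP response headers on stdin and succeeds
# only if the server declared a gzip-encoded body.
response_is_gzipped() {
    grep -qi '^content-encoding:.*gzip'
}

# Usage (assumes a node on localhost:9200; not executed here):
#   curl -s -o /dev/null -D - -H 'Accept-Encoding: gzip' \
#       'http://localhost:9200/_nodes' | response_is_gzipped \
#       && echo "compression is on"
```

Here curl discards the body (-o /dev/null) and dumps only the headers to stdout (-D -), so the helper sees just the header block.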

discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["host1", "host2:port"]

This is a very important setting: disabling multicast and listing the hosts explicitly prevents nodes from accidentally joining the wrong cluster. Newer versions use unicast by default.

discovery.zen.minimum_master_nodes: 2

This value is calculated as (number of master-eligible nodes / 2) + 1, with the division rounded down.
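As a quick illustration of that formula, here is a small sketch; the function name minimum_master_nodes is only for this example and is not an Elasticsearch command.

```sh
# Quorum for N master-eligible nodes: floor(N / 2) + 1.
# Shell arithmetic uses integer division, so the rounding down comes for free.
minimum_master_nodes() {
    echo $(( $1 / 2 + 1 ))
}

minimum_master_nodes 3   # prints 2
minimum_master_nodes 5   # prints 3
```

With three master-eligible nodes the result is 2, which matches the value used in this post.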

action.destructive_requires_name: true

This setting prevents deleting indices with wildcards (*); the full index name is required.

action.auto_create_index: false

You can prevent the automatic creation of indices by adding this setting to the config/elasticsearch.yml file on each node.

Discovery Zen settings

discovery.zen.ping.timeout: 10s 
discovery.zen.fd.ping_retries: 3 
discovery.zen.fd.ping_interval: 3s 
discovery.zen.fd.ping_timeout: 30s

These settings make fault detection more tolerant of slow or failed pings, which prevents undesired connection losses between nodes.

ES_HEAP_SIZE

The default installation of Elasticsearch is configured with a 1 GB heap. According to our research, this value should be half of the total RAM, but it should not exceed 30.5 GB.
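The rule of thumb above can be sketched as a small calculation. This is only an illustration: the function name es_heap_mb is hypothetical, and the cap is simply 30.5 GB expressed in megabytes.

```sh
# Suggested heap in MB: half of total RAM, capped at 30.5 GB (31232 MB).
# $1 = total RAM in MB.
es_heap_mb() {
    total_mb=$1
    cap_mb=31232                      # 30.5 * 1024
    half_mb=$(( total_mb / 2 ))
    if [ "$half_mb" -gt "$cap_mb" ]; then
        echo "$cap_mb"
    else
        echo "$half_mb"
    fi
}

es_heap_mb 8192      # 8 GB machine   -> prints 4096
es_heap_mb 131072    # 128 GB machine -> prints 31232

# On Linux, total RAM in MB can be read from /proc/meminfo, for example:
#   awk '/^MemTotal:/ {print int($2 / 1024)}' /proc/meminfo
```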

Implementation

Open sysctl.conf:

nano /etc/sysctl.conf

Add these properties:

# keep swapping to a minimum without disabling it entirely
vm.swappiness=1
# raise the limit on pending connections per listening socket
net.core.somaxconn=65535
# value recommended by the ES documentation; see http://www.redhat.com/magazine/001nov04/features/vm
vm.max_map_count=262144
# see http://www.tldp.org/LDP/solrhe/Securing-Optimizing-Linux-RH-Edition-v1.3/chap6sec72.html
fs.file-max=518144
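To check that the kernel has picked these values up, something like the following sketch may help. The helper name check_sysctl and its optional third argument are assumptions made for this example; the values can usually be reloaded without a reboot by running sudo sysctl -p, which re-reads /etc/sysctl.conf.

```sh
# Hypothetical helper: compares one kernel parameter against an expected value.
# $1 = parameter name (e.g. vm.swappiness)
# $2 = expected value
# $3 = optional root of the proc tree (defaults to /proc/sys; handy for testing)
check_sysctl() {
    name=$1
    expected=$2
    root=${3:-/proc/sys}
    # vm.max_map_count lives at /proc/sys/vm/max_map_count
    path="$root/$(echo "$name" | tr '.' '/')"
    actual=$(cat "$path" 2>/dev/null) || { echo "$name: cannot read $path"; return 1; }
    if [ "$actual" = "$expected" ]; then
        echo "$name: ok ($actual)"
    else
        echo "$name: expected $expected, found $actual"
        return 1
    fi
}

# Usage on a configured machine (not executed here):
#   check_sysctl vm.swappiness 1
#   check_sysctl vm.max_map_count 262144
```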

After that, open limits.conf:

nano /etc/security/limits.conf

What matters here is which user the limits are defined for: our ES user must be the one that receives them. Running a large application like this under its own dedicated user is recommended. (We did the same for Redis.) The elasticsearch user is created by default when you install ES.

elasticsearch    soft    nofile          65535
elasticsearch    hard    nofile          65535
elasticsearch    soft    memlock         unlimited
elasticsearch    hard    memlock         unlimited

To make these limits persistent, you have to modify the following files:

nano /etc/pam.d/common-session-noninteractive
nano /etc/pam.d/common-session

Add this line to both files:

session required pam_limits.so

You may need to reboot the machine for these changes to take effect.

Configuring Elasticsearch

Now everything is ready for Elasticsearch to be installed. You can use this bash script, which is available as a gist:

wget "https://gist.githubusercontent.com/ziyasal/67b2c68930a168735052/raw/64ff4d6510f91c70416df1ff238f62cda558f6c7/es.sh"

After that, execute:

sh es.sh

es.sh file

#!/bin/bash

ELASTICSEARCH_VERSION=1.7.3

### Download the Elasticsearch .deb package
wget https://download.elastic.co/elasticsearch/elasticsearch/elasticsearch-${ELASTICSEARCH_VERSION}.deb

### Install Elasticsearch
sudo dpkg -i elasticsearch-${ELASTICSEARCH_VERSION}.deb

### Make sure Elasticsearch starts when the system boots, then start it up
sudo update-rc.d elasticsearch defaults 95 10
sudo service elasticsearch start

### Wait a little while Elasticsearch starts
sleep 20

### Make sure service is running
curl http://localhost:9200
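The fixed 20-second sleep in the script can be too short on a slow machine and needlessly long on a fast one. A polling loop is one alternative; the sketch below is not part of the original gist, and the helper name wait_until is made up for this example.

```sh
# Hypothetical helper: retries a command once per second until it succeeds
# or the timeout runs out.
# $1 = timeout in seconds, remaining arguments = command to run.
wait_until() {
    remaining=$1
    shift
    while [ "$remaining" -gt 0 ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        sleep 1
        remaining=$(( remaining - 1 ))
    done
    return 1
}

# Usage in place of "sleep 20" (assumes the default HTTP port 9200):
#   wait_until 60 curl -sf http://localhost:9200 \
#       || echo "Elasticsearch did not come up in time"
```

The helper takes the probe command as arguments rather than hard-coding curl, so the same loop can wait on any check.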

Elasticsearch has newer versions, but I go with 1.7; you can choose whichever version you want. I strongly recommend installing Elasticsearch this way. If you download the tar.gz instead, you have to write your own init scripts and create the elasticsearch user yourself, and having that user in place makes configuration much easier. From here on, I assume you installed it with the script. You now have the elasticsearch.yml and logging.yml files under:

cd /etc/elasticsearch

In this part, let’s open elasticsearch.yml. I only show the settings that we change; all other settings keep their defaults.

nano /etc/elasticsearch/elasticsearch.yml

bootstrap.mlockall: true

action.auto_create_index: false 
action.destructive_requires_name: true
indices.fielddata.cache.size: 30%
indices.cache.filter.size: 30%

http.compression: true

discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.multicast.enabled: false 
discovery.zen.ping.unicast.hosts: ["ip-of-machine-1", "ip-of-machine-2", "ip-of-machine-3"]
discovery.zen.ping.timeout: 10s 
discovery.zen.fd.ping_retries: 3 
discovery.zen.fd.ping_interval: 3s 
discovery.zen.fd.ping_timeout: 30s

After that, let’s go to the Elasticsearch defaults file, which the start script reads.

nano /etc/default/elasticsearch

Heap size is one of the most important settings in ES. From what I have found, the heap should generally be half of the total RAM and should not exceed 30.5 GB.

# Heap size defaults to 256m min, 1g max
# Set ES_HEAP_SIZE to 50% of available RAM, but no more than 31g
ES_HEAP_SIZE=4g

Finally, you can check the properties for our ES user:

su elasticsearch --shell /bin/bash --command "cat /proc/sys/vm/swappiness"
su elasticsearch --shell /bin/bash --command "cat /proc/sys/net/core/somaxconn"
su elasticsearch --shell /bin/bash --command "cat /proc/sys/vm/max_map_count"
su elasticsearch --shell /bin/bash --command "cat /proc/sys/fs/file-max"

su elasticsearch --shell /bin/bash --command "ulimit -n"
su elasticsearch --shell /bin/bash --command "ulimit -Sn"
su elasticsearch --shell /bin/bash --command "ulimit -Hn"

You can reboot the machines and check your cluster status from Sense:

GET /_nodes/process?pretty

or check every node from the console:

curl 'http://localhost:9200/?pretty'
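The _nodes/process response shown above includes an mlockall flag for each node, which tells you whether the memory lock took effect. A rough way to count the nodes where it did not is sketched below; the helper name count_unlocked_nodes is hypothetical, and the match is a plain text search rather than a real JSON parser.

```sh
# Hypothetical helper: reads a _nodes/process response on stdin and prints
# how many nodes report "mlockall" : false.
count_unlocked_nodes() {
    grep -o '"mlockall"[[:space:]]*:[[:space:]]*false' | wc -l | tr -d ' '
}

# Usage (assumes a node on localhost:9200; not executed here):
#   curl -s 'http://localhost:9200/_nodes/process?pretty' | count_unlocked_nodes
```

A result of 0 means every node in the response reported that its memory is locked.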

If your nodes don’t start on boot, your init scripts were probably not installed properly. Run this command and reboot.

sudo update-rc.d elasticsearch defaults 95 10

If you get an org.elasticsearch.transport.RemoteTransportException, check which version of Java is installed on each of your nodes.
