Elasticsearch Configuration
In this post, we will be talking about how to make Elasticsearch more stable and performant.
Elasticsearch is a distributed RESTful search engine built for the cloud. For more information, please follow this link.
Before we start, here is how the test results compare before and after tuning:
Test Results
Data size: 60-80 KB
Metric | Before Tuning | After Tuning |
---|---|---|
Successful calls | 5000 | 5000 |
Total time | 10.94 s | 4.73 s |
Average | 1.92 s | 0.76 s |
Fastest | 0.17 s | 0.09 s |
Slowest | 4.95 s | 2.74 s |
RPS | 450-500 | 1000-1100 |
Status codes | | |
Code 200 | 4676 | 5000 |
Code 429 | 16 | 0 |
Code 503 | 307 | 0 |
Now we can start by tuning the OS-level settings that are mentioned in the ES documentation.
Configuring OS
Brief
First things first, let's get the OS (Ubuntu 14.04) ready. Elasticsearch only requires Java (1.7 or later). Newer ES versions may require a higher version of Java.
Virtual memory is typically consumed by processes, file system caches, and the kernel. Virtual memory utilization depends on a number of factors, which can be affected by the following parameters.
vm.swappiness
ES recommends setting this value to 1. Red Hat also recommends a low swappiness value for database workloads. For Oracle databases, for example, the swappiness value Red Hat recommends is 10. For further reading, see Tuning Virtual Memory.
Why do we set this value to 1 instead of 0?
Setting swappiness to 0 more aggressively avoids swapping out, which increases the risk of OOM killing under strong memory and I/O pressure.
net.core.somaxconn
The upper limit on the backlog of pending connections that an application can request for a listening socket.
vm.max_map_count
This property limits the number of VMAs (Virtual Memory Areas) that a single process can own. When a process reaches the limit, an out-of-memory error is thrown.
fs.file-max
Sets the maximum number of file-handles that the Linux kernel will allocate.
elasticsearch soft nofile 65535
elasticsearch hard nofile 65535
Sets the file descriptor limits for a specific user.
elasticsearch soft memlock unlimited
elasticsearch hard memlock unlimited
Before making this change, I was getting the error "Unable to lock JVM memory (ENOMEM). This can result in part of the JVM being swapped out. Increase RLIMIT_MEMLOCK (limit)." Thanks to mrzard, I got rid of this problem by setting memlock to unlimited.
A quick note on soft and hard limits: the soft limit is the value that is actually enforced, and a user can raise it up to the hard limit, but the system will not allow a user to exceed the hard limit. We keep things strict, so we set both to the same value.
session required pam_limits.so
The pam_limits PAM module sets limits on the system resources that can be obtained in a user-session.
bootstrap.mlockall: true
Tries to lock the process address space into RAM, preventing any Elasticsearch memory from being swapped out. In other words, it lets the JVM lock its memory so that the OS cannot swap it out. This is a performance optimization.
indices.fielddata.cache.size: 30%
By default, the field data cache is unbounded. This, of course, could make your JVM heap explode. To avoid nasty surprises, we limit it to 30% of the heap. (This affects search performance.)
indices.cache.filter.size: 30%
Even though filters are relatively small, they can take up large portions of the JVM heap if you have a lot of data and numerous different filters. So we limit this cache to 30% of the heap as well.
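To make the two 30% limits concrete, here is a small sketch of the arithmetic in shell. It assumes the 4 GB heap that is used later in this post. The helper name `pct_of_mb` is made up for illustration and is not part of Elasticsearch.

```shell
#!/bin/sh
# pct_of_mb HEAP_MB PERCENT: integer share of the heap in MB (rounded down).
# Hypothetical helper, only here to illustrate the arithmetic.
pct_of_mb() {
    echo $(( $1 * $2 / 100 ))
}

heap_mb=4096                               # assumption: ES_HEAP_SIZE=4g
fielddata_mb=$(pct_of_mb "$heap_mb" 30)    # indices.fielddata.cache.size: 30%
filter_mb=$(pct_of_mb "$heap_mb" 30)       # indices.cache.filter.size: 30%
rest_mb=$(( heap_mb - fielddata_mb - filter_mb ))

echo "fielddata cache: up to ${fielddata_mb} MB"
echo "filter cache:    up to ${filter_mb} MB"
echo "everything else: at least ${rest_mb} MB"
```

With these numbers the two caches together can take up to 60% of the heap, so it is worth checking that the remainder is enough for indexing and query execution on your own workload.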
http.compression: true
Enables HTTP compression when possible (when the client sends Accept-Encoding). Defaults to false. We use it with gzip, and it made a huge impact on performance.
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["host1", "host2:port"]
This is a very important setting: disabling multicast and listing the hosts explicitly keeps nodes from accidentally joining the wrong cluster. Newer versions use unicast by default.
discovery.zen.minimum_master_nodes: 2
This value is calculated as (number of master-eligible nodes / 2) + 1, with the division rounded down. With 3 master-eligible nodes, that gives 2.
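The calculation above can be sketched in shell like this. The function name `min_master_nodes` is only an illustration; shell integer division already rounds down, which is what the formula needs.

```shell
#!/bin/sh
# min_master_nodes N: quorum for N master-eligible nodes, floor(N / 2) + 1.
# Hypothetical helper name, not an Elasticsearch command.
min_master_nodes() {
    echo $(( $1 / 2 + 1 ))
}

# Print the quorum for a few cluster sizes.
for nodes in 1 2 3 4 5; do
    echo "master-eligible nodes: $nodes -> minimum_master_nodes: $(min_master_nodes "$nodes")"
done
```

Note that 2 master-eligible nodes also need a quorum of 2, so a two-node cluster cannot elect a master once either node is down. That is one reason to run 3 master-eligible nodes, as in the unicast host list used later in this post.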
action.destructive_requires_name: true
This setting prevents deleting indices with wildcards (*). The full index name is required.
action.auto_create_index: false
You can prevent the automatic creation of indices by adding this setting to the config/elasticsearch.yml file on each node.
Discovery Zen settings
discovery.zen.ping.timeout: 10s
discovery.zen.fd.ping_retries: 3
discovery.zen.fd.ping_interval: 3s
discovery.zen.fd.ping_timeout: 30s
These settings make the cluster more tolerant of transient errors and prevent undesired connection losses between nodes.
ES_HEAP_SIZE
The default installation of Elasticsearch is configured with a 1 GB heap. According to our research, this value should be half of the total RAM, and it should not exceed 30.5 GB!
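As a rough sketch of that rule, the following shell snippet takes the total RAM in MB and suggests a heap size: half of the RAM, capped at 30.5 GB (31232 MB). The function name `recommended_heap_mb` is made up for this post, and reading the RAM size from /proc/meminfo assumes a Linux machine.

```shell
#!/bin/sh
# recommended_heap_mb RAM_MB: half of the RAM, but never more than 30.5 GB.
# Hypothetical helper, only a sketch of the rule of thumb from this post.
recommended_heap_mb() {
    ram_mb=$1
    cap_mb=31232                 # 30.5 GB = 30.5 * 1024 MB
    half_mb=$(( ram_mb / 2 ))
    if [ "$half_mb" -gt "$cap_mb" ]; then
        echo "$cap_mb"
    else
        echo "$half_mb"
    fi
}

# On Linux, MemTotal in /proc/meminfo is reported in kB.
if [ -r /proc/meminfo ]; then
    total_mb=$(awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo)
    echo "ES_HEAP_SIZE=$(recommended_heap_mb "$total_mb")m"
fi
```

Treat the output as a starting point only. On a machine that also runs other services, half of the RAM may be too much to give to the heap.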
Implementation
Open sysctl.conf:
nano /etc/sysctl.conf
Add these properties:
# minimize swapping (1 keeps it as a last resort instead of disabling it)
vm.swappiness=1
# raise the connection backlog limit for listening sockets
net.core.somaxconn=65535
# per-process VMA limit, see http://www.redhat.com/magazine/001nov04/features/vm
vm.max_map_count=262144
# system-wide file handle limit, see http://www.tldp.org/LDP/solrhe/Securing-Optimizing-Linux-RH-Edition-v1.3/chap6sec72.html
fs.file-max=518144
After that, open limits.conf:
nano /etc/security/limits.conf
The important part is the user named in the entries below: the limits must apply to the user that runs ES. Using a dedicated user for big applications like this is recommended. (We did the same for Redis.) The elasticsearch user is created by default when you install ES from the package.
elasticsearch soft nofile 65535
elasticsearch hard nofile 65535
elasticsearch soft memlock unlimited
elasticsearch hard memlock unlimited
To make these limits apply to every session, you have to modify both of these PAM files:
nano /etc/pam.d/common-session-noninteractive
nano /etc/pam.d/common-session
Add this line to each of them:
session required pam_limits.so
You may need to reboot the machine for these changes to be applied.
Configuring Elasticsearch
Now everything is ready for Elasticsearch to be installed. You can use this bash script. Here you can get the gist.
wget "https://gist.githubusercontent.com/ziyasal/67b2c68930a168735052/raw/64ff4d6510f91c70416df1ff238f62cda558f6c7/es.sh"
After that, execute it:
sh es.sh
es.sh file
#!/bin/bash
ELASTICSEARCH_VERSION=1.7.3
### Download the Elasticsearch Debian package
wget "https://download.elastic.co/elasticsearch/elasticsearch/elasticsearch-${ELASTICSEARCH_VERSION}.deb"
### Install Elasticsearch
sudo dpkg -i "elasticsearch-${ELASTICSEARCH_VERSION}.deb"
### Make sure Elasticsearch starts when the system boots, then start it now
sudo update-rc.d elasticsearch defaults 95 10
sudo service elasticsearch start
### Wait a little while Elasticsearch starts
sleep 20
### Make sure the service is running
curl http://localhost:9200
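The fixed sleep 20 in the script is a guess about how long startup takes. If you prefer to poll instead, a minimal sketch could look like the following. The helper `wait_until` is not part of the gist, it is only an illustration: it retries any command once per second until it succeeds or the time limit runs out.

```shell
#!/bin/sh
# wait_until SECONDS COMMAND [ARGS...]
# Re-runs COMMAND once per second until it succeeds.
# Returns 0 on success, 1 if SECONDS elapsed without a success.
# Hypothetical helper, not part of the es.sh gist.
wait_until() {
    limit=$1
    shift
    elapsed=0
    until "$@" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$limit" ]; then
            return 1
        fi
        sleep 1
        elapsed=$(( elapsed + 1 ))
    done
    return 0
}
```

In es.sh, the sleep and the final curl could then be replaced with something like `wait_until 60 curl -sf http://localhost:9200 || echo "Elasticsearch did not come up in time"`. The 60-second limit here is an arbitrary choice.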
Elasticsearch has newer versions, but I went with 1.7. It is up to you; you can choose whichever you want. I strongly recommend installing Elasticsearch this way. If you download the tar.gz instead, you have to write your own init scripts and create the elasticsearch user yourself, and having that user is very important for keeping the configuration easy. Anyway, I assume you installed it with the script. You now have the elasticsearch.yml and logging.yml files under:
cd /etc/elasticsearch
In this part, let's open elasticsearch.yml. I only show the settings that we change. All other settings keep their defaults.
nano /etc/elasticsearch/elasticsearch.yml
bootstrap.mlockall: true
action.auto_create_index: false
action.destructive_requires_name: true
indices.fielddata.cache.size: 30%
indices.cache.filter.size: 30%
http.compression: true
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["ip-of-machine-1", "ip-of-machine-2", "ip-of-machine-3"]
discovery.zen.ping.timeout: 10s
discovery.zen.fd.ping_retries: 3
discovery.zen.fd.ping_interval: 3s
discovery.zen.fd.ping_timeout: 30s
After that, let's open the defaults file that the Elasticsearch init script reads.
nano /etc/default/elasticsearch
One of the most important settings in ES is the heap size. From everything I have read, the heap size should generally be half of the total RAM, and it should not be more than 30.5 GB.
# Heap size defaults to 256m min, 1g max
# Set ES_HEAP_SIZE to 50% of available RAM, but no more than 31g
ES_HEAP_SIZE=4g
Finally, you can check the properties as our ES user:
su elasticsearch --shell /bin/bash --command "cat /proc/sys/vm/swappiness"
su elasticsearch --shell /bin/bash --command "cat /proc/sys/net/core/somaxconn"
su elasticsearch --shell /bin/bash --command "cat /proc/sys/vm/max_map_count"
su elasticsearch --shell /bin/bash --command "cat /proc/sys/fs/file-max"
su elasticsearch --shell /bin/bash --command "ulimit -n"
su elasticsearch --shell /bin/bash --command "ulimit -Sn"
su elasticsearch --shell /bin/bash --command "ulimit -Hn"
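Reading those numbers by eye gets tedious on more than a couple of machines. As a small sketch, the comparison against the values from this post can be scripted. The helper `check_at_least` is a made-up name for illustration; it treats "unlimited" as passing and anything non-numeric as a failure.

```shell
#!/bin/sh
# check_at_least NAME ACTUAL EXPECTED
# Prints OK or LOW and returns 0 or 1 accordingly.
# Hypothetical helper, only a sketch for verifying the settings above.
check_at_least() {
    name=$1
    actual=$2
    expected=$3
    if [ "$actual" = "unlimited" ]; then
        echo "OK  $name=$actual"
    elif [ "$actual" -ge "$expected" ] 2>/dev/null; then
        echo "OK  $name=$actual"
    else
        echo "LOW $name=$actual (expected >= $expected)"
        return 1
    fi
}
```

Run as the elasticsearch user, a check could then look like `check_at_least vm.max_map_count "$(cat /proc/sys/vm/max_map_count)" 262144` or `check_at_least nofile "$(ulimit -Hn)" 65535`.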
You can reboot the machines and check your cluster status from Sense:
GET /_nodes/process?pretty
or check every node from the console:
curl 'http://localhost:9200/?pretty'
If your nodes don't start at boot, your init scripts probably were not installed properly. Run this command and reboot.
sudo update-rc.d elasticsearch defaults 95 10
If you get an org.elasticsearch.transport.RemoteTransportException, check which version of Java is installed on each of your nodes.