Elasticsearch Configuration
In this post, we will be talking about how to make Elasticsearch more stable and performant.
Elasticsearch is a distributed RESTful search engine built for the cloud. For more information, please follow this link.
Before we start, here is the difference in test results before and after tuning:
Test Results
Data size: 60-80 Kb
Metric | Before Tuning | After Tuning |
---|---|---|
Successful calls | 5000 | 5000 |
Total time | 10.94 s | 4.73 s |
Average | 1.92 s | 0.76 s |
Fastest | 0.17 s | 0.09 s |
Slowest | 4.95 s | 2.74 s |
RPS | 450-500 | 1000-1100 |
Status codes | | |
Code 200 | 4676 | 5000 |
Code 429 | 16 | 0 |
Code 503 | 307 | 0 |
Now we can start by tuning the OS-level settings that are mentioned in the ES documentation:
Configuring OS
Brief
First things first, let’s get the OS (Ubuntu 14.04) ready. Elasticsearch requires only Java (1.7 or later). Newer ES versions may require a higher version of Java.
Virtual memory is typically consumed by processes, file system caches, and the kernel. Virtual memory utilization depends on a number of factors, which can be affected by the following parameters.
ES recommends setting swappiness to 1. According to Red Hat, a low swappiness value is also recommended for database workloads; for Oracle databases, for example, Red Hat recommends a swappiness value of 10. For further reading, see Tuning Virtual Memory.
Why do we set this value to 1 instead of 0?
Setting swappiness to 0 more aggressively avoids swapping out, which increases the risk of OOM killing under strong memory and I/O pressure.
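As a rough way to verify this setting, the sketch below parses sysctl.conf-style text and checks whether `vm.swappiness` matches the recommended value of 1. The helper names `parse_sysctl` and `swappiness_ok` and the sample text are hypothetical, made up for illustration; they are not the contents of the file used in this post.

```python
def parse_sysctl(text):
    """Parse sysctl.conf-style text into a dict of {key: value} strings."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments (sysctl.conf allows '#' and ';').
        if not line or line.startswith(("#", ";")):
            continue
        if "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def swappiness_ok(settings, recommended=1):
    """Return True if vm.swappiness is set to the recommended value."""
    value = settings.get("vm.swappiness")
    return value is not None and value.isdigit() and int(value) == recommended


# Illustrative input, not the post's actual file contents.
sample = """
# keep swapping to a minimum
vm.swappiness = 1
"""
print(swappiness_ok(parse_sysctl(sample)))
```

To check a real machine, you could pass the contents of your own sysctl.conf to `parse_sysctl` instead of the sample string.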
The maximum number of connections an application can request.
This property restricts the number of VMAs (Virtual Memory Areas) that a particular process can own. When a process reaches the limit, an out-of-memory error is thrown.
Sets the maximum number of file-handles that the Linux kernel will allocate.
Sets the file descriptor limits for a specific user.
Before making this change, I got the error "Unable to lock JVM memory (ENOMEM). This can result in part of the JVM being swapped out. Increase RLIMIT_MEMLOCK (limit)." Thanks to mrzard, I got rid of the problem by setting this limit to unlimited.
For reference, a user can raise the soft limit up to the hard limit, but the system will not allow a user to exceed the hard limit. We keep this strict, so we set both to the same value.
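To see the soft and hard limits that actually apply, a small sketch using Python's standard `resource` module (Unix only) can print them for the current process. The helper names `describe_limit` and `is_strict` are hypothetical. Note that it reports the limits of whichever user runs it, so it would need to be run as the ES user to be meaningful.

```python
import resource


def describe_limit(soft, hard):
    """Format a (soft, hard) rlimit pair, showing RLIM_INFINITY as 'unlimited'."""
    def fmt(value):
        return "unlimited" if value == resource.RLIM_INFINITY else str(value)
    return "soft={} hard={}".format(fmt(soft), fmt(hard))


def is_strict(soft, hard):
    """True when soft and hard limits are identical, as this post sets them."""
    return soft == hard


if __name__ == "__main__":
    # File descriptors and locked memory are the two limits discussed above.
    for name in ("RLIMIT_NOFILE", "RLIMIT_MEMLOCK"):
        soft, hard = resource.getrlimit(getattr(resource, name))
        print(name, describe_limit(soft, hard), "strict:", is_strict(soft, hard))
```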
The pam_limits PAM module sets limits on the system resources that can be obtained in a user-session.
Tries to lock the process address space into RAM, preventing any Elasticsearch memory from being swapped out. This setting lets the JVM lock its memory and protects it from being swapped out by the OS. It is a kind of performance optimization.
The field data cache is unbounded by default. This, of course, could make your JVM heap explode. To avoid nasty surprises, we limit it to 30% of the heap (this affects search performance).
Even though filters are relatively small, they can take up large portions of the JVM heap if you have a lot of data and numerous different filters. So we limit this to 30% as well.
Support for compression when possible (with Accept-Encoding). Defaults to false. We use it with gzip. It made a huge impact on performance.
A very important setting: it keeps the cluster from running into complications. Newer versions default to unicast.
This setting is calculated as (number of master-eligible nodes / 2) + 1.
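The quorum calculation can be sketched as a one-line function. The name `minimum_master_nodes` is chosen here for illustration, and the division is integer division (rounded down).

```python
def minimum_master_nodes(master_eligible):
    """Quorum of master-eligible nodes: (n / 2) + 1 with integer division."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


# Print the quorum for a few illustrative cluster sizes.
for nodes in (1, 2, 3, 5):
    print(nodes, "->", minimum_master_nodes(nodes))
```

With three master-eligible nodes the quorum is 2, so the cluster can lose one of them and still elect a master.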
This setting prevents deleting indices with wildcards (*); the full index name is required.
You can prevent the automatic creation of indices by adding this setting to the config/elasticsearch.yml file on each node.
These settings make the cluster more tolerant of occasional errors and prevent undesired connection losses between nodes.
The default installation of Elasticsearch is configured with a 1 GB heap. According to our extensive research, this value should be half of the total RAM, and it should not exceed 30.5 GB!
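The heap sizing rule of thumb can be written as a small calculation. This is only a sketch of the rule stated above (half of RAM, capped at 30.5 GB); the function name `recommended_heap_gb` is hypothetical.

```python
MAX_HEAP_GB = 30.5  # upper bound mentioned in this post


def recommended_heap_gb(total_ram_gb):
    """Half of total RAM, but never more than MAX_HEAP_GB."""
    if total_ram_gb <= 0:
        raise ValueError("total RAM must be positive")
    return min(total_ram_gb / 2, MAX_HEAP_GB)


# Print the suggested heap for a few illustrative machine sizes.
for ram in (8, 16, 64, 128):
    print(ram, "GB RAM ->", recommended_heap_gb(ram), "GB heap")
```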
Implementation
Open sysctl.conf:
Add these properties:
After that, open limits.conf:
The important thing is which user is defined below: our ES user must be covered by these limits. It is recommended to use a dedicated user for such big applications (we did the same for Redis). This user name is the default one created when you install ES.
and to make these properties persistent you have to modify the
Add this property
You may need to reboot the machine for these changes to take effect.
Configuring Elasticsearch
Now everything is ready for Elasticsearch to be installed. You can use this bash script. Here you can get the gist.
After that, execute:
es.sh file
Elasticsearch has newer versions, but I went with 1.7. It is up to you; you can choose whichever you want. I strongly recommend installing Elasticsearch this way. If you download the tar.gz instead, you have to create your own init scripts and also create the Elasticsearch user, which is very important for making configuration easier. Anyway, I assume you installed it with the script. Now you have elasticsearch.yml and logging.yml files under
In this part, let’s open elasticsearch.yml. I only show the settings that need to change; all other settings keep their defaults.
After that, let’s go to the Elasticsearch start script.
One of the most important things in ES is the heap size. As far as I could find, the heap size should generally be half of the total RAM and should not be more than 30.5 GB.
Finally, you can check the properties for our ES user:
You can reboot the machines and check your cluster status from Sense
or check every node from the console:
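If you prefer a script over Sense or the console, the check can be sketched in Python against the `_cluster/health` endpoint. The address `http://localhost:9200`, the helper names, and the sample response body are assumptions for illustration; adjust them to your own nodes.

```python
import json
from urllib.request import urlopen


def summarize_health(payload):
    """Turn a _cluster/health JSON body into a short one-line summary."""
    health = json.loads(payload)
    return "{} ({} nodes)".format(health["status"], health["number_of_nodes"])


def fetch_health(base_url="http://localhost:9200"):
    """Query the cluster health endpoint; base_url is an assumed address."""
    with urlopen(base_url + "/_cluster/health", timeout=5) as response:
        return summarize_health(response.read().decode("utf-8"))


# Illustrative response body, not output captured from the post's cluster.
sample = '{"cluster_name": "demo", "status": "green", "number_of_nodes": 3}'
print(summarize_health(sample))
```

Calling `fetch_health()` against each node's address should report the same status and node count once the cluster has formed.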
If your nodes don’t start at boot, your init scripts were probably not installed properly. Use this command and reboot.
If you get the exception org.elasticsearch.transport.RemoteTransportException, check which version of Java is installed on each of your nodes.
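A small sketch for comparing Java versions across nodes follows. It assumes the exception comes from nodes running different Java versions, which is an assumption on my part rather than something stated above. The helper names are hypothetical; `java -version` prints its banner to stderr, which is why the code reads `stderr`.

```python
import re
import subprocess


def parse_java_version(output):
    """Extract the quoted version string from `java -version` output."""
    match = re.search(r'version "([^"]+)"', output)
    return match.group(1) if match else None


def local_java_version():
    """Run `java -version` on this machine; the banner goes to stderr."""
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return parse_java_version(result.stderr)


def versions_match(versions):
    """True when every node reports the same, known Java version."""
    return None not in versions and len(set(versions)) == 1


# Illustrative banners, not output collected from the post's nodes.
banners = ['java version "1.7.0_79"', 'openjdk version "1.8.0_292"']
print(versions_match([parse_java_version(b) for b in banners]))
```

You would run `local_java_version()` on each node and pass the collected results to `versions_match`.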