Elasticsearch Configuration


Elasticsearch ships with a set of defaults that provide a good out-of-the-box experience for development. The implication is that those defaults are not necessarily suitable for production, where the configuration must be tailored to your own needs and therefore cannot be predicted in advance.

The default settings make it easy to download and run multiple nodes on the same machine without any configuration changes.

Where are the settings?

Inside each installation of Elasticsearch is a config/elasticsearch.yml. That is where the following settings live:

  • cluster.name
    • The name of the cluster that the node is joining. All nodes in the same cluster must share the same name.
    • Currently defaults to elasticsearch.
  • node.*
    • node.name
      • If not supplied, a random name will be generated each time the node starts. This can be fun, but it is not good for production environments.
      • Names do not have to be unique, but they should be.
    • node.master
      • A boolean setting. When true, it means that the node is an eligible master node and it can be the elected master node.
      • Defaults to true, meaning every node is an eligible master node.
    • node.data
      • A boolean setting. When true, it means that the node stores data and handles search activity.
      • Defaults to true.
  • path.*
    • path.data
      • The location where the node writes its files. All nodes use this directory to store metadata, but data nodes also use it to store and index documents.
      • Defaults to ./data.
        • This means that a data directory will be created for you alongside config inside the Elasticsearch directory.
    • path.logs
      • The location where log files are written.
      • Defaults to ./logs.
  • network.*
    • network.host
      • Defaults to _local_, which is effectively localhost.
        • This means that, by default, nodes cannot be communicated with from outside of the current machine!
    • network.bind_host
      • Potentially an array, this tells Elasticsearch which addresses of the current machine to bind sockets to.
        • It is this list that enables other machines (e.g., other nodes in the cluster) to talk to this node.
      • Defaults to network.host.
    • network.publish_host
      • A single host that is advertised to other nodes as the best way to communicate with this node.
        • When supplying an array to network.bind_host, this should be the one host that is intended to be used for inter-node communication.
      • Defaults to network.host.
  • discovery.zen.*
    • discovery.zen.minimum_master_nodes
      • Defines the quorum for master election. It must be set to (M / 2) + 1, where M is the number of master-eligible nodes (nodes with node.master: true, whether implicitly or explicitly) and M / 2 is rounded down. For example, three master-eligible nodes require a value of 2.
      • Defaults to 1, which is only valid for a single-node cluster!
    • discovery.zen.ping.unicast.hosts
      • The mechanism for joining this node to the rest of a cluster.
      • This should list eligible master nodes so that a node can find the rest of the cluster.
      • The value that should be used here is the network.publish_host of those other nodes.
      • Defaults to localhost, which means it only looks on the local machine for a cluster to join.
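
Taken together, the settings above might look something like the following config/elasticsearch.yml for one node of a three-node cluster. This is only a sketch: the node name, filesystem paths, and IP addresses are made-up placeholders, and the node.master, node.data, and discovery.zen.* names follow this page, so check the documentation for your version of Elasticsearch before reusing them.

```yaml
# config/elasticsearch.yml
# Illustrative values only; every name, path, and address below is a placeholder.

# All nodes that should form one cluster must share this name.
cluster.name: my_cluster

# A fixed name, so the node is not given a random one on each start.
node.name: node-1

# This node is master-eligible and also stores data (both are the defaults).
node.master: true
node.data: true

# Hypothetical locations outside the installation directory.
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch

# Bind to a non-local address so other machines can reach this node.
# network.bind_host and network.publish_host fall back to this value.
network.host: 192.168.1.10

# Three master-eligible nodes: (3 / 2) + 1 = 2, with 3 / 2 rounded down to 1.
discovery.zen.minimum_master_nodes: 2

# The network.publish_host of each master-eligible node in the cluster.
discovery.zen.ping.unicast.hosts: ["192.168.1.10", "192.168.1.11", "192.168.1.12"]
```

The other two nodes would use the same file with their own node.name and network.host values.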

What type of settings exist?

Elasticsearch provides three different types of settings:

  • Cluster-wide settings
    • These are settings that apply to everything in the cluster, such as all nodes or all indices.
  • Node settings
    • These are settings that apply to just the current node.
  • Index settings
    • These are settings that apply to just the index.

Depending on the setting, it can be:

  • Changed dynamically at runtime
  • Changed following a restart (close / open) of the index
    • Some index-level settings do not require the index to be closed and reopened, but might require the index to be force merged before the setting fully applies.
      • The compression level of an index is an example of this type of setting. It can be changed dynamically, but only new segments take advantage of the change. An index that no longer receives writes therefore never benefits from the change unless you force it to rewrite its segments.
  • Changed following a restart of the node
  • Changed following a restart of the cluster
  • Never changed

Always check the documentation for your version of Elasticsearch for what you can or cannot do with a setting.

How can I apply settings?

You can apply settings in a few ways, some of which are not recommended:

  • Command Line Arguments

In Elasticsearch 1.x and 2.x, you can submit most settings as Java System Properties prefixed with es.:

$ bin/elasticsearch -Des.cluster.name=my_cluster -Des.node.name=`hostname`

In Elasticsearch 5.x, settings are no longer passed as Java System Properties. Instead, a dedicated -E argument takes the place of -Des.:

$ bin/elasticsearch -Ecluster.name=my_cluster -Enode.name=`hostname`

This approach works well when tools like Puppet, Chef, or Ansible start and stop the cluster. However, it is tedious and error-prone when nodes are started by hand.

  • YAML settings
    • Shown in examples
  • Dynamic settings
    • Shown in examples

Settings are applied in order of precedence, from most dynamic to least dynamic:

  1. Transient settings
  2. Persistent settings
  3. Command line settings
  4. YAML (static) settings

If a setting is set at more than one of those levels, the value from the level nearest the top of the list takes effect.
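
As a small illustration of that precedence, consider the two lowest levels: the YAML file and the command line. The values below are hypothetical, and the -E syntax in the comments is the Elasticsearch 5.x form shown earlier.

```yaml
# config/elasticsearch.yml: the static, lowest-priority level.
# Both values are placeholders chosen for this example.
cluster.name: yaml_cluster
node.name: yaml_node

# Suppose the node is then started with:
#   bin/elasticsearch -Ecluster.name=my_cluster
#
# cluster.name resolves to my_cluster, because the command line
#   outranks the YAML file.
# node.name stays yaml_node, because no higher level sets it.
```

The same reasoning extends upward: for a setting that can be changed dynamically, a persistent value overrides both of these, and a transient value overrides a persistent one.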
