It's easy to think of types as tables in an SQL database, where the index is the SQL database. However, that is not a good way to approach types.
In fact, types are literally just a metadata field added to each document by Elasticsearch: _type. The examples above created two types: my_type and my_other_type. That means that each document associated with those types has an extra field automatically defined, like "_type": "my_type". This field is indexed with the document, which makes it searchable and filterable, but it does not affect the raw document itself, so your application does not need to worry about it.
All types live in the same index, and therefore in the same collective shards of the index. Even at the disk level, they live in the same files. The only separation that creating a second type provides is a logical one. Every type, whether it's unique or not, needs to exist in the mappings and all of those mappings must exist in your cluster state. This eats up memory and, if each type is being updated dynamically, it eats up performance as the mappings change.
As such, it is a best practice to define only a single type unless you actually need other types. It is common to see scenarios where multiple types are desirable. For example, imagine you had a car index. It may be useful to you to break it down with multiple types:
This way you can search for all cars or limit the search by manufacturer on demand. The difference between those two searches is as simple as:
GET /cars/_search
and
GET /cars/bmw/_search
What is not obvious to new users of Elasticsearch is that the second form is a specialization of the first form. It literally gets rewritten to:
GET /cars/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "_type": "bmw"
          }
        }
      ]
    }
  }
}
It simply filters out any document that was not indexed with a _type field whose value was bmw. Since every document is indexed with its type as the _type field, this serves as a pretty simple filter. If an actual search had been provided in either example, then the filter would be added to the full search as appropriate.
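The rewrite described above can be sketched in a few lines of Python. This is only an illustration of the effect, not Elasticsearch's internal code: the helper name restrict_to_type and the model field in the sample search are assumptions made for the example.

```python
import copy


def restrict_to_type(body, type_name):
    """Return a copy of a search body that only matches one _type.

    Hypothetical helper: it mimics the effect of GET /index/type/_search,
    it is not how Elasticsearch implements the rewrite.
    """
    bool_query = {"filter": [{"term": {"_type": type_name}}]}
    original_query = body.get("query")
    if original_query is not None:
        # Keep the caller's search and add the type filter alongside it.
        bool_query["must"] = [copy.deepcopy(original_query)]
    restricted = copy.deepcopy(body)
    restricted["query"] = {"bool": bool_query}
    return restricted


# No search supplied: the type filter becomes the whole query.
print(restrict_to_type({}, "bmw"))

# A search supplied: the filter is combined with it ("model" is a made-up field).
user_search = {"query": {"match": {"model": "X5"}}, "size": 5}
print(restrict_to_type(user_search, "bmw"))
```

The first call produces the same body as the rewritten request shown earlier; the second keeps the original match query and the other top-level options untouched.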
As such, if the types are identical, it's much better to supply a single type (e.g., manufacturer in this example) and effectively ignore it. Then, within each document, explicitly supply a field called make (or whatever name you prefer) and manually filter on it whenever you want to limit a search to one manufacturer. This will reduce the size of your mappings to 1/n, where n is the number of separate types. It does add another field to each document, but in exchange the mapping is much simpler.
In Elasticsearch 1.x and 2.x, such a field should be defined as:
PUT /cars
{
  "mappings": {
    "manufacturer": { <1>
      "properties": {
        "make": { <2>
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}
In Elasticsearch 5.x, the above will still work (although it is deprecated), but the better way is to use:
PUT /cars
{
  "mappings": {
    "manufacturer": { <1>
      "properties": {
        "make": { <2>
          "type": "keyword"
        }
      }
    }
  }
}
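To make the 1/n claim concrete, here is a rough Python sketch that counts field definitions in both layouts. Only the make field and the bmw and manufacturer type names come from the text; the other manufacturers and the model and year fields are invented for the example, and counting definitions is just a stand-in for real mapping size.

```python
# Hypothetical fields shared by every manufacturer type.
shared_properties = {
    "model": {"type": "keyword"},
    "year": {"type": "integer"},
}
makes = ["bmw", "chevy", "honda"]  # "chevy" and "honda" are made up

# One type per manufacturer: the same properties are repeated n times.
per_type_mappings = {
    make: {"properties": dict(shared_properties)} for make in makes
}

# A single type plus an explicit "make" field.
single_type_mappings = {
    "manufacturer": {
        "properties": dict(shared_properties, make={"type": "keyword"})
    }
}


def count_field_definitions(mappings):
    """Total number of field definitions across all types."""
    return sum(len(t["properties"]) for t in mappings.values())


print(count_field_definitions(per_type_mappings))     # 6: 3 types x 2 fields
print(count_field_definitions(single_type_mappings))  # 3: 2 shared fields + make
```

The per-type layout grows with every manufacturer added, while the single-type layout stays the same size no matter how many values make takes.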
Types should be used sparingly within your indices because they bloat the index mappings, usually without much benefit. You must have at least one, but there is nothing that says you must have more than one.
At the index level, there is no difference between one type with a few sparsely used fields and multiple types that share a bunch of non-sparse fields while each keeps a few fields of its own (meaning the other types never use those fields).
Said differently: a sparsely used field is sparse across the index regardless of types. Defining it in a separate type neither helps nor really hurts the index.
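A toy Python model shows why the type boundary makes no difference to sparsity. The documents, the car, truck and vehicle type names, and the doors and payload_kg fields are all invented for illustration; the only point is that the fraction of documents carrying a field is computed over the whole index.

```python
def field_fill_ratio(docs, field):
    """Fraction of documents in the index that carry a value for `field`."""
    if not docs:
        return 0.0
    return sum(1 for d in docs if field in d["_source"]) / len(docs)


# Hypothetical index contents split across two types.
two_types = [
    {"_type": "car", "_source": {"make": "bmw", "doors": 4}},
    {"_type": "car", "_source": {"make": "honda", "doors": 2}},
    {"_type": "truck", "_source": {"make": "ford", "payload_kg": 900}},
    {"_type": "truck", "_source": {"make": "toyota", "payload_kg": 1100}},
]

# The same documents folded into a single type.
one_type = [dict(d, _type="vehicle") for d in two_types]

print(field_fill_ratio(two_types, "payload_kg"))
print(field_fill_ratio(one_type, "payload_kg"))
```

Both calls return the same ratio, because the documents and their fields are identical; only the _type label changed.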
You should just combine these types and add a separate type field.
Because each field is really only defined once at the Lucene level, regardless of how many types there are. The fact that types exist at all is a feature of Elasticsearch and it is only a logical separation.
No. If you manage to find a way to do so in ES 2.x or later, then you should open up a bug report. As noted in the previous question, Lucene sees them all as a single field, so there is no way to make this work appropriately.
ES 1.x left this as an implicit requirement, which allowed users to create conditions where the mappings of one shard in an index actually differed from those of another shard in the same index. This was effectively a race condition and it could lead to unexpected issues.
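The idea that Lucene sees a single field per name can be modelled with a small Python check. This is a sketch under the assumption that the requirement in question is "a field name must have the same definition in every type of an index"; the function name, type names and sample mappings are hypothetical, and this is not Elasticsearch's own validation code.

```python
def index_wide_fields(mappings):
    """Collapse per-type mappings into one field table for the whole index.

    Raises ValueError if two types define the same field name differently.
    """
    fields = {}
    for type_name, mapping in mappings.items():
        for field, definition in mapping["properties"].items():
            if field in fields and fields[field] != definition:
                raise ValueError(
                    f"field {field!r} in type {type_name!r} "
                    "conflicts with an earlier definition"
                )
            fields[field] = definition
    return fields


# Same field, same definition in both types: collapses to one entry.
compatible = {
    "my_type": {"properties": {"make": {"type": "keyword"}}},
    "my_other_type": {"properties": {"make": {"type": "keyword"}}},
}
print(index_wide_fields(compatible))

# Same field name, different definitions: rejected.
conflicting = {
    "my_type": {"properties": {"make": {"type": "keyword"}}},
    "my_other_type": {"properties": {"make": {"type": "integer"}}},
}
try:
    index_wide_fields(conflicting)
except ValueError as error:
    print("rejected:", error)
```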
n to 1).