Have you ever wondered how the search engines work? How does Google line-up millions of results in front of you in just a few milliseconds? How does a huge database situated thousands of miles away from you find out information you're searching for and send them back to you? The reason behind this is not possible only by using faster internet and super-computers. Some mesmerizing searching algorithms and data-structures work behind it. One of them is Trie.
Trie, also called digital tree and sometimes radix tree or prefix tree (as they can be searched by prefixes), is a kind of search treeāan ordered tree data structure that is used to store a dynamic set or associative array where the keys are usually strings. It is one of those data-structures that can be easily implemented. Let's say you have a huge database of millions of words. You can use trie to store these information and the complexity of searching these words depends only on the length of the word that we are searching for. That means it doesn't depend on how big our database is. Isn't that amazing?
Let's assume we have a dictionary with these words:
algo
algea
also
tom
to
We want to store this dictionary in memory in such a way that we can easily find out the word that we're looking for. One of the methods would involve sorting the words lexicographically-how the real-life dictionaries store words. Then we can find the word by doing a binary search. Another way is using Prefix Tree or Trie, in short. The word 'Trie' comes from the word Retrieval. Here, prefix denotes the prefix of string which can be defined like this: All the substrings that can be created starting from the beginning of a string are called prefix. For example: 'c', 'ca', 'cat' all are the prefix of the string 'cat'.
Now let's get back to Trie. As the name suggests, we'll create a tree. At first, we have an empty tree with just the root:
We'll insert the word 'algo'. We'll create a new node and name the edge between these two nodes 'a'. From the new node we'll create another edge named 'l' and connect it with another node. In this way, we'll create two new edges for 'g' and 'o'. Notice that, we're not storing any information in the nodes. For now, we'll only consider creating new edges from these nodes. From now on, let's call an edge named 'x' - edge-x
Now we want to add the word 'algea'. We need an edge-a from root, which we already have. So we don't need to add new edge. Similarly, we have an edge from 'a' to 'l' and 'l' to 'g'. That means 'alg' is already in the trie. We'll only add edge-e and edge-a with it.
We'll add the word 'also'. We have the prefix 'al' from root. We'll only add 'so' with it.
Let's add 'tom'. This time, we create a new edge from root as we don't have any prefix of tom created before.
Now how should we add 'to'? 'to' is completely a prefix of 'tom', so we don't need to add any edge. What we can do is, we can put end-marks in some nodes. We'll put end marks in those nodes where at least one word is completed. Green circle denotes the end-mark. The trie will look like:
You can easily understand why we added the end-marks. We can determine the words stored in trie. The characters will be on the edges and nodes will contain the end-marks.
Now you might ask, what's the purpose of storing words like this? Let's say, you are asked to find the word 'alice' in the dictionary. You'll traverse the trie. You'll check if there is an edge-a from root. Then check from 'a', if there's an edge-l. After that, you won't find any edge-i, so you can come to the conclusion that, the word alice doesn't exist in the dictionary.
If you're asked to find the word 'alg' in the dictionary, you'll traverse root->a, a->l, l->g, but you won't find a green node at the end. So the word doesn't exist in the dictionary. If you search for 'tom', you'll end up in a green node, which means the word exists in the dictionary.
Complexity:
The maximum amount of steps needed to search for a word in a trie is the length of the word that we're looking for. The complexity is O(length). The complexity for insertion is same. The amount of memory needed to implement trie depends on the implementation. We'll see an implementation in another example where we can store 106 characters (not words, letters) in a trie.
Use of Trie: