Union find data structures provide the following operations:
make_sets(n)
initializes a union-find data structure with n
singletonsfind(i)
returns a representative for the set of element i
union(i,j)
merges the sets containing i
and j
We represent our partition of the elements 0
to n - 1
by storing a parent element parent[i]
for every element i
which eventually leads to a representative of the set containing i
.
If an element itself is a representative, it is its own parent, i.e. parent[i] == i
.
Thus, if we start with singleton sets, every element is its own representative:
We can find the representative for a given set by simply following these parent pointers.
Let us now see how we can merge sets:
If we want to merge the elements 0 and 1 and the elements 2 and 3, we can do this by setting the parent of 0 to 1 and setting the parent of 2 to 3:
In this simple case, only the elements parent pointer itself must be changed.
If we however want to merge larger sets, we must always change the parent pointer of the representative of the set that is to be merged into another set: After merge(0,3), we have set the parent of the representative of the set containing 0 to the representative of the set containing 3
To make the example a bit more interesting, let's now also merge (4,5), (5,6) and (3,4):
The last notion I want to introduce is path compression:
If we want to find the representative of a set, we might have to follow several parent pointers before reaching the representative. We might make this easier by storing the representative for each set directly in their parent node. We lose the order in which we merged the elements, but can potentially have a large runtime gain. In our case, the only paths that aren't compressed are the paths from 0, 4 and 5 to 3: