While in combination with path compression, union by rank nearly achieves constant time operations on union-find data structures, there is a final trick that allows us to get rid of the rank
storage altogether by storing the rank in out-of-bounds entries of the parent
array. It is based on the following observations:
parent
.parent[i]
is at most size - 1
, i.e. larger values are unused.This brings us to the following approach:
parent[i] == i
, we now identify representatives byparent[i] >= size
i
has rank parent[i] - size
parent[i] = size
instead of parent[i] = i
, i.e. each set is its own representative with rank 0.Since we only offset the rank values by size
, we can simply replace the rank
vector by the parent
vector in the implementation of merge
and only need to exchange the condition identifying representatives in find
:
Finished implementation using union by rank and path compression:
using std::size_t;
class union_find {
private:
std::vector<size_t> parent;
public:
union_find(size_t n) : parent(n, n) {} // initialize with parent[i] = n
size_t find(size_t i) const {
if (parent[i] >= parent.size()) // If we already have a representative
return i; // return it
return find(parent[i]); // otherwise return the parent's repr.
}
void merge(size_t i, size_t j) {
size_t pi = find(i);
size_t pj = find(j);
if (pi == pj) {
return;
}
if (parent[pi] < parent[pj]) {
// link the smaller group to the larger one
parent[pi] = pj;
} else if (parent[pi] > parent[pj]) {
// link the smaller group to the larger one
parent[pj] = pi;
} else {
// equal rank: link arbitrarily and increase rank
parent[pj] = pi;
++parent[pi];
}
}
};