In our current implementation of merge
, we always choose the left set to be the child of the right set, without taking the size of the sets into consideration. Without this restriction, the paths (without path compression) from an element to its representative might be quite long, thus leading to large runtimes on find
calls.
Another common improvement thus is the union by size heuristic, which does exactly what it says: When merging two sets, we always set the larger set to be the parent of the smaller set, thus leading to a path length of at most steps:
We store an additional member std::vector<size_t> size
in our class which gets initialized to 1 for every element. When merging two sets, the larger set becomes the parent of the smaller set and we sum up the two sizes:
private:
...
std::vector<size_t> size;
public:
union_find(size_t n) : parent(n), size(n, 1) { ... }
...
void merge(size_t i, size_t j) {
size_t pi = find(i);
size_t pj = find(j);
if (pi == pj) { // If the elements are in the same set:
return; // do nothing
}
if (size[pi] > size[pj]) { // Swap representatives such that pj
std::swap(pi, pj); // represents the larger set
}
parent[pi] = pj; // attach the smaller set to the larger one
size[pj] += size[pi]; // update the size of the larger set
}