Another heuristic commonly used instead of union by size is the union by rank heuristic
Its basic idea is that we don't actually need to store the exact size of the sets, an approximation of the size (in this case: roughly the logarithm of the set's size) suffices to achieve the same speed as union by size.
For this, we introduce the notion of the rank of a set, which is given as follows:
One advantage of union by rank is the space usage: As the maximum rank increases roughly like , for all realistic input sizes, the rank can be stored in a single byte (since n < 2^255
).
A simple implementation of union by rank might look like this:
private:
...
std::vector<unsigned char> rank;
public:
union_find(size_t n) : parent(n), rank(n, 0) { ... }
...
void merge(size_t i, size_t j) {
size_t pi = find(i);
size_t pj = find(j);
if (pi == pj) {
return;
}
if (rank[pi] < rank[pj]) {
// link the smaller group to the larger one
parent[pi] = pj;
} else if (rank[pi] > rank[pj]) {
// link the smaller group to the larger one
parent[pj] = pi;
} else {
// equal rank: link arbitrarily and increase rank
parent[pj] = pi;
++rank[pi];
}
}