data-structures Union-find data structure Final Improvement: Union by rank with out-of-bounds storage

Help us to keep this website almost Ad Free! It takes only 10 seconds of your time:
> Step 1: Go view our video on YouTube: EF Core Bulk Extensions
> Step 2: And Like the video. BONUS: You can also share it!

Example

While in combination with path compression, union by rank nearly achieves constant time operations on union-find data structures, there is a final trick that allows us to get rid of the rank storage altogether by storing the rank in out-of-bounds entries of the parentarray. It is based on the following observations:

  • We actually only need to store the rank for representatives, not for other elements. For these representatives, we don't need to store a parent.
  • So far, parent[i] is at most size - 1, i.e. larger values are unused.
  • All ranks are at most log n.

This brings us to the following approach:

  • Instead of the condition parent[i] == i, we now identify representatives by
    parent[i] >= size
  • We use these out-of-bounds values to store the ranks of the set, i.e. the set with representative ihas rank parent[i] - size
  • Thus we initialize the parent array with parent[i] = size instead of parent[i] = i, i.e. each set is its own representative with rank 0.

Since we only offset the rank values by size, we can simply replace the rank vector by the parent vector in the implementation of merge and only need to exchange the condition identifying representatives in find:

Finished implementation using union by rank and path compression:

using std::size_t;

class union_find {
private:
    std::vector<size_t> parent;

public:
    union_find(size_t n) : parent(n, n) {} // initialize with parent[i] = n

    size_t find(size_t i) const {
        if (parent[i] >= parent.size()) // If we already have a representative
            return i;                   // return it
        return find(parent[i]);         // otherwise return the parent's repr.
    }

    void merge(size_t i, size_t j) {
        size_t pi = find(i);
        size_t pj = find(j);
        if (pi == pj) {
            return;
        }
        if (parent[pi] < parent[pj]) {
            // link the smaller group to the larger one
            parent[pi] = pj;
        } else if (parent[pi] > parent[pj]) {
            // link the smaller group to the larger one
            parent[pj] = pi;
        } else {
            // equal rank: link arbitrarily and increase rank
            parent[pj] = pi;
            ++parent[pi];
        }
    }
};


Got any data-structures Question?