We can apply two heuristics to improve the simple but sub-optimal disjoint-set subalgorithms:
Path compression heuristic:
Ideally, findSet would only ever see trees of height at most 2 (a root and its direct children). Whenever it does walk a longer path, it can link every node on that path directly to the root, speeding up future traversals:
subalgo findSet(v: a node):
    if v.parent != v:
        v.parent = findSet(v.parent)
    return v.parent
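As a minimal Python sketch of findSet with path compression, assuming (unlike the pseudocode's node objects) that nodes are the integers 0..n-1 and that `parent` is a plain list, the snippet below runs findSet on a degenerate chain and shows every node on the path relinked to the root:

```python
# Sketch only: nodes are integers 0..n-1 and parent[v] is v's parent
# (an assumption for illustration; the pseudocode uses node objects).

def find_set(parent: list[int], v: int) -> int:
    """Return the root of v's tree, relinking every node on the path to it."""
    if parent[v] != v:
        parent[v] = find_set(parent, parent[v])  # path compression
    return parent[v]


# A degenerate chain: 4 -> 3 -> 2 -> 1 -> 0, where 0 is the root.
parent = [0, 0, 1, 2, 3]
root = find_set(parent, 4)
print(root)    # 0
print(parent)  # [0, 0, 0, 0, 0]: every node on the path now points at the root
```

The recursion mirrors the pseudocode; on very long chains it can exceed Python's default recursion limit (about 1000 frames), in which case an iterative two-pass loop does the same job.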
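Height-based merging heuristic: for each root, store the height of its tree. When merging, make the root of the taller tree the parent of the shorter tree's root. The merged tree is then no taller than the taller input, unless both trees have the same height, in which case it grows by one level: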
subalgo unionSet(u, v: nodes):
    vRoot = findSet(v)
    uRoot = findSet(u)
    if vRoot == uRoot:
        return
    if vRoot.height < uRoot.height:
        vRoot.parent = uRoot
    else if vRoot.height > uRoot.height:
        uRoot.parent = vRoot
    else:
        uRoot.parent = vRoot
        vRoot.height = vRoot.height + 1
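A sketch of both heuristics together, written as a small hypothetical `DisjointSet` class over the integers 0..n-1 (the class and method names are illustrative, not from any library). On equal heights it increments the height of the node that becomes the new root; once path compression runs, the stored heights are only upper bounds on the true heights:

```python
class DisjointSet:
    """Disjoint-set forest over 0..n-1 with path compression and
    height-based merging (illustrative sketch, not a library API)."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))  # every node starts as its own root
        self.height = [0] * n         # stored height; an upper bound after compression

    def find_set(self, v: int) -> int:
        if self.parent[v] != v:
            self.parent[v] = self.find_set(self.parent[v])  # path compression
        return self.parent[v]

    def union_set(self, u: int, v: int) -> None:
        u_root, v_root = self.find_set(u), self.find_set(v)
        if u_root == v_root:
            return  # already in the same set
        if self.height[v_root] < self.height[u_root]:
            self.parent[v_root] = u_root  # shorter tree goes under the taller one
        elif self.height[v_root] > self.height[u_root]:
            self.parent[u_root] = v_root
        else:
            # Equal heights: v_root becomes the new root and its tree grows by one.
            self.parent[u_root] = v_root
            self.height[v_root] += 1


ds = DisjointSet(4)
ds.union_set(0, 1)  # equal heights: 0 goes under 1, height[1] becomes 1
ds.union_set(2, 3)  # equal heights: 2 goes under 3, height[3] becomes 1
ds.union_set(0, 2)  # roots 1 and 3 tie: 1 goes under 3, height[3] becomes 2
print(ds.find_set(0), ds.height)  # 3 [0, 1, 0, 2]
```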
Combining both heuristics gives O(α(n)) amortized time per operation, where α is the inverse of the fast-growing Ackermann function. Because α grows so slowly (it is at most 4 for any n that could occur in practice), this can be treated as O(1) for practical purposes.
With m edges, Kruskal's algorithm as a whole therefore runs in O(m log m + m·α(n)) = O(m log m) time, dominated by the initial sorting of the edges.
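To show where the O(m log m) sort and the near-constant disjoint-set operations fit, here is a hedged Python sketch of Kruskal's algorithm. The input format (vertices 0..n-1 and edges as `(weight, u, v)` triples) and the function name `kruskal` are assumptions for illustration:

```python
def kruskal(n: int, edges: list[tuple[int, int, int]]) -> tuple[int, list[tuple[int, int, int]]]:
    """Minimum spanning forest of an undirected graph on vertices 0..n-1.
    edges holds (weight, u, v) triples (an assumed input format).
    Returns the total weight and the chosen edges as (u, v, weight)."""
    parent = list(range(n))
    height = [0] * n

    def find_set(v: int) -> int:
        if parent[v] != v:
            parent[v] = find_set(parent[v])  # path compression
        return parent[v]

    def union_set(u: int, v: int) -> bool:
        """Merge the sets of u and v; return False if already joined."""
        u_root, v_root = find_set(u), find_set(v)
        if u_root == v_root:
            return False
        if height[u_root] < height[v_root]:
            u_root, v_root = v_root, u_root  # make u_root the taller root
        parent[v_root] = u_root
        if height[u_root] == height[v_root]:
            height[u_root] += 1  # equal heights: merged tree grows by one
        return True

    total, chosen = 0, []
    for w, u, v in sorted(edges):  # the O(m log m) sort dominates
        if union_set(u, v):        # edge joins two different components
            total += w
            chosen.append((u, v, w))
    return total, chosen


edges = [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 2, 3), (5, 1, 3)]
print(kruskal(4, edges))  # (6, [(0, 1, 1), (2, 3, 2), (1, 2, 3)])
```

Sorting the triples orders them by weight first; ties are broken by the endpoints, which does not change the total weight.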
Path compression can reduce the height of a tree without updating its stored height, so the stored values drift away from the true heights and keeping them exact during union operations is not trivial. To avoid storing and maintaining heights altogether, the new root can instead be picked at random:
subalgo unionSet(u, v: nodes):
    vRoot = findSet(v)
    uRoot = findSet(u)
    if vRoot == uRoot:
        return
    if random() % 2 == 0:
        vRoot.parent = uRoot
    else:
        uRoot.parent = vRoot
In practice, this randomized merging combined with path compression in findSet tends to give comparable performance while being simpler to implement.
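The randomized variant can be sketched the same way. This assumes integer nodes in a `parent` list and uses Python's `random.randrange(2)` as the coin flip in place of the pseudocode's `random() % 2`; no heights are stored:

```python
import random


def find_set(parent: list[int], v: int) -> int:
    if parent[v] != v:
        parent[v] = find_set(parent, parent[v])  # path compression
    return parent[v]


def union_set(parent: list[int], u: int, v: int) -> None:
    """Merge the sets of u and v, picking the new root by a fair coin flip."""
    u_root, v_root = find_set(parent, u), find_set(parent, v)
    if u_root == v_root:
        return  # already in the same set
    if random.randrange(2) == 0:
        parent[v_root] = u_root
    else:
        parent[u_root] = v_root


parent = list(range(6))
for u, v in [(0, 1), (2, 3), (1, 3), (4, 5)]:
    union_set(parent, u, v)
print(find_set(parent, 0) == find_set(parent, 2))  # True
print(find_set(parent, 0) == find_set(parent, 4))  # False
```

The printed connectivity checks do not depend on the coin flips; only the shape of the resulting trees does. Calling `random.seed(...)` first makes the tree shapes reproducible.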