Minimum Spanning Tree Algorithms

Input An undirected, connected graph with non-negative edge weights

Output A minimum spanning tree - a tree that contains all nodes of the input graph and has minimum total edge cost

There are several greedy rules to consider, all of which correctly find minimum spanning trees.

  • Start with the vertices of the input graph and add edges in order of increasing cost, skipping any edge that would create a cycle. This is Kruskal's algorithm.
  • Start with a root node and, in each iteration, add the cheapest edge joining the existing tree to a node outside it. This is Prim's algorithm.
  • Start with the full graph and remove edges in order of decreasing cost, skipping any edge whose removal would disconnect the graph. This is called the Reverse-Delete algorithm.
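
For example, consider the graph with vertices $\{a, b, c, d\}$ and edges $(a,b)$ of cost 1, $(b,c)$ of cost 2, $(a,c)$ of cost 3 and $(c,d)$ of cost 4. Kruskal's adds $(a,b)$ and $(b,c)$, skips $(a,c)$ as it would close the cycle $a$-$b$-$c$, then adds $(c,d)$. Prim's rooted at $a$ adds $(a,b)$, then $(b,c)$, then $(c,d)$. Reverse-delete keeps $(c,d)$ as its removal would disconnect $d$, removes $(a,c)$, then keeps the remaining edges. All three produce the spanning tree $\{(a,b), (b,c), (c,d)\}$ of total cost 7.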

Proof of the cut set property

The cut set property can be used to prove the correctness of Kruskal's and Prim's algorithms.

Assume all edges have distinct costs. Consider a partition of the vertices into two non-empty sets $S$ and $V \setminus S$. The set of edges with one endpoint in each set is called the cut set. Let $e = (v,w)$ be the minimum cost edge in the cut set, with $v \in S$ and $w \in V \setminus S$. The cut set property states that every minimum spanning tree contains edge $e$.

This can be proven by an exchange argument. Let $T$ be a spanning tree that does not contain $e = (v,w)$. $T$ is a spanning tree, so there must still be a path $P$ from $v$ to $w$ in $T$. Since $v \in S$ and $w \in V \setminus S$, the path $P$ must cross the cut at some point: let $e' = (v', w')$ be an edge on $P$ with $v' \in S$ and $w' \in V \setminus S$. $e'$ is therefore in the cut set.

Exchanging $e'$ for $e$ gives a new tree $T' = (T \setminus \{e'\}) \cup \{e\}$. $T'$ is connected and acyclic because $T$ was connected and acyclic and $e$ reconnects the two disjoint trees created by removing $e'$.

As both $e$ and $e'$ are in the cut set and $e$ is the cheapest edge in the cut set, $c_e < c_{e'}$. The total cost of $T'$ is therefore less than that of $T$, so $T$ was not a minimum spanning tree. Hence every minimum spanning tree contains $e$. $\blacksquare$
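
In the example graph above, taking $S = \{a, b\}$ gives the cut set $\{(b,c), (a,c)\}$, whose cheapest edge $(b,c)$ of cost 2 is indeed in the minimum spanning tree.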

Proof of optimality for Kruskal's

Consider any edge $e = (v,w)$ added by Kruskal's algorithm and let $S$ be the set of all nodes to which $v$ has a path just before $e$ is added. At this point, $v \in S$ and $w \not\in S$, so no edge between $S$ and $V \setminus S$ has been added yet. Any cheaper edge in this cut set would have been considered earlier and, since it could not have created a cycle, would have been added. Therefore $e$ is the cheapest edge in the cut set and by the cut set property it belongs to every minimum spanning tree. $\blacksquare$

Proof of optimality for Prim's

In each iteration there is a set $S \subseteq V$ on which a partial spanning tree has been constructed, and the algorithm adds the node $v \not\in S$ and edge $e = (u,v)$, $u \in S$, of minimum attachment cost among all edges with exactly one endpoint in $S$. $e$ is therefore the cheapest edge in the cut set, so by the cut set property it belongs to every minimum spanning tree. $\blacksquare$

Proof of the cycle property

The cycle property can be used to prove the correctness of the reverse-delete algorithm.

Assume all edge costs are distinct. Let $C$ be any cycle in $G$ and let $e = (v,w)$ be the most expensive edge belonging to $C$. $e$ does not belong to any minimum spanning tree of $G$.

This can be proven by an exchange argument. Let $T$ be a spanning tree that contains $e$. Deleting $e$ from $T$ splits the nodes into two disjoint sets: $S$ containing $v$ and $V \setminus S$ containing $w$. As $e$ lies on the cycle $C$, there is another path $P$ from $v$ to $w$ following the rest of the cycle. $P$ must cross from $S$ to $V \setminus S$ at some point: let $e'$ be an edge on $P$ in the cut set.

Consider $T' = (T \setminus \{e\}) \cup \{e'\}$. Using similar reasoning to the cut set property, $T'$ is a spanning tree of $G$. Since $e$ was the most expensive edge in the cycle $C$ and $e'$ also belongs to $C$, $e'$ must be cheaper than $e$, so $T'$ is cheaper than $T$. Hence $T$ was not a minimum spanning tree, and no minimum spanning tree contains $e$. $\blacksquare$
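
In the example graph above, the cycle $a$-$b$-$c$ contains edges of costs 1, 2 and 3; its most expensive edge $(a,c)$ is exactly the edge that appears in no minimum spanning tree.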

Proof of optimality for reverse-delete

Consider any edge $e = (v,w)$ removed by the reverse-delete algorithm. When $e$ is removed it lies on a cycle $C$. Edges are considered in order of decreasing cost, and any more expensive edge of $C$ was considered earlier, at which point the cycle $C$ was still present in the (larger) graph, so that edge would have been removed; since every edge of $C$ is still present, $e$ must be the most expensive edge on $C$. Therefore by the cycle property, it does not belong to any MST. The result is connected, as reverse-delete never removes an edge whose removal would disconnect the graph. The result is acyclic: if it contained a cycle, the algorithm would have removed the most expensive edge on that cycle when it was considered. Therefore the result is an MST. $\blacksquare$
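
As a concrete (if inefficient) illustration, reverse-delete might be sketched in Python as follows; the (cost, u, v) edge representation and function names are assumptions for the example, not from the source.

import collections

def reverse_delete(n, edges):
    # edges: list of (cost, u, v) tuples over vertices 0..n-1 for a
    # connected undirected graph with distinct costs.
    kept = set(edges)

    def connected():
        # Depth-first search over the kept edges, starting from vertex 0.
        adj = collections.defaultdict(list)
        for _, u, v in kept:
            adj[u].append(v)
            adj[v].append(u)
        seen, stack = {0}, [0]
        while stack:
            for x in adj[stack.pop()]:
                if x not in seen:
                    seen.add(x)
                    stack.append(x)
        return len(seen) == n

    for e in sorted(kept, reverse=True):  # decreasing order of cost
        kept.remove(e)
        if not connected():
            kept.add(e)  # e is a bridge: removing it would disconnect the graph
    return kept

This clarity-first version recomputes connectivity from scratch for every edge, so it runs in $O(m^2)$ time; asymptotically faster versions exist but are considerably more involved.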

Eliminating the assumption of distinct edge costs

By adding extremely small, distinct amounts to the cost of every edge, all edge costs are made distinct. The relative order of edges whose original costs differed does not change, as the amounts added are so small, while ties between equal-cost edges are broken consistently, so the proofs above still apply.
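
In an implementation this is usually realised by breaking ties deterministically rather than literally perturbing costs. A minimal sketch, where the (cost, u, v) edge representation is an assumption for the example:

# Sorting by (cost, index) breaks ties between equal-cost edges by their
# position, which has the same effect as adding i * epsilon to the cost
# of edge i for a sufficiently small epsilon > 0.
edges = [(4, "a", "b"), (4, "b", "c"), (2, "a", "c")]  # illustrative data
indexed = sorted(enumerate(edges), key=lambda pair: (pair[1][0], pair[0]))
ordered = [edge for _, edge in indexed]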

Prim's implementation

Prim's algorithm can be efficiently implemented using a priority queue of nodes keyed by attachment cost, which takes $O(n)$ time to populate. ExtractMin is used to get the cheapest node to attach and ChangeKey is used to update attachment costs as the tree grows. There are $n-1$ ExtractMin and $O(m)$ ChangeKey operations. Each can be performed in $O(\log n)$ time with a heap-based implementation, giving an overall time complexity of $O(m \log n)$.
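
Python's heapq module has no ChangeKey operation, so the sketch below uses the common lazy variant instead: edges are pushed freely and stale entries are skipped when popped. A minimal sketch, assuming an adjacency-list input; the names are illustrative.

import heapq

def prim(n, adj):
    # adj[u]: list of (cost, v) pairs for each vertex u in 0..n-1;
    # the graph is assumed connected and undirected.
    in_tree = [False] * n
    mst, total = [], 0
    in_tree[0] = True  # root the tree at vertex 0
    heap = [(cost, 0, v) for cost, v in adj[0]]
    heapq.heapify(heap)
    while heap and len(mst) < n - 1:
        cost, u, v = heapq.heappop(heap)
        if in_tree[v]:
            continue  # stale entry: v was already attached more cheaply
        in_tree[v] = True
        mst.append((u, v))
        total += cost
        for c, w in adj[v]:
            if not in_tree[w]:
                heapq.heappush(heap, (c, v, w))
    return mst, total

The lazy variant can hold up to $m$ entries in the heap, so its running time is $O(m \log m) = O(m \log n)$, matching the bound above.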

Kruskal's implementation

Kruskal's algorithm can be implemented efficiently using a union-find data structure. This data structure maintains a collection of disjoint sets and provides two operations, Union and Find. Find(x) returns the name of the set containing x; two nodes x and y are in the same set if and only if Find(x) = Find(y). Union(x, y) merges two sets into a single set.

A pointer-based implementation of the union-find data structure, which merges the smaller set into the larger, can be initialised in $O(n)$ time and performs Union in $O(1)$ and Find in $O(\log n)$ time.
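
A minimal Python sketch of such a pointer-based structure with union by size; the class and method names are illustrative.

class UnionFind:
    # Pointer-based union-find with union by size: trees have depth
    # O(log n), so Find is O(log n); Union of two set names is O(1).
    def __init__(self, n):
        self.parent = list(range(n))  # each node starts as its own set
        self.size = [1] * n

    def find(self, x):
        while self.parent[x] != x:  # follow pointers to the set's name
            x = self.parent[x]
        return x

    def union(self, r, s):
        # r and s must be set names (roots); merge the smaller into the larger.
        if self.size[r] < self.size[s]:
            r, s = s, r
        self.parent[s] = r
        self.size[r] += self.size[s]

Path compression during Find would improve the amortised bounds further, but union by size alone already achieves the $O(\log n)$ Find stated above.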

Kruskal's algorithm can be implemented using the union-find data structure as follows:

T = {}
sort edges by ascending cost
for each vertex v:
    make-set(v)
for each edge e = (u, v) in ascending order of cost:
    if find(u) != find(v):
        union(find(u), find(v))
        add e to T
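
As a concrete counterpart to the pseudocode, a minimal Python sketch reusing the UnionFind class from the sketch above; the (cost, u, v) edge representation is an assumption for the example.

def kruskal(n, edges):
    # edges: list of (cost, u, v) tuples over vertices 0..n-1 for a
    # connected undirected graph; uses the UnionFind class sketched above.
    uf = UnionFind(n)
    mst, total = [], 0
    for cost, u, v in sorted(edges):  # ascending order of cost
        ru, rv = uf.find(u), uf.find(v)
        if ru != rv:  # u and v are in different components, so no cycle
            uf.union(ru, rv)
            mst.append((u, v))
            total += cost
    return mst, total

On the example graph from earlier (with $a$-$d$ numbered 0-3), kruskal(4, [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3)]) returns the three tree edges and a total cost of 7.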

For a graph with $n$ nodes and $m$ edges, sorting takes $O(m\log m)$, which is $O(m\log n)$ given that $m \leq n^2$ (so $\log m \leq 2\log n$). The union-find initialisation takes $O(n)$ and there are $m$ iterations, each performing two Finds and at most one Union. Using the pointer-based implementation these take $O(\log n)$ and $O(1)$ time respectively, giving $O(m\log n)$ and $O(m)$ over all $m$ iterations.

This gives an overall time complexity of $O(m\log n)$. Even if a more efficient union-find implementation is used, the overall complexity remains dominated by the sorting step.