Algorithms tick: dis-set

Algorithms tick: dis-set

Disjoint Sets with traversal

Implement the lazy forest DisjointSet, so as to efficiently support “Given an item, retrieve all the other items in the same subset”.

Please submit a source file disjointset.py to Moodle. It should implement the following class:

class DisjointSet:

    def add(self, k):
        # Adds a new set consisting of a single item k

    def __getitem__(self, k):
        # Returns a handle to the set containing item k

    def merge(self, h, i):
        # Merge set-with-handle h and set-with-handle i

    def iter_from(self, k):
        # Returns all items in the subset containing k

The tester only tests if your code is correct—it’s up to you to make sure you use an efficient implementation. But the tester runs tests on up to \(n=100\,000\) items, so if your code is \(O(n^2)\) then it will probably time out!

The above interface is a more Pythonic version of the DisjointSet interface given in lecture notes. The __getitem__ name is special in Python; it’s invoked when we write h[k] where h is an instance of DisjointSet. Methods like this are called dunder methods, short for double underline. The iter_from method can return a list, or a set, or a lazy list.

Lazy forest implementation

To get started, you may like to use this implementation of the lazy forest DisjointSet.

class DisjointSet:

    class DSNode:
        def __init__(self, k):
            self.k = k
            self.parent = self
            self.rank = 1

    def __init__(self, ks=[]):
        self._nodes = {k:DisjointSet.DSNode(k) for k in ks}

    def add(self, k):
        self._nodes[k] = DisjointSet.DSNode(k)

    def __getitem__(self, k):
        n = self._nodes[k]
        root = n
        while root != root.parent: 
            root = root.parent
        # path compression heuristic
        while n != root:
            n.parent, n = root, n.parent
        return root

    def merge(self, h1, h2):
        if h1 == h2:
            return
        # weighted union heuristic
        h1,h2 = (h1,h2) if h1.rank >= h2.rank else (h2,h1)
        h2.parent = h1
        h1.rank = h1.rank if h1.rank > h2.rank else h1.rank + 1