Algorithms tick 1*

Algorithms tick 1*: binary heap continued

In this tick you will re-implement the binary heap, with the goal of reducing the number of comparisons.

The classic max extraction, which you implemented in Tick 1, swaps the largest element with the root, and then sifts the root down to its correct place by swapping repeatedly with the maximum at each level. This involves 2 comparisons at each level, since there are three values and we wish to find the maximum. This costs \(2\log_2 n\) comparisons in the worst case (there are \(\log_2 n\) levels, and the worst case is that we descend all the way down to a leaf). We might do better, but only if we don’t have to go down many levels to find where the new root should move to. However, since that new root was a leaf in a max-heap initially, it must have a small value and is very likely to end up at or near a leaf again. So the \(2\log_2 n\) worst case is actually very common.

One way to address this is to use a bottom-up extraction. This works as follows — see also Wikipedia:

  1. Swap the end element and the root as before. Save the new root value in a temporary variable.

  2. Walk down the tree, always following the biggest child. We end up at a leaf with, say, index \(L\).

  3. Now walk up the tree, starting from \(L\). Stop when we find a value that is bigger than the new root held in the temporary variable: this is where the root element needs to end up. Call the index for this position \(p\).

  4. Continue walking up, starting at \(p\). However, now we push up each element to be its parent. This leaves a void at position \(p\), which we fill with the root value. No comparisons are made for this stage.

Step 2 involves one comparison at each level to make \(\log_2 n\) in total. Step 3 takes up to \(\log_2 n\) comparisons, depending how high up we have to go — if indeed the end position is always near a leaf, this won’t be very many. So we are likely to need significantly fewer than \(2\log_2 n\) comparisons (although the worst case is still \(2\log_2 n\)).

Instrumenting your code. Your first task is to instrument your code so that we can count how many comparisons it’s actually making. The Python tester and the Java tester use different strategies to achieve this, in keeping with the idioms of each language, and details are given in the submission instructions.

Implementing the bottom-up heap. Your second task is to re-implement the extraction method as described above. Here’s an example, which may help you check your code. (The number of comparisons for constructing the heap is the same in both cases, since we’re only altering the extraction code, not the construction.)

MaxHeap
construct a new heap with cdargswhk: 12 comparisons
pop w: 4 comparisons 
pop s: 4 comparisons
pop r: 4 comparisons
pop k: 4 comparisons
pop h: 3 comparisons
pop g: 2 comparisons
pop d: 1 comparisons
pop c: 0 comparisons
pop a: 0 comparisons
total comparisons: 34
BottomUpMaxHeap
construct a heap with cdargswhk: 12 comparisons
pop w: 4 comparisons
pop s: 3 comparisons
pop r: 3 comparisons
pop k: 3 comparisons
pop h: 3 comparisons
pop g: 2 comparisons
pop d: 1 comparisons
pop c: 0 comparisons
pop a: 0 comparisons
total comparisons to sort: 31

This rather modest overall improvement would be amplified for larger trees, but this example is still small enough for you to work through by hand.

Submission (Python)

Submission filename: maxheap.py (The Moodle tester has space to upload two files, to accommodate Java, but for Python you only need to upload a single file.)

The Python tester takes a functional approach to instrumenting your code. Please submit a source file maxheap.py that implements three functions as before, plus one more:

# similar to before
heapify(x, cmp)    # rearrange x into a heap
push(x, e, cmp)    # insert a new element
popmax(x, cmp)     # the implementation from Tick 1
# new function
popmax_bu(x, cmp)  # the bottom-up implementation

Each function takes an additional argument cmp. This is a comparison function, that compares two elements and returns -1, 0, or 1. The simple implementation of cmp is

def cmp_simple(x,y):
    return -1 if x<y else (1 if x>y else 0)

but the tester actually uses a cmp function that keeps track of the number of comparisons it is making.

You can define your functions so that they work in both Tick 1 and Tick 1* by first defining cmp_simple in your code, then defining def heapify(x, cmp=cmp_simple): .... This means “If the user passes in a value for cmp then use that, otherwise default to cmp_simple”.

In the standard Python library, it’s common to use helper functions for this sort of customization. For example, to sort a list use sorted(x), and to sort it using a custom key function use sorted(x, key=mykey).

Submission (Java)

Submission filenames: MaxHeap.java and BottomUpMaxHeap.java

The Java tester takes an object-oriented approach to instrumenting your code. The first file to submit, MaxHeap.java, should be similar to your MaxCharHeap, but you should modify it so that it can make a max-heap of anything that implements Java’s Comparable interface. Your implementation should have a constructor with the signature

public MaxHeap(List<T> x)

The second file to submit, BottomUpMaxHeap.java, should have a constructor

public BottomUpMaxHeap(List<T> x)

Both of them should implement the interface uk.ac.cam.cl.tester.Algorithms.MaxHeapInterface:

package uk.ac.cam.cl.tester.Algorithms;

public interface MaxHeapInterface< T extends Comparable<T> > {
       public void insert(T e);
       public T popMax() throws EmptyHeapException;
       public int getLength();
}

Here is some sample code that should be supported:

List<String> input = new ArrayList<String>();
input.add("Alpha");  
input.add("Omega");

MaxHeap<String> mh = new MaxHeap<String>(input);
mh.getMax(); // "Omega"
mh.getMax(); // "Alpha"
mh.getMax(); // EmptyHeapException

MaxHeapInterface constrains the type parameter (T) to types implementing Comparable. Every comparison is associated with a call t1.compareTo(t2) on objects of type T. The tester uses a class uk.ac.cam.cl.tester.Algorithms.ComparisonCountingString which is like a string, but instrumented to count the number of calls to compareTo(). Look at the code for ComparisonCountingString so you understand how it works. Use it to check that your code does the expected number of comparisons for some examples. (For further diagnostics, you might like to extend ComparisonPrintingString to print out details of each comparison as it is made, e.g. “A compared to B”.)

Note: You won’t be able to use a raw T[] array in place of the char[] array (why not?) so you’ll need to use some other form of storage (you can hack around this using arrays of Objects but it’s not good practice: please do it properly).