Algorithms Tick 2

Algorithms Tick 2: dynamic programming

Bottom-Up Longest Common Subsequence

In this tick you will implement a bottom-up dynamic programming algorithm to find the Longest Common Subsequence (LCS) of two strings. For example, given “xaxbxcxd” and “aabbccdd” the LCS is “abcd”. For many applications the actual LCS (“abcd”) is not important: rather it is the length we want (4 in this case). To that end we usually formulate the problem around finding the lengths of the LCS subproblems.

To solve the LCS problem with dynamic programming we need to express the optimal solution in terms of optimal solutions to smaller problems with the same structure. The length of the LCS of two strings s1 and s2 is one of the following:

To solve this bottom-up, we build a table of optimal solutions to subproblems, starting from the smallest and going all the way to the ultimate solution. It is helpful to build the table in the form of a two-dimensional matrix such that cell [i,j] gives an optimal LCS length for s1-up-to-the-\(i\)th-character and s2-up-to-the-\(j\)th-character.

Here is an example LCS length table for the inputs “ABBA” and “CACA”.

\(j=0\) (C) \(j=1\) (CA) \(j=2\) (CAC) \(j=3\) (CACA)
\(i=0\) (A) 0 1 1 1
\(i=1\) (AB) 0 1 1 1
\(i=2\) (ABB) 0 1 1 1
\(i=3\) (ABBA) 0 1 1 2

The final LCS length is given by the value in the bottom right, 2 in this case. If we did want the actual LCS (“AA” in this example) we must start at the bottom right and work back through the array. Exactly how to do this is left as an exercise.


You may submit your answer on Moodle in either Python or Java.

Submission (Python)

Submission filename: lcs.py

Please submit a source file lcs.py that implements three functions:

# returns the full table of LCS lengths, or [] if either s1 or s2 is empty
table(s1, s2)

# return the final LCS length and string, given  tbl = table(s1, s2)
match_length(tbl)
match_string(s1, s2, tbl)

The table should be stored as a list of lists. You can initialize it with

tbl = [[None for j in range(len(s2))] for i in range(len(s1))]

so that the entry for \(i\) and \(j\) is accessed as tbl[i][j].

Submission (Java)

Submission filename: LCSBottomUp.java

Please submit a class LCSBottomUp that extends the class uk.ac.cam.cl.tester.Algorithms.LCSFinder:

package uk.ac.cam.cl.tester.Algorithms;

public abstract class LCSFinder {
    protected final String mString1;
    protected final String mString2;
    protected int mTable[][];

    public LCSFinder(String s1, String s2)  { mString1=s1; mString2=s2; }
    public abstract int getLCSLength();
    public abstract String getLCSString();
    public int[][] getLCSLengthTable()  { return mTable;}
    }

Please note that your code must be in your own package, named after your CRSID, for example uk.ac.cam.spqr1.Algorithms.Tick2. You will therefore have to import LCSFinder from uk.ac.cam.cl.tester.Algorithms. The tester will use its own copy of that file.

You should implement getLCSLength() to compute the full table of LCS lengths and return the final LCS length. Note that this function should not compute the actual LCS string. The table of length solutions should be stored in the mTable array.

Your should also implement getLCSString() to use the table of solutions in mTable to compute the actual LCS as a string.

If either input string is empty, your solution should return zero for getLCSLength() and "" for getLCSString(), and mTable should be null.

You may have noticed that the approach taken here is an adapted Strategy pattern. An alternative in this case would have been to make the getLCSLength()method static since the need to create an LCSFinder object in order to compute the LCS is a bit ugly. However, we use it here for two reasons: providing an abstract class that must be extended makes automated testing easier; and tick 2* builds on this code with alternative strategies so it makes sense to exploit polymorphism rather than have shadowed static methods.