Computer Laboratory

Computer Vision

The following exercises aim to give you hands-on experience with some common computer vision algorithms. OpenCV is perhaps the most popular library for computer vision tasks, as it provides optimised implementations of many common computer vision algorithms. This work was created by Christian Richardt and Tadas Baltrušaitis in spring 2012, and reworked by Lech Świrski and Tadas Baltrušaitis in 2013.

Getting started

Unfortunately, installing OpenCV is not trivial and the first few steps are always quite tricky, particularly when using the C or C++ interfaces. However, the newer Python bindings (introduced in OpenCV 2.3) are more accessible and are therefore used in the following exercises. To spare you the installation effort, we have prepared a fully-functional virtual machine based on Ubuntu. The virtual machine (815 MB zipped, ~3 GB unzipped) can be opened with the free VirtualBox software on most operating systems (tested on Windows 7 and Mac OS X 10.7).

On the virtual machine, you can use gedit (Text Editor) to edit your Python scripts with syntax highlighting, and either the provided run scripts or python {filename}.py in a terminal to run your code. You can also experiment with the interactive Python shell ipython, which can run scripts using the %run script.py command. The four exercises are loaded onto the virtual machine and should be visible on the Desktop. However, as we occasionally find bugs or issues in the code, we have also provided an update script that downloads the latest version of the exercises; make sure to run it every time you attempt the exercises.

We encourage you to use the provided virtual machine for solving the exercises. However, if you really know what you are doing, you are welcome to install OpenCV (2.4.3 or higher) with Python bindings yourself - although be warned that this is at your own risk. In this case, the exercises are available on github: https://github.com/LeszekSwirski/cam-cl-computervision.

The OpenCV and NumPy documentation will come in handy while solving the exercises. Please report any issues with the VM or the exercises to Tadas Baltrušaitis or Lech Świrski.

1. Convolution

The exercise1.py script in the Exercise1 folder provides function stubs for the four exercises below, which it feeds with a test image and a convolution kernel. The expected result images can be found in the answer-images folder.

  1. Implement basic convolution by translating the C code on page 26 of the lecture notes to Python. The function basic_convolution should return the result as an image with the same size and datatype as the input image. You can assume that kernels are normalised, so the term /(mend*nend) should be left out. Please also include a progress indicator that prints to the console if verbose is true. (One possible structure for these exercises is sketched after this list.)

  2. The result of the previous exercise will have black areas where there was no image information. Also, the result doesn't appear centred, as the kernel is not centred around 0 (but instead starts at 0). This is technically correct, but produces an undesirable result.

    To fill the unknown areas, we can extrapolate the image values. Using the cv2.copyMakeBorder function, expand your image to fill these areas, and perform the convolution on this larger image. Then, return a 'centred' image which is the same size as the original input.

    HINT: Consider how large the invalid border areas are. Why are they this size? Make sure to expand the image by the right size.

    HINT: In cv2.copyMakeBorder, you should use cv2.BORDER_WRAP as the value of borderType.

  3. Now apply the convolution theorem to speed up convolution.

    HINT: Both OpenCV and numpy have DFT implementations. These return their results in different, incompatible formats, so be careful not to mix them. Also, be careful how you multiply the DFT results: you cannot use numpy multiplication on OpenCV DFT results.

  4. Finally, extend your previous implementation to produce the same result as exercise 2.
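
The sketch below shows one way the pieces of this exercise might fit together; it is a rough guide rather than a model answer. It assumes single-channel float32 inputs and a normalised kernel, and the exact split of the padding for the centred result depends on how you index the kernel, so check your output against the images in the answer-images folder.

    import cv2
    import numpy as np

    def basic_convolution(image, kernel, verbose=False):
        # Exercise 1: direct translation of the nested-loop convolution from
        # the notes; the first m-1 rows and n-1 columns are left black.
        M, N = image.shape
        m, n = kernel.shape
        result = np.zeros((M, N), dtype=np.float32)
        for y in range(m - 1, M):
            if verbose:
                print('row %d of %d' % (y + 1, M))  # simple progress indicator
            for x in range(n - 1, N):
                # result[y, x] = sum over j, k of kernel[j, k] * image[y-j, x-k]
                window = image[y - m + 1:y + 1, x - n + 1:x + 1][::-1, ::-1]
                result[y, x] = (kernel * window).sum()
        return result

    def centred_convolution(image, kernel):
        # Exercise 2: wrap-extrapolate the image, convolve, then crop so the
        # kernel is effectively centred on each output pixel. The invalid
        # border is m-1 rows and n-1 columns in total; how that total is
        # split between the two sides depends on your indexing convention.
        M, N = image.shape
        m, n = kernel.shape
        cm, cn = (m - 1) // 2, (n - 1) // 2
        padded = cv2.copyMakeBorder(image, m - 1 - cm, cm, n - 1 - cn, cn,
                                    cv2.BORDER_WRAP)
        full = basic_convolution(padded, kernel)
        return full[m - 1:m - 1 + M, n - 1:n - 1 + N]

    def fft_convolution(image, kernel):
        # Exercise 3: the convolution theorem using OpenCV's DFT. The kernel
        # is zero-padded to the image size so both spectra have the same
        # shape; mulSpectrums understands OpenCV's packed spectrum layout,
        # which plain numpy element-wise multiplication does not.
        padded_kernel = np.zeros(image.shape, dtype=np.float32)
        padded_kernel[:kernel.shape[0], :kernel.shape[1]] = kernel
        f_image = cv2.dft(np.float32(image))
        f_kernel = cv2.dft(padded_kernel)
        f_product = cv2.mulSpectrums(f_image, f_kernel, 0)
        # This reproduces the uncentred wrap-around result; padding the image
        # and cropping as in centred_convolution gives the exercise 4 output.
        return cv2.idft(f_product, flags=cv2.DFT_SCALE | cv2.DFT_REAL_OUTPUT)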

2. Edge detection

The exercise2.py script in the Exercise2 folder provides function stubs for the helper functions you will need for the exercises, together with code for loading and storing the required images. The input image is already smoothed and subsampled using the pyrDown function to suppress some of the noise. Be careful about the input and output types of the library functions you use: some of them expect floating-point numbers at 32- or 64-bit precision, whilst others take 8-bit integers, so be sure to read their documentation carefully.

    1. Using convolution with appropriate kernels, calculate approximations of the first- and second-order partial derivatives:

      \frac{\partial f}{\partial x},\frac{\partial f}{\partial y}

      the magnitude of the gradient:

      \sqrt{\left(\frac{\partial f}{\partial x}\right)^2 + \left(\frac{\partial f}{\partial y}\right)^2}

      and the Laplacian (\nabla^2 f) of the input image. A stub function for this exercise is ComputeGradients; one possible structure is sketched after this group of exercises.

      Your result should look like:

    2. Extract the edges from the first-order partial derivatives and the gradient magnitude images using amplitude thresholding. A stub function for this exercise is ComputeEdges. For this you can use the OpenCV function threshold (note that the function expects the CV_32F type). As a threshold value use 0.075.

      Your result should look like:

    3. Locating zero-crossings in a 2D image is non-trivial, so instead we will use the Canny edge detector. Experiment with the OpenCV Canny edge detection function with various threshold values. A stub function for this exercise is ComputeCanny.
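
A minimal sketch of these helper functions is given below. It assumes a single-channel float32 image scaled to [0, 1]; the stub signatures in exercise2.py may differ slightly, so treat it as a guide rather than the expected answer.

    import cv2
    import numpy as np

    def ComputeGradients(image):
        # Central-difference kernels approximating d/dx and d/dy. filter2D
        # computes correlation rather than convolution; for these kernels
        # that only flips the sign of the derivative, which does not affect
        # the gradient magnitude or the thresholded edges.
        kx = np.array([[-0.5, 0.0, 0.5]], dtype=np.float32)
        ky = kx.T
        dx = cv2.filter2D(image, cv2.CV_32F, kx)
        dy = cv2.filter2D(image, cv2.CV_32F, ky)
        magnitude = np.sqrt(dx ** 2 + dy ** 2)
        # 3x3 Laplacian kernel: the sum of second derivatives in x and y.
        lap = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=np.float32)
        laplacian = cv2.filter2D(image, cv2.CV_32F, lap)
        return dx, dy, magnitude, laplacian

    def ComputeEdges(dx, dy, magnitude, threshold=0.075):
        # Amplitude thresholding: keep pixels whose absolute value exceeds
        # the threshold (cv2.threshold expects CV_32F input here).
        _, edges_x = cv2.threshold(np.abs(dx), threshold, 1.0, cv2.THRESH_BINARY)
        _, edges_y = cv2.threshold(np.abs(dy), threshold, 1.0, cv2.THRESH_BINARY)
        _, edges_mag = cv2.threshold(magnitude, threshold, 1.0, cv2.THRESH_BINARY)
        return edges_x, edges_y, edges_mag

    def ComputeCanny(image, low=50, high=150):
        # Canny expects an 8-bit image, so rescale the float input first.
        image_8u = np.uint8(np.clip(image * 255, 0, 255))
        return cv2.Canny(image_8u, low, high)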

  1. Create a scale-space representation of the input image with sigma values of 1, 1.6, and 2.56. A stub function for this exercise is scaleSpaceEdges, which will use your previously completed ComputeGradients and ComputeEdges functions to calculate the edges of the resulting images. You will find the OpenCV function GaussianBlur useful. Your result should look like:

  2. Using the previously calculated Gaussians, compute the Differences of Gaussians:

    G_{\sigma=1.6} \ast I - G_{\sigma=1} \ast I and G_{\sigma=2.56} \ast I - G_{\sigma=1.6} \ast I.

    Next, calculate the Laplacians of the Gaussians with sigmas 1 and 1.6:

    \nabla^2 (G_{\sigma=1} \ast I) and \nabla^2 (G_{\sigma=1.6} \ast I).

    How is the Difference of Gaussians related to the Laplacian of Gaussian? You might find the OpenCV function Laplacian useful; a sketch of these computations follows this exercise.

    Your result should look like:
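
The scale-space computations above might be sketched as follows, where I denotes the float32 input image; the names are illustrative rather than the exact ones used by scaleSpaceEdges.

    import cv2

    def scale_space_dog_log(I):
        sigmas = [1.0, 1.6, 2.56]
        # ksize=(0, 0) lets OpenCV pick a kernel size suited to each sigma.
        blurred = dict((s, cv2.GaussianBlur(I, (0, 0), s)) for s in sigmas)

        # Differences of Gaussians between neighbouring scales.
        dog_a = blurred[1.6] - blurred[1.0]
        dog_b = blurred[2.56] - blurred[1.6]

        # Laplacians of the Gaussian-smoothed images at sigma = 1 and 1.6;
        # comparing these with the DoGs above (up to a scale factor) should
        # suggest how the two operators are related.
        log_a = cv2.Laplacian(blurred[1.0], cv2.CV_32F)
        log_b = cv2.Laplacian(blurred[1.6], cv2.CV_32F)

        return (dog_a, dog_b), (log_a, log_b)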

3. Panorama stitching

In these exercises, you will build your own basic panorama stitcher, which will align and combine two images into one panorama. This is achieved by extracting features in both images, finding corresponding features between the two images to estimate the transform between them, and finally warping the images to combine them into a single panorama.

To get started, open the exercise3.py script in the Exercise3 folder. The script provides functions for all the main steps involved in panorama stitching. The sections of the code you need to complete are indicated with ‘TODO’.

  1. The first step in panorama stitching is to extract features from both input images. Features are localised at a ‘keypoint’ and described by a ‘descriptor’. In the extract_features_and_descriptors function, use the detect_features function to detect SURF features in the image, followed by extract_descriptors to extract descriptors for these keypoints.

    Extra: If you want, play around with the definitions of detect_features and extract_descriptors. You can change the types of feature detector and descriptor extractor, or play around with their parameters (documented, however poorly, in the OpenCV documentation).

  2. The find_correspondences function finds features with similar descriptors using match_flann, which uses a kd-tree for efficient correspondence matching. match_flann returns a list of pairs, in which each pair gives the indices of corresponding features in the two images. Convert this into two arrays, points1 and points2, which give the coordinates of corresponding keypoints in corresponding rows. These correspondences are visualised in the file correspondences.jpg (created by the script), which should look like this:

  3. Next, you need to work out the optimal image size and offset of the combined panorama. For this, you’ll need to consider the positions of each image’s corners after the images are aligned. This can be calculated using the homogeneous 2D transform (3×3 matrix) stored in ‘homography’. From this, you can calculate the total size of the panorama as well as the offset of the first image relative to the top-left corner of the stitched panorama. If you get stuck, please skip this exercise and move to the next one; one possible approach is sketched after this list.

  4. Now combine the two images into one panorama using the warpPerspective function. On top of the panorama, please draw the features that were used to estimate the transform between the images, using drawChessboardCorners (with patternWasFound=False). The resulting panorama.jpg image should look like this:
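
One possible shape for items 2 to 4 above is sketched below. It assumes, as the script suggests, that keypoints are lists of cv2.KeyPoint, that match_flann returns (index1, index2) pairs, and that homography maps image 2 into image 1's coordinate frame; if exercise3.py uses the opposite direction, invert the matrix or swap the images.

    import cv2
    import numpy as np

    def correspondences_to_points(keypoints1, keypoints2, matches):
        # Turn the (index1, index2) pairs from match_flann into two arrays of
        # coordinates with corresponding keypoints in corresponding rows.
        points1 = np.float32([keypoints1[i].pt for i, j in matches])
        points2 = np.float32([keypoints2[j].pt for i, j in matches])
        return points1, points2

    def panorama_size_and_offset(image1, image2, homography):
        h1, w1 = image1.shape[:2]
        h2, w2 = image2.shape[:2]
        # Corners of image 2 mapped into image 1's frame, plus image 1's own
        # corners, give the bounding box of the stitched panorama.
        corners1 = np.float32([[0, 0], [w1, 0], [w1, h1], [0, h1]]).reshape(-1, 1, 2)
        corners2 = np.float32([[0, 0], [w2, 0], [w2, h2], [0, h2]]).reshape(-1, 1, 2)
        warped2 = cv2.perspectiveTransform(corners2, homography)
        all_corners = np.vstack([corners1, warped2]).reshape(-1, 2)
        x_min, y_min = np.int32(np.floor(all_corners.min(axis=0)))
        x_max, y_max = np.int32(np.ceil(all_corners.max(axis=0)))
        size = (int(x_max - x_min), int(y_max - y_min))  # (width, height)
        offset = (int(-x_min), int(-y_min))  # image 1's top-left corner
        return size, offset

    def combine_images(image1, image2, homography, size, offset):
        # Pre-multiply by a translation so nothing lands at negative coordinates.
        T = np.array([[1, 0, offset[0]],
                      [0, 1, offset[1]],
                      [0, 0, 1]], dtype=np.float64)
        panorama = cv2.warpPerspective(image2, T.dot(homography), size)
        # Paste image 1 on top at its offset position.
        panorama[offset[1]:offset[1] + image1.shape[0],
                 offset[0]:offset[0] + image1.shape[1]] = image1
        return panorama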

4. Naïve Bayes for machine learning

In this exercise, you will use a Normal Bayes Classifier to build your own optical character recognition (OCR) system. The classifier assumes that feature vectors from each class are normally distributed, although not necessarily independently (so covariances are used instead of simple variances); this makes it a slightly more advanced version of a Naive Bayes classifier. During the training step, the classifier works out the means and covariances of the distribution for each class (in our case, letters), in addition to the prior probabilities of each class. In the prediction step, the classifier uses Bayes' rule to work out the probability of the letter belonging to each of the classes, combining the likelihood with the prior.
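
In code, training and testing such a classifier can be as short as the sketch below, which assumes the OpenCV 2.4-style cv2.NormalBayesClassifier binding available on the VM (newer OpenCV releases expose the same model through the cv2.ml module instead).

    import cv2

    def CallNaiveBayes(trainSamples, trainResponses, testSamples):
        model = cv2.NormalBayesClassifier()
        # The provided arrays are already laid out as one sample per row.
        model.train(trainSamples, trainResponses)
        # predict returns a scalar (useful for single samples) and an array
        # holding the predicted class for every row of testSamples.
        _, predictions = model.predict(testSamples)
        return predictions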

To get started, open the exercise4.py script in the Exercise4 folder. You will be using two data sets, taken from http://www.seas.upenn.edu/~taskar/ocr/. The smaller data set contains only 2 letters, while the larger one contains 10, making for a more challenging task. The data is split in such a way that two thirds of the images are used for training the classifier and the remaining third for testing it. The loading of the data and the splitting into training and test partitions are already done for you in the provided code.

You will be using an already cleaned-up dataset containing 8×16 pixel binary images of letters. Examples of the letters you will be recognising are a, b, and c. Every sample is loaded into an array trainSamples(2|10) and testSamples(2|10), with every row representing an image (laid out in memory row by row, i.e. 16×8=128 values). This results in a 128-dimensional feature vector of pixel values. The ground truth (the actual letters represented by the pixels) is stored in trainResponses(2|10) and testResponses(2|10) for each of the data sets.

The sections of the code you need to complete are indicated with ‘TODO’.

  1. The first step in supervised machine learning is training a classifier, which can then be used to label unseen data. For this you will need to finish the CallNaiveBayes method. Use the NormalBayesClassifier class for this (see the sketch above). The training, response and testing data are already laid out in the format expected by the classifier.

  2. Once we have a classifier, it is important to understand how good it is at the task. One way to do this for multi-way classifiers is to look at the confusion matrix. The confusion matrix lets you know which objects are being misclassified as which, and gives you ideas for additional features to add. Another useful evaluation metric is the average F1 score for each of the classes. These have already been written for you, and the code will report these statistics for the classifier you use. You can see that the performance on a-vs-b classification is much better than on the 10-letter task: you should expect an F1 value of around 0.97 for the a-b case, and around 0.75 in the 10-letter case.

  3. Currently we are using the pixel values directly for character recognition. This means that we are dealing with a 128-dimensional feature vector. However, our training set is not very large, and some letters have very few training samples; for example, 'j' only has 126 samples. This means that the training doesn't have enough data to figure out which pixels are important, and you will notice in your confusion matrix that 'j' is very poorly detected.

    We can improve on this by restricting the training to only look at the most informative directions in pixel space. Principal Component Analysis, also known as the Karhunen-Loève Transform, can be used to reduce the number of dimensions that we are looking at (by only keeping the dominant eigenvectors). In the ReduceDimensionality function, use the OpenCV PCACompute function on the training data to determine the data's eigenvectors. Then, use PCAProject to project both the training data and the testing data onto these eigenvectors. Now, reduce the dimensionality of the data by picking only the top ~30 eigenvectors. You should expect a noticeable improvement in the classification rate, up to around 0.98 for a-b and 0.83 for a-j. Play around with different numbers of eigenvectors to see the effect this has on the result; a sketch of one possible ReduceDimensionality follows below.
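
A sketch of one possible ReduceDimensionality is given below. It assumes the OpenCV 2.4-style PCACompute/PCAProject bindings on the VM, which return the mean and the eigenvectors (one per row, sorted by decreasing eigenvalue); newer OpenCV versions require the mean argument of PCACompute to be passed explicitly.

    import cv2
    import numpy as np

    def ReduceDimensionality(trainSamples, testSamples, numComponents=30):
        # Compute the eigenvectors of the training data.
        mean, eigenvectors = cv2.PCACompute(np.float32(trainSamples))
        # Keep only the top numComponents eigenvectors (rows).
        eigenvectors = eigenvectors[:numComponents]
        # Project both partitions onto the retained eigenvectors.
        reducedTrain = cv2.PCAProject(np.float32(trainSamples), mean, eigenvectors)
        reducedTest = cv2.PCAProject(np.float32(testSamples), mean, eigenvectors)
        return reducedTrain, reducedTest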