Computer Laboratory

Yiannos A. Stathopoulos

I am a PhD student working on Mathematical Information Retrieval (MIR) under the supervision of Dr. Simone Teufel.





Code and Data Downloads

  • Download the Cambridge University MathIR Test Collection (for retrieval of research-level mathematics) described in
    "Retrieval of research-level mathematical information needs: A Test Collection and Technical Terminology Experiment"
  • Request the type dictionary (10601 phrases) and gold-standard data set for type detection from "Mathematical Information Retrieval Based on Type Embeddings and Query Expansion"
    Please drop me an e-mail for this data set.

Cool things I've built

This is a partial list of cool stuff I've built.

  • Mathalyzer -- an interactive tool for analysing mathematical formulae in PDF documents. Written in C++ and GTK+, this tool employs the Presentation-Abstraction-Control (PAC) pattern to synchronise multiple data elements in a unified presentation. The idea behind Mathalyzer is to produce a tool that combines elements of Acrobat, Photoshop and SPSS.

  • Spine -- A small C++ library, forked from the subsystems of Mathalyzer, that implements Presentation-Abstraction-Control (PAC) message passing with GTK+ controls. This library is used to keep the data model of a GUI app synchronised with various independent GUI elements implemented in GTK+.
  • Interval and range trees -- A small C++ library of interval and range trees for optimising the Mathalyzer canvas. My implementation of interval and range trees is built on top of red-black trees. Upon rotation, the red-black tree implementation raises a rotation event. Event handlers at higher levels are responsible for applying transformations that re-establish the invariants of the interval and range trees.
  • OMEX -- Software that detects and extracts mathematical expressions from PDF. The pipeline is the subject of my paper with Dr. Brian Harrington. Mathalyzer was built to extend aspects of this pipeline with machine learning.
  • MapReduce in C++ -- I built a small C++ implementation of Google's MapReduce. The implementation is designed to abstract parallelisation of tasks using Mappers, Groupers and Reducers on multi-core systems.
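
To give a flavour of the rotation-event idea in the interval and range trees entry above, here is a minimal sketch. It is not the library's code: the names Node, RotatingTree, on_rotate and refresh_max are all hypothetical, the balancing layer shows only the two rotations (no colouring or insertion logic), and the interval invariant is assumed to be the usual "maximum high endpoint in the subtree" augmentation.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>

// Hypothetical node: keyed on the interval's low endpoint and augmented
// with the largest high endpoint found anywhere in its subtree.
struct Node {
    int low, high;   // closed interval [low, high]
    int max_high;    // augmented value the interval layer must maintain
    Node* left = nullptr;
    Node* right = nullptr;
    Node(int lo, int hi) : low(lo), high(hi), max_high(hi) {}
};

// Balancing layer (assumed design): it only rewires pointers and raises a
// rotation event. It knows nothing about intervals.
struct RotatingTree {
    // Called after each rotation with (node that moved down, node that moved up).
    std::function<void(Node*, Node*)> on_rotate;

    // Rotate left around x; returns the new root of this subtree.
    Node* rotate_left(Node* x) {
        Node* y = x->right;
        assert(y != nullptr);
        x->right = y->left;
        y->left = x;
        if (on_rotate) on_rotate(x, y);
        return y;
    }

    // Rotate right around y; returns the new root of this subtree.
    Node* rotate_right(Node* y) {
        Node* x = y->left;
        assert(x != nullptr);
        y->left = x->right;
        x->right = y;
        if (on_rotate) on_rotate(y, x);
        return x;
    }
};

// Interval layer: recompute max_high for one node from its children.
void refresh_max(Node* n) {
    n->max_high = n->high;
    if (n->left) n->max_high = std::max(n->max_high, n->left->max_high);
    if (n->right) n->max_high = std::max(n->max_high, n->right->max_high);
}

// Install a handler that re-establishes the interval invariant after a
// rotation. The lower node is refreshed first because the upper node's
// value depends on it.
RotatingTree make_interval_tree_layer() {
    RotatingTree tree;
    tree.on_rotate = [](Node* down, Node* up) {
        refresh_max(down);
        refresh_max(up);
    };
    return tree;
}
```

Only the two nodes involved in a rotation change their subtrees, so the handler touches just those two. A range tree could register a different handler on the same event without the balancing layer changing.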
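
The mapper, grouper and reducer stages mentioned in the MapReduce entry can be illustrated with a small word-count sketch. Everything here is an assumption made for illustration rather than the actual implementation: the function names (map_words, group_by_key, reduce_sum, word_count), the fixed string-to-int key/value types, and the use of std::async to spread tasks over cores.

```cpp
#include <cassert>
#include <future>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using KV = std::pair<std::string, int>;

// Mapper (hypothetical): turn one chunk of text into (word, 1) pairs.
std::vector<KV> map_words(const std::string& chunk) {
    std::vector<KV> out;
    std::istringstream in(chunk);
    std::string word;
    while (in >> word) out.emplace_back(word, 1);
    return out;
}

// Grouper (hypothetical): collect all values emitted for the same key.
std::map<std::string, std::vector<int>> group_by_key(
        const std::vector<std::vector<KV>>& mapped) {
    std::map<std::string, std::vector<int>> groups;
    for (const auto& part : mapped)
        for (const auto& kv : part)
            groups[kv.first].push_back(kv.second);
    return groups;
}

// Reducer (hypothetical): fold the values of one key into a single result.
int reduce_sum(const std::vector<int>& values) {
    int total = 0;
    for (int v : values) total += v;
    return total;
}

// Driver: map chunks in parallel, group sequentially, reduce keys in parallel.
std::map<std::string, int> word_count(const std::vector<std::string>& chunks) {
    // Map phase: one asynchronous task per input chunk.
    std::vector<std::future<std::vector<KV>>> pending;
    for (const auto& chunk : chunks) {
        const std::string* input = &chunk;
        pending.push_back(std::async(std::launch::async,
                                     [input] { return map_words(*input); }));
    }
    std::vector<std::vector<KV>> mapped;
    for (auto& f : pending) mapped.push_back(f.get());
    assert(mapped.size() == chunks.size());

    // Group phase: a single-threaded barrier between map and reduce.
    const auto groups = group_by_key(mapped);

    // Reduce phase: one asynchronous task per key.
    std::vector<std::pair<std::string, std::future<int>>> reducing;
    for (const auto& g : groups) {
        const std::vector<int>* values = &g.second;
        reducing.emplace_back(
            g.first,
            std::async(std::launch::async,
                       [values] { return reduce_sum(*values); }));
    }
    std::map<std::string, int> result;
    for (auto& r : reducing) result[r.first] = r.second.get();
    return result;
}
```

Calling word_count on the chunks "a b a", "b c" and "a" yields the counts a: 3, b: 2, c: 1. Spawning one task per chunk and per key keeps the sketch short. A real implementation would more likely use a fixed pool of worker threads sized to the number of cores.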