Computer Laboratory

COBRA - Content-Based Retrieval Architecture

COBRA (Content-Based Retrieval Architecture) is an attempt to build a framework for constructing multimedia information retrieval systems.

Information retrieval (IR) is the task of locating documents in response to an information need. After more than thirty years of research into text retrieval systems, techniques have recently been developed to allow access to multimedia documents, including text, graphics, audio and video. However, many of the systems that handle media other than text are bespoke designs. Research in the Opera group aims to investigate methods of constructing IR applications that can exploit multimedia data in a uniform manner. Just as databases are implemented within a DBMS, it is hoped that IR applications can be designed within Cobra.

In pursuing this goal, a number of issues have arisen. How can information from heterogeneous sources be fused in a meaningful way, such that the relevance of, say, a textual document can be compared with the relevance of an image in response to the same query? Furthermore, how can user interfaces be developed to ensure that suitable queries can be constructed, and that the results can be made easily accessible to the searcher?
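The fusion question can be made concrete with one widely used technique from the data-fusion literature; this is not necessarily how Cobra answers it. The minimal sketch below assumes that each medium-specific retrieval system returns raw scores on its own scale, rescales each result list into [0, 1] with min-max normalisation, and then sums the normalised scores per document (the CombSUM rule). The document identifiers and scores in the usage example are hypothetical.

```python
from typing import Dict, List, Tuple


def min_max_normalise(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one system's raw scores into [0, 1] so that they become comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # Every document received the same score: treat them as equally relevant.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def comb_sum(*runs: Dict[str, float]) -> List[Tuple[str, float]]:
    """Fuse several result lists by summing each document's normalised scores."""
    fused: Dict[str, float] = {}
    for run in runs:
        for doc, s in min_max_normalise(run).items():
            fused[doc] = fused.get(doc, 0.0) + s
    # Highest fused score first; the sort is stable, so ties keep insertion order.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical runs: a text engine's raw scores and an image matcher's similarities.
text_scores = {"deed_7": 14.2, "letter_12": 9.8, "will_4": 2.1}
image_scores = {"map_3": 0.91, "deed_7": 0.55, "photo_5": 0.40}
ranking = comb_sum(text_scores, image_scores)
```

Here `deed_7` ranks first because both the text and the image system retrieve it, while `map_3`, which only the image system retrieves, still ranks above the weaker text-only match `letter_12`. Min-max normalisation is simple but sensitive to outlying scores; other normalisations or weighted sums are equally plausible choices.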

As a first test of the Cobra system, a large collection of historical records is being made available on the World Wide Web, with search facilities constructed within the Cobra framework. The documents, taken from a study of the Essex village of Earls Colne, have been marked up to identify individuals, land plots and dates within the collection. Because two distinct people may share the same name, searching for a person by name is complicated.
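One way to see why the mark-up helps with same-name individuals is to index names by the individual they refer to rather than by the name string alone. The sketch below is a hypothetical illustration, not the Earls Colne mark-up scheme: it assumes each marked-up mention carries an identifier for the individual (here called `person_id`), and all record names and identifiers are invented. A name search then returns each distinct individual separately, with the documents that mention that individual.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Mention:
    """One marked-up reference to an individual in a source document (hypothetical)."""
    doc_id: str     # hypothetical document identifier
    person_id: str  # hypothetical identifier assigned to the individual during mark-up
    name: str       # the name as written in the record


def build_name_index(mentions: List[Mention]) -> Dict[str, Dict[str, Set[str]]]:
    """Map a lower-cased name to {person_id: documents mentioning that individual}."""
    index: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for m in mentions:
        index[m.name.lower()][m.person_id].add(m.doc_id)
    return index


def search_by_name(index: Dict[str, Dict[str, Set[str]]], name: str) -> Dict[str, Set[str]]:
    """Return every distinct individual bearing this name, with their documents."""
    # dict.get does not trigger defaultdict's factory, so unknown names add nothing.
    return {pid: set(docs) for pid, docs in index.get(name.lower(), {}).items()}


mentions = [
    Mention("rental_1598", "p101", "Thomas Smith"),
    Mention("court_roll_1604", "p101", "Thomas Smith"),
    Mention("parish_reg_1630", "p245", "Thomas Smith"),
    Mention("parish_reg_1630", "p300", "Mary Smith"),
]
index = build_name_index(mentions)
result = search_by_name(index, "thomas smith")
```

The query returns two candidates, `p101` and `p245`, rather than one merged list of documents, so a searcher (or a later ranking step) can choose between them. The sketch only lower-cases names; it does not attempt to match spelling variants.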