Invited Speakers

Dr. Stephen Clark (Oxford University)
Linguistically-Motivated Large-Scale Language Processing
Parsing of natural language has reached a point where robust, efficient, accurate and linguistically sophisticated parsers exist. In this talk I will describe such a parser based on the linguistic formalism Combinatory Categorial Grammar (CCG). CCG was designed to handle the long-range dependencies inherent in constructions such as relativisation and coordination, and the parser recovers these dependencies. As well as describing the linguistic formalism, I will describe the Perceptron model used by the parser to select the most likely parse. The Perceptron is trivial to train but leads to good results. Besides being accurate, the parser is surprisingly efficient: it is capable of analysing 1 billion words of text in less than 5 days using only 18 computers. I will describe the properties of CCG which make such parsing speeds possible.

Dr. Mark Craven (University of Wisconsin)
What Do These Genes Have in Common? The Role of NLP in Understanding High-Throughput Biological Experiments
Ten years ago, the work of the typical biologist was focused on studying one gene, one protein, or some other isolated part of a cellular system. The typical biologist today, in contrast, routinely conducts experiments that simultaneously assay thousands of genes, proteins or other molecules. This paradigm shift has presented a new challenge to biologists: how can they comprehend and gain insight from the results of experiments that characterize hundreds or thousands of molecules? A key resource that can be exploited to aid in this task is the scientific literature. The typical biologist may know a lot about a few dozen genes in a given organism, but very little about the thousands of other genes in the organism. The scientific literature, on the other hand, represents the collective knowledge about all of the genes in a given genome. I will discuss the important role that natural language processing methods can play in addressing the task of annotating high-throughput experiments. In particular, I will describe the challenges that arise in the task, and some of the types of NLP methods and systems that have been applied to it.