Dr. Mark Craven (University of Wisconsin)
What Do These Genes Have in Common? The Role of NLP in Understanding High-Throughput Biological Experiments
Ten years ago, the work of the typical biologist was focused on
studying one gene, one protein, or some other isolated part of a
cellular system. The typical biologist today, in contrast, routinely
conducts experiments that simultaneously assay thousands of genes,
proteins, or other molecules. This paradigm shift has presented a new
challenge to biologists: how can they comprehend and gain insight from
the results of experiments that characterize hundreds or thousands of
molecules? A key resource that can be exploited to aid in this task
is the scientific literature. The typical biologist may know a lot
about a few dozen genes in a given organism, but very little about the
thousands of other genes in the organism. The scientific literature,
on the other hand, represents the collective knowledge about all of the
genes in a given genome. I will discuss the important role that
natural language processing methods can play in addressing the task of
annotating high-throughput experiments. In particular, I will
describe the challenges that arise in this task and some of the types
of NLP methods and systems that have been applied to it.