Proposer: Andreas Vlachos
Supervisors: Andreas Vlachos, Stephen Clark
Special Resources: None
Biomedical event extraction is the task of extracting specific types of information about proteins. For example, from the following passage:
"TRADD was the only protein that interacted with wild-type TES2 and not with isoleucine-mutated TES2."
the following events should be extracted:
E1 Binding(Theme:"TRADD", Theme:"wild-type TES2")
E2 Binding(Theme:"TRADD", Theme:"isoleucine-mutated TES2")
Similarly, from this passage:
"In this study we hypothesized that the phosphorylation of TRAF2 inhibits binding to the CD40."
the following events should be extracted:
E1 Phosphorylation(Theme:"TRAF2")
E2 Binding(Theme:"TRAF2", Theme:"CD40")
E3 Negative_regulation(Theme:E2, Cause:E1)
(More information on event extraction can be found at the BioNLP 2011 shared task [1] website: https://sites.google.com/site/bionlpst/)
Note that in the first passage above event E2 is negated, and in the second one event E3 is speculated upon. While this information is important to the users of event extraction systems, most state-of-the-art systems are unable to provide it. The task itself is rarely attempted (only two participants in the BioNLP 2011 shared task) as it is quite challenging: the information that needs to be extracted is fine-grained, at the level of events, and the annotated data we are provided with do not contain annotation at the lexical level, i.e. we do not know which words result in an event being characterized as speculative or negated (such words are sometimes referred to as negation and speculation cues).
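To make the target output concrete, the events from the two passages can be written down together with their negation and speculation flags. The following is only a minimal sketch: the `Event` class, its field names and the `render` helper are assumptions made for illustration and do not correspond to the shared task file format or to any existing system.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Event:
    """Hypothetical in-memory representation of one extracted event."""
    event_id: str
    event_type: str
    # Each argument is a (role, filler) pair; the filler is either a
    # protein mention (a string) or another Event, as in E3 below.
    arguments: List[Tuple[str, Union[str, "Event"]]]
    negated: bool = False
    speculated: bool = False


def first_passage_events() -> List[Event]:
    # E2 is the negated event ("and not with isoleucine-mutated TES2").
    e1 = Event("E1", "Binding", [("Theme", "TRADD"), ("Theme", "wild-type TES2")])
    e2 = Event("E2", "Binding",
               [("Theme", "TRADD"), ("Theme", "isoleucine-mutated TES2")],
               negated=True)
    return [e1, e2]


def second_passage_events() -> List[Event]:
    # E3 is the speculated event ("we hypothesized that ...").
    e1 = Event("E1", "Phosphorylation", [("Theme", "TRAF2")])
    e2 = Event("E2", "Binding", [("Theme", "TRAF2"), ("Theme", "CD40")])
    e3 = Event("E3", "Negative_regulation", [("Theme", e2), ("Cause", e1)],
               speculated=True)
    return [e1, e2, e3]


def render(event: Event) -> str:
    """Print an event in the notation used above, followed by its flags."""
    parts = []
    for role, filler in event.arguments:
        if isinstance(filler, Event):
            parts.append(f"{role}:{filler.event_id}")
        else:
            parts.append(f'{role}:"{filler}"')
    flags = [name for name, on in (("negated", event.negated),
                                   ("speculated", event.speculated)) if on]
    suffix = f" [{', '.join(flags)}]" if flags else ""
    return f"{event.event_id} {event.event_type}({', '.join(parts)}){suffix}"


if __name__ == "__main__":
    for ev in first_passage_events() + second_passage_events():
        print(render(ev))
```

The flags are attached to whole events and not to words, which mirrors the annotation described above: the data say that an event is negated or speculated, but not which words in the passage signal it.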
Following analysis of the errors made, we will try to address them using a machine learning-based method. A baseline approach would be to represent the event context using appropriate features and learn a classifier. However, such an approach might not work well due to sparsity issues. A more interesting approach is to think of the task in terms of structured prediction, in which we first detect negation and speculation cues and then identify the event characterized by them. As discussed above, this level of annotation is unavailable. Therefore we will experiment with the search-based structured prediction framework [4], which can handle such issues and which we have used successfully to build the event extraction system of Vlachos and Craven [2].
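The difference between the two formulations can be sketched in a few lines. The code below is an illustration under stated assumptions, not the proposed system and not the search-based framework of [4]: the function names, the window-based features, the trigger positions and the hand-written decision functions are all hypothetical stand-ins for components that would be learned.

```python
import re
from typing import Callable, Dict, List

PASSAGE = ("In this study we hypothesized that the phosphorylation of TRAF2 "
           "inhibits binding to the CD40.")


def tokenize(text: str) -> List[str]:
    # Simple tokenizer for illustration; keeps hyphenated tokens together.
    return re.findall(r"[A-Za-z0-9_-]+", text)


def context_features(tokens: List[str], trigger_index: int,
                     window: int = 3) -> Dict[str, int]:
    """Baseline view: binary word features from a window around the trigger."""
    feats: Dict[str, int] = {}
    lo = max(0, trigger_index - window)
    hi = min(len(tokens), trigger_index + window + 1)
    for i in range(lo, hi):
        if i == trigger_index:
            continue
        side = "left" if i < trigger_index else "right"
        feats[f"{side}_word={tokens[i].lower()}"] = 1
    return feats


def two_stage_predict(tokens: List[str],
                      triggers: Dict[str, int],
                      is_cue: Callable[[str], bool],
                      attach: Callable[[int, Dict[str, int]], str]
                      ) -> Dict[str, List[str]]:
    """Structured view: decide per token whether it is a cue (stage 1),
    then choose the event that each detected cue characterizes (stage 2)."""
    cues_per_event: Dict[str, List[str]] = {eid: [] for eid in triggers}
    for i, tok in enumerate(tokens):
        if is_cue(tok):
            cues_per_event[attach(i, triggers)].append(tok)
    return cues_per_event


if __name__ == "__main__":
    tokens = tokenize(PASSAGE)
    # Assumed trigger token positions for E1, E2 and E3 in this passage.
    triggers = {"E1": 7, "E2": 11, "E3": 10}
    print(context_features(tokens, triggers["E3"]))
    # Hand-written stand-ins for the two learned decisions.
    print(two_stage_predict(tokens, triggers,
                            is_cue=lambda t: t.lower() == "hypothesized",
                            attach=lambda i, trig: "E3"))
```

With a window of three tokens the cue "hypothesized" does not appear among the features for the trigger of E3, while widening the window adds many unrelated words, which is one way the sparsity problem shows up. In the two-stage view both decisions are latent, since the data label only the event; this is the situation the search-based framework is expected to handle.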