Computer Laboratory: Diarmuid Ó Séaghdha's MPhil Project Suggestions 2012-13

Diarmuid Ó Séaghdha

MPhil Project Suggestions 2012-13

Update 23/11/12: I am not taking any more students for this year - sorry!

The Language of Influence in Social Media (with Daniele Quercia)

Update: This project is now taken.

While macro-level studies of Twitter have concluded that "Twitter is not a social network but a news media" (Kwak et al., 2010), micro-level analyses of users and user-user interactions suggest that the Twittersphere functions as a set of real communities according to well-established sociological definitions of community (Cheng et al., 2011; Quercia et al., 2011). Twitter facilitates conversations between users, each of whom has a particular degree of status online (which may or may not reflect real-life status). This project would study social power relationships on Twitter, with a focus on linguistic indicators. Notions of influence on Twitter have been studied before (Liu et al., 2010; Danescu-Niculescu-Mizil et al., 2011), but many aspects remain to be investigated. Recently, Bramsen et al. (2011) looked at identifying power hierarchies as manifested in email data; here the sociolinguistic motivation would be similar but the problem would be quite different (and novel!). The NLP techniques involved would start with basic lexical, topic-model and sentiment analysis, but could extend to sophisticated dialogue act modelling (Ritter et al., 2010) and discourse effects such as accommodation (Danescu-Niculescu-Mizil et al., 2011; Danescu-Niculescu-Mizil et al., 2012) and coherence.

Philip Bramsen, Martha Escobar-Molano, Ami Patel and Rafael Alonso. 2011. Extracting Social Power Relationships from Natural Language. In Proceedings of ACL 2011.
Justin Cheng, Daniel Romero, Brendan Meeder and Jon Kleinberg. 2011. Predicting Reciprocity in Social Networks. In Proceedings of SocialCom 2011.
Cristian Danescu-Niculescu-Mizil, Michael Gamon and Susan Dumais. 2011. Mark My Words! Linguistic Style Accommodation in Social Media. In Proceedings of WWW 2011.
Cristian Danescu-Niculescu-Mizil, Lillian Lee, Bo Pang and Jon Kleinberg. 2012. Echoes of power: Language effects and power differences in social interaction. In Proceedings of WWW 2012.
Haewoon Kwak, Changhyun Lee, Hosung Park and Sue Moon. 2010. What is Twitter, a Social Network or a News Media? In Proceedings of WWW 2010.
Lu Liu, Jie Tang, Jiawei Han, Meng Jiang and Shiqiang Yang. 2010. Mining Topic-level Influence in Heterogeneous Networks. In Proceedings of CIKM 2010.
Daniele Quercia, Jonathan Ellis, Licia Capra and Jon Crowcroft. 2011. In the Mood for Being Influential on Twitter. In Proceedings of SocialCom 2011.
Alan Ritter, Colin Cherry and Bill Dolan. 2010. Unsupervised Modeling of Twitter Conversations. In Proceedings of NAACL 2010.

Multilingual models for lexical semantics

Lexical semantics is the study of word meaning; NLP research in this area typically deals with learning about concepts and how they interact through statistical analysis of text. The goal of this project is to use multilingual models that learn semantics from multiple languages at once, exploiting regularities that cross languages. One potential area of research for the project is selectional preference learning.

Selectional preference learning is a form of lexical acquisition where the aim is to model typical relationships between predicates and arguments; for example, the direct object slot of the verb "eat" has a preference for words like "pizza" and "stew" (the semantic class of foodstuffs), while the subject slot of "eat" has a preference for animate beings such as "man" and "dog". Selectional preference learning has a long history in NLP; recently, probabilistic topic models have been proposed as a powerful modelling framework, giving state-of-the-art results on a variety of tasks (Ó Séaghdha, 2010; Ó Séaghdha and Korhonen, 2011). This project would extend prior work to perform joint modelling of selectional preferences in multiple languages, with the optimistic goal of improving overall model quality and the realistic goal of compensating in languages where data may be more sparse than in English. There is a large body of work on multilingual modelling for tasks such as part-of-speech learning (for example Snyder et al., 2008); prior work on multilingual semantic modelling is less common, but Peirsman and Padó (2010) have shown that multilingual selectional preference learning is possible. The approach would be to build multilingual topic models for preference learning, possibly using frameworks similar to Mimno et al. (2009) or Boyd-Graber and Blei (2009). In addition to intrinsic measurements of model quality, it may be interesting to evaluate using the dataset from the SemEval-2 Cross-Lingual Lexical Substitution Task (Mihalcea et al., 2010). This project should be of interest to students who wish to develop new machine learning models for application to cutting-edge problems in lexical semantics.
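To make the notion of a selectional preference concrete, here is a minimal sketch that estimates P(argument | predicate, slot) from counts with add-alpha smoothing. It is a simple count-based baseline, not the topic-model approach of the cited papers; the toy triples, the slot labels ("dobj", "nsubj") and the function names are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Toy (predicate, slot, argument) triples, invented for illustration only.
# In practice these would be extracted from a parsed corpus.
TRIPLES = [
    ("eat", "dobj", "pizza"),
    ("eat", "dobj", "stew"),
    ("eat", "dobj", "pizza"),
    ("eat", "nsubj", "man"),
    ("eat", "nsubj", "dog"),
    ("read", "dobj", "book"),
]


def train(triples):
    """Count how often each argument fills each (predicate, slot) position."""
    counts = defaultdict(Counter)
    vocab = set()
    for pred, slot, arg in triples:
        counts[(pred, slot)][arg] += 1
        vocab.add(arg)
    return counts, vocab


def preference(counts, vocab, pred, slot, arg, alpha=1.0):
    """Smoothed estimate of P(arg | pred, slot).

    Add-alpha smoothing gives unseen arguments a small non-zero probability.
    """
    slot_counts = counts.get((pred, slot), Counter())
    total = sum(slot_counts.values())
    return (slot_counts[arg] + alpha) / (total + alpha * len(vocab))


counts, vocab = train(TRIPLES)
for noun in ("pizza", "stew", "man"):
    print(noun, preference(counts, vocab, "eat", "dobj", noun))
```

On this toy data the direct object slot of "eat" prefers "pizza" and "stew" over "man". A count-based estimate of this kind cannot generalise to unseen but plausible arguments, which is the gap that latent variable models, and potentially multilingual evidence, are meant to fill.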

Jordan Boyd-Graber and David Blei. 2009. Multilingual Topic Models for Unaligned Text. In Proceedings of UAI 2009.
Rada Mihalcea, Ravi Sinha and Diana McCarthy. 2010. SemEval-2010 Task 2: Cross-lingual lexical substitution. In Proceedings of SemEval 2010.
David Mimno, Hanna M. Wallach, Jason Naradowsky, David A. Smith and Andrew McCallum. 2009. Polylingual Topic Models. In Proceedings of EMNLP 2009.
Diarmuid Ó Séaghdha. 2010. Latent variable models of selectional preference. In Proceedings of ACL 2010.
Diarmuid Ó Séaghdha and Anna Korhonen. 2011. Probabilistic models of similarity in syntactic context. In Proceedings of EMNLP 2011.
Yves Peirsman and Sebastian Padó. 2010. Cross-lingual Induction of Selectional Preferences with Bilingual Vector Spaces. In Proceedings of NAACL 2010.
Benjamin Snyder, Tahira Naseem, Jacob Eisenstein and Regina Barzilay. 2008. Unsupervised Multilingual Learning for POS Tagging. In Proceedings of EMNLP 2008.

Identifying Causes and Effects

Reasoning about the causal links between events is a fundamental task in general artificial intelligence and in natural language understanding. As part of the SemEval 2012 exercise in semantic evaluation, Gordon et al. (2012) presented the Choice of Plausible Alternatives (COPA) shared task, in which systems must select either the more likely of two possible outcomes caused by a trigger event or the more likely of two possible triggers leading to an outcome event. An example item is the following:

Premise: The man lost his balance on the ladder. What happened as a result?
Alternative 1: He fell off the ladder.
Alternative 2: He climbed up the ladder.

Only one system participated in the SemEval competition (Goodwin et al., 2012), so there is plenty left to do with this dataset! The project would involve comparing a number of methods in a controlled fashion, potentially including event chains in the manner of Chambers and Jurafsky (2008), insights from the psychology of causal reasoning (Griffiths and Tenenbaum, 2005) and supervised models of causality trained on data annotated for another SemEval shared task by Hendrickx et al. (2010).
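As a point of reference for such a comparison, the sketch below shows one very simple way to attack a COPA item: score each alternative by the average pointwise mutual information (PMI) between its words and the premise words, and pick the higher-scoring alternative. This is an illustrative co-occurrence baseline under stated assumptions, not the method of any system cited here; the four-document toy corpus, the pre-tokenised content words and all function names are invented for the example.

```python
import math
from collections import Counter
from itertools import product

# Toy "corpus" of tokenised documents, invented for illustration only.
CORPUS = [
    "man lost balance fell ladder".split(),
    "lost balance fell stairs".split(),
    "man climbed ladder roof".split(),
    "climbed mountain summit".split(),
]


def build_counts(corpus):
    """Document-level counts for single words and ordered word pairs."""
    word_counts = Counter()
    pair_counts = Counter()
    for doc in corpus:
        types = set(doc)
        word_counts.update(types)
        for a, b in product(types, types):
            if a != b:
                pair_counts[(a, b)] += 1
    return word_counts, pair_counts, len(corpus)


def pmi(a, b, word_counts, pair_counts, n_docs):
    """PMI of two words; unseen pairs score 0.0 (a simplification)."""
    joint = pair_counts[(a, b)]
    if joint == 0:
        return 0.0
    return math.log((joint * n_docs) / (word_counts[a] * word_counts[b]))


def score(premise, alternative, stats):
    """Average PMI over all (premise word, alternative word) pairs."""
    pairs = [(p, a) for p in premise for a in alternative]
    return sum(pmi(p, a, *stats) for p, a in pairs) / len(pairs)


def choose(premise, alt1, alt2, stats):
    """Return 1 or 2, the index of the higher-scoring alternative."""
    return 1 if score(premise, alt1, stats) >= score(premise, alt2, stats) else 2


stats = build_counts(CORPUS)
premise = ["man", "lost", "balance", "ladder"]  # content words of the premise
print(choose(premise, ["fell", "ladder"], ["climbed", "ladder"], stats))
```

Symmetric co-occurrence says nothing about the direction of causation, so a baseline like this treats "what happened as a result?" and "what was the cause?" identically; that limitation is one motivation for the event-chain and supervised approaches mentioned above.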

Nathanael Chambers and Dan Jurafsky. 2008. Unsupervised Learning of Narrative Event Chains. In Proceedings of ACL 2008.
Travis Goodwin, Bryan Rink, Kirk Roberts and Sanda Harabagiu. 2012. UTDHLT: COPACETIC System for Choosing Plausible Alternatives. In Proceedings of SemEval 2012.
Andrew S. Gordon, Zornitsa Kozareva and Melissa Roemmele. 2012. SemEval-2012 Task 7: Choice of Plausible Alternatives: An Evaluation of Commonsense Causal Reasoning. In Proceedings of SemEval 2012.
Thomas L. Griffiths and Joshua B. Tenenbaum. 2005. Structure and strength in causal induction. Cognitive Psychology 51(4):334-384.
Iris Hendrickx, Su Nam Kim, Zornitsa Kozareva, Preslav Nakov, Diarmuid Ó Séaghdha, Sebastian Padó, Marco Pennacchiotti, Lorenza Romano and Stan Szpakowicz. 2010. SemEval-2010 Task 8: Multi-Way Classification of Semantic Relations between Pairs of Nominals. In Proceedings of SemEval 2010.