Ann Copestake and Marina Terkourafi. Conventional speech act formulae: from corpus findings to formalization. Constraints in Discourse, NUI Maynooth, Ireland. 2006.
Ann Copestake and Marina Terkourafi. Conventional speech act formulae in HPSG. International Conference on Head-Driven Phrase Structure Grammar, Varna, Bulgaria. 2006.
The BA project was envisaged as a preliminary study that might lead to a larger-scale project on modelling politeness. During its course, however, it became apparent that the work involved many aspects we had not foreseen and had much wider ramifications than we had envisaged. Our starting point for collaboration was a shared interest in conventionality in language, but establishing common ground was not straightforward, and there was very little prior research to draw on. Working on politeness in HPSG required us to consider how current pragmatics research in general could interact with formal syntax and semantics, which in turn raised questions about differences in methodology and formalism that we believe are highly significant. The two research areas make such different assumptions that building a real bridge between them is a significant challenge.
While the new project is not in itself computational, we think that computational linguistics may help provide the bridge between HPSG and pragmatics. Computational linguists often use a stochastic approach to determining dialogue acts (or conversational moves). Stochastic, corpus-driven methods are now also incorporated within large linguistically-motivated HPSGs to handle ambiguity, and although the objectives for doing this are normally stated in engineering terms, we speculate that there is theoretical relevance too.
Terkourafi, Marina & Villavicencio, Aline (2003) 'Toward a formalisation of speech act functions of questions in conversation.' In: Bernardi, Rafaella and Michael Moortgat (eds.) Questions and Answers: Theoretical and applied perspectives. Utrecht Institute of Linguistics OTS. 108-119. Available online at: http://www-uilots.let.uu.nl/%7Ectl/workshops/CES03/On_line_Proceedings/Papers/terkourafi.pdf
Recent years have witnessed a steady growth in research on linguistic pragmatics, from both an intra-cultural and a cross-cultural perspective. Several small-scale studies have attempted to tap into native speakers' or learners' intuitions regarding the impact of gender, age, ethnicity, setting, and so on, on language use, producing a wealth of data from majority and minority languages, and from various social groupings within broader communities (see, e.g., recent issues of the International Journal of the Sociology of Language and the Journal of Pragmatics). For the most part, these findings are descriptive and have had little impact on research conducted within various grammatical frameworks. Sign-based approaches, such as HPSG, aim to integrate pragmatics with other levels of analysis. However, although some previous research (e.g., Green 1987, Paolillo 2000, Pollard and Sag 1994, Bender 2001, forthcoming) has noted points of contact between the two disciplines, it has not explored the precise nature of this interaction, or the full implications of addressing pragmatics within a lexicalist framework.
Pragmatic findings pose new questions for lexicalist formalisms. Accommodating these findings will enhance the scope of the analyses but may challenge their formal basis. Pragmatic analyses, on the other hand, need to move beyond description, and advance testable hypotheses. At the same time, they must develop rigorous methodologies facilitating cross-linguistic and cross-situational comparison of their findings, such that the large amounts of data collected empirically can be fully exploited.
By highlighting common concerns between the two disciplines, we aim to focus the questions they ask, and to help formulate their results, in ways that enhance their cross-disciplinary applicability and relevance. We will refine the research questions of the previous section to clarify their interactions and specify possible alternative approaches.
In principle, quantitatively motivated empirical pragmatic analyses and formally oriented work within HPSG share several theoretical and methodological concerns. Encoding contextual features within the sign raises the question of the number and precise identity of the features needed to capture effects such as conventionalisation. However, we now believe that this cannot be settled independently of the general question of formalising pragmatic constraints in HPSG. Developing the pragmatic component of the framework involves first clarifying the nature of pragmatic features in HPSG: where they may be specified; what form their values might assume and how such values may be specified; how 'soft' the corresponding constraints may be; when and how they may be overridden; and how they interact with the other dimensions of the sign (phonological, syntactic, semantic) during interpretation and production. Languages which grammaticise the consequences of pragmatic processes, such as Korean and Japanese, have constituted prime points of departure for investigating related issues (Pollard and Sag 1994, Siegel 2000, Engdahl, to appear). A theoretically compelling account of the Cypriot-Greek data must also address the wider cross-linguistic questions.
These theoretical questions have to be related to methodology. What constitutes appropriate data for addressing pragmatic questions remains an open issue. Pragmatic intuitions are often more graded than syntactic and semantic ones, though no less real, as shown by native speakers' spontaneous judgements about (and reactions to) the appropriateness of various utterances in different situations. While linguistic introspection is a resource whose value we acknowledge, we aim to use it only as a complement, i.e. to check the viability of the hypotheses we formulate based on the observable data.
Exploring the question of which methodological tools are appropriate for investigating speakers' communicative competence contributes to the ongoing debate about the nature and role of data in theoretical linguistics (e.g., Studies in Language 28:3; forthcoming special issue of Lingua). Lexicalist approaches, including HPSG, prioritise data from introspection, as this ensures comprehensive coverage of the (syntactico-semantic) phenomena investigated and of the range of grammatical alternatives. While comprehensiveness is certainly a concern for corpus-based analyses (including Terkourafi's (2002) data), using data from introspection is open to scrutiny as to its representativeness.
There is limited research on the interface of lexicalist grammars with pragmatics. Besides the HPSG work cited above, Asher and Lascarides (2003: 304ff) is relevant to our approach but is not empirically grounded. From a different perspective, some computational work (e.g., Jurafsky 2004; Carletta et al. 1997) has addressed the question of illocutionary act recognition (though different terminology is used) but does not relate illocutionary acts to general grammars. Stochastic work within HPSG exists, but is viewed as an engineering approach to cut down lexical and syntactic ambiguity and not as a formally interesting technique.
In the medium term, probably as part of a follow-on project, we hope to provide results that might be incorporated into attempts to provide a common framework for implementing HPSG (the Grammar Matrix: Bender et al. 2002), which is distributed as Open Source. In the long term, our research could have important implications for the development of computational dialogue systems, particularly in generating more natural dialogue.
To investigate the extent to which existing resources can be used, we will experiment with a variety of English corpora (the Computer Laboratory has licences to the main ones, including those available via the Linguistic Data Consortium). To gain information on the full range of grammatical alternatives, we will conduct pilot studies that combine corpus data with elicited and introspective data, gathered through purpose-designed experiments enabling targeted data acquisition. In this way, we aim to contribute to developing alternative methodologies for collecting data about communicative competence.