
Extract from Blackwell, A.F. (1998). Metaphor in Diagrams
Unpublished PhD Thesis, University of Cambridge.

Chapter 2 - Diagram and Metaphor as Tools

As no image can be formed of abstract ideas, they are, of necessity, represented in our mind by particular, but variable ideas; and if an idea bear any relation to quantity of any kind, that is, if it admit of the modification of greater and less, though the archetype, as it is called, of that idea be nothing that is the object of our senses, it is nevertheless universally represented in our mind by the idea of some sensible thing.
A Description of a Set of Charts of Biography,
J. Priestley, 1804, p. 5.

This chapter reviews previous research that has investigated the application of both diagram and metaphor as cognitive tools. Much research into the use of diagrams has not considered the possibility that metaphor might be involved. Likewise, much research into metaphor has explored metaphor in language rather than in diagrams. The chapter is divided accordingly. After brief definitions of diagrams and of metaphor as subjects of psychological research, the bulk of the review considers how each can be studied as tools.

The section that discusses diagrams as tools considers general theories of external representation use in problem solving, then addresses two specific cases that have been studied in greater detail: graphs and visual programming languages. The section that discusses metaphor as a tool concentrates on the previous research in human-computer interaction that has motivated this study, as described in the introduction to chapter 1. It is this research that suggests a possible relationship between theories of metaphor and of diagram use, despite the fact that there is relatively little empirical evidence to support some of the main theories.


Although this project originated in the study of graphical user interfaces, the methods and conclusions are applicable to a broader class of cognitive artefact (Norman 1991, Payne 1992) - diagrams. Diagrams are familiarly associated with instruction manuals (Gombrich 1990), electronics (Newsham 1995, Petre & Green 1990), software design (Martin & McClure 1985), architecture (Porter 1979), geometry (Lindsay 1989, Netz in press), general mathematics education (Pimm 1995, Kaput 1995) and symbolic logic (Shin 1991, Sowa 1993) as well as informal problem-solving (Katona 1940). Insights from these various fields are slowly being integrated in the interdisciplinary study of Thinking with Diagrams (Glasgow, Narayanan & Chandrasekaran 1995, Blackwell Ed., 1997), with conclusions that are more widely applicable to other notations, including such examples as music notation (Bent 1980), board games (Ellington, Addinall & Percival 1982) or proposals for a pictographic Esperanto (Shalit & Boonzaier 1990).

Figure 1. Continuum of representational conventions in cognitive artefacts

Within this huge range of applicability, the common nature of diagrams is most appropriately defined by contradistinction. Diagrams form the middle part of a continuum between two other classes of cognitive artefact: text and pictures (see Figure 1). If we regard all three as markings (Ittelson 1996) on some surface (setting aside the tasks to which they might be applied), diagrams can be distinguished from text by the fact that some aspects of a diagram are not arbitrary but are homomorphic to the information they convey. They can be distinguished from pictures by the fact that some aspects must be interpreted by convention, and cannot be deduced from structural correspondences.

A simple distinction underestimates the complexity of text and pictures, however. The cognitive processing of text is closely related to auditory verbal comprehension, and therefore inherits homomorphic features of speech: onomatopoeia, for example (Werner & Kaplan 1963), as well as typographic conventions and the conjectured systematic origin of all abstract verbal concepts in spatial experience (Jackendoff 1983, Johnson 1987, Lakoff 1987). The construction and interpretation of pictures also relies on some arbitrary depictive conventions (Willats 1990), even though those conventions may simply reflect basic perceptual abilities (Kennedy 1975) and have been supplemented by the mechanical determinism of photography (Ivins 1953). For the purposes of the current argument, text and pictures can be regarded as ideals - extremes that are never observed in actual communication via markings. Instead, all texts are to some extent diagrammatic, and all pictures are to some extent diagrammatic. Even a photograph, despite the implied objectivity of mechanical reproduction, conveys information diagrammatically through its composition, its context on a surface and other factors (Stroebel, Todd & Zakia 1980).

As diagrams share aspects of both text and pictures, they can be analysed using techniques and theories from either extreme of the continuum. Firstly, diagrams can be regarded as two-dimensional graphical languages, composed from a lexicon of geometric elements. The relationship between these elements can be described in terms of a syntax incorporating various subsets of proximity, ordering, spatial enclosure and topological connection. Interpretation of a diagram is therefore a process of deriving semantic intention from the syntactic relationships that have been created between the lexical elements (Bertin 1981). This view of diagrams suggests that researchers should use the structural analysis of Saussure (Culler 1976), or the semiotic trichotomies of Peirce (1932).

Alternatively, diagrams might be regarded primarily as representations of physical situations. If they communicate any abstract information, this would involve metaphorical reasoning, for example relating the "upward" direction on the page to an increase of some abstract quantity (Gattis & Holyoak 1996, Tversky, Kugelmass & Winter 1991). The individual elements of a diagram may also be actual pictures, in which case they might be interpreted metaphorically as representing abstract concepts (Barnard & Marcel 1978).


Is it justified to apply the word metaphor to diagrams? Metaphor is usually understood in a verbal context; specifically as a figurative literary device or trope. Like irony, hyperbole and other tropes, metaphor is identifiable by the fact that the literal meaning of the words is not the meaning intended. There is instead a figurative meaning, which the hearer must establish by deduction from the context of the utterance, from knowledge of the world, and by constructing theories regarding the speaker's intention. The cognitive resources involved in this interpretive process are sophisticated - children have difficulty in understanding both irony and metaphor (Winner & Gardner 1993).

Aristotle's Poetics accords great respect to the value of metaphor ("... by far the greatest thing is the use of metaphor. That alone cannot be learnt: it is the token of genius" xxii. 17), and contains a detailed analysis of the way that metaphor works:

It is the application of a strange term either transferred from the genus and applied to the species or from the species and applied to the genus, or from one species to another by means of analogy.
Aristotle, Poetics xxi. 7

Modern cognitive theories of metaphor have often emphasised only a single aspect of this analysis. Glucksberg and Keysar (1993), for example, emphasise that metaphors are expressed and understood as statements about class inclusion (i.e. genus and species), where the target of the metaphor inherits attributes from elsewhere in some categorical hierarchy. Gentner, Falkenhainer and Skorstad (1988), on the other hand, emphasise that understanding a metaphor is the same as drawing an analogy - it involves the mapping of structure and attributes from one domain to another.

A third cognitive theory of metaphor emphasises the metaphors that have "fossilised" into idiom. Lakoff and Johnson (1980) claim that individual idioms can be related to systematic collections of metaphorical concepts. For example, when Aristotle describes Empedocles' use of the metaphor "the evening of life" (Poetics, xxi. 13), Lakoff and Johnson might observe that there are many other idioms relating stages of life to time of day, and that these reflect an underlying conceptual metaphor such as "life is a day". Johnson (1987) and Jackendoff (1983) have both proposed theories in which all abstract language must be derived from embodied physical experience. Johnson describes this process as metaphorical, but Jackendoff objects (1983 p. 209) that the equation of physical analogy with metaphor is facile. The necessary grounding of abstraction in physical experience is a view that Black (1993) attributes first to Carlyle. It is supported by Lakoff and Johnson's conceptual metaphor proposal, and by Gentner and Wolff's (1997) "career of metaphor" hypothesis, but these are vigorously debated in cognitive psychology; Murphy (1997), for example, claims that Lakoff and Johnson's collection of metaphors involving the vertical direction simply neglects the polysemous multiple meanings of the word "up", while Gibbs (1996) defends conceptual metaphor from a review of experimental investigations of idiom comprehension.

There are numerous other theories of metaphor interpretation, some of which are supported by experimental evidence. Chomsky's anomaly model of metaphor processing, for example, suggests that we first evaluate the literal meaning of the metaphor, then reject that as a result of identifying an anomaly. Pynte et al. (1996) studied the time-course of metaphor comprehension, and found evidence from event-related potential observations that the literal meaning of a metaphoric phrase was indeed evaluated before the figurative meaning. Tourangeau and Sternberg's interaction view of metaphor (1982) claims that aptness is increased by semantic separation between the source and target domains of the metaphor, because an apt metaphor must involve reorganising the hearer's understanding of the target domain. These and other theories of metaphor are less commonly investigated in cognitive psychology, and to my knowledge have never been applied either to diagrams or to HCI. They are not considered any further here.

This discussion provides several alternative models for addressing the role of metaphor in diagrams. If Gentner's structure mapping theory of analogy (1983) is also involved in processing metaphor, it might be better to describe diagrams as analogies rather than metaphors. The value of diagrams in solving problems of structural analogy has certainly been demonstrated (Beveridge & Parkins 1987). If this is the only sense in which diagrams are metaphorical, they can be described in terms of structural geometric properties, rather than requiring any consideration of pictorial depiction. Alternatively, if diagrams are interpreted in terms of their resemblance to physical objects and situations, they should be analysed in terms of class inclusion. If this is the case, there is perhaps a more appropriate term applied in the visual arts. A painting in which the elements represent abstract concepts in the class they belong to is described as an allegory rather than a metaphor.

Is there any good reason why we should describe diagrams as metaphors rather than as structural analogies or pictorial allegories? There are three reasons why it is convenient to do so. Firstly, the field of HCI has adopted the term metaphor, while being unaware of many of the cognitive theories described above (although Gentner, Falkenhainer and Skorstad (1988) explicitly reject the suggestion that their model of metaphor applies to user interfaces, and Jackendoff (1983) insists that metaphor is more complex and subtle than physical analogy). Secondly, there is also a small existing literature outside the fields of psychology and HCI that has described the interpretation of diagrams as a process of metaphor: in education (Goldsmith 1984), in graphic design (Richards 1997) and in comic book art (Kennedy, Green & Vervaeke 1993). Thirdly, theories of conceptual metaphor have been explicitly extended from language to diagrams (Lakoff 1993). Some interpretations of conceptual metaphor claim that even linguistic metaphors are interpreted with the aid of mental images. Gibbs and O'Brien (1990) found that subjects were able to report causal relationships from images formed when interpreting a metaphor, although Cacciari & Glucksberg (1995) reported that identification of paraphrased metaphors was slower when such images were formed. The implied relationship between diagram use and these theories of metaphor interpretation is reviewed in more detail in chapter 5.

Diagrams as Tools

This thesis considers three broad categories of cognitive task in which diagrams are applied as tools. Firstly, they are often used for communicating information, both as isolated presentations (e.g. statistical graphs) and as instructional material supporting a text (e.g. textbook illustrations). Secondly, they are used during problem solving, as external representations that supplement working memory and efficiently express problem constraints. Thirdly, they are used as an aid to discovery, generating potential configurations and exploring alternative solutions. This thesis emphasises instruction, for which relevant literature is reviewed in chapter 4, and discovery, for which relevant literature is reviewed in chapter 5. Most existing research into diagram use emphasises problem solving - that research is summarised in this section.

Diagrams as Tools in Problem Solving

Although diagrams may depict relationships in the real world, and may stimulate mental imagery, it is not necessary to assume any resemblance to visual scenes (Goodman 1969), or causal relationship to mental images (Scaife & Rogers 1996). Most theoretical treatments of diagram use simply consider their geometric structure, rather than the metaphorical possibilities discussed in this thesis. Larkin and Simon (1987) attributed the benefits of diagram use during problem solving to three main information-processing operations. Firstly, diagrams can express correspondences between elements without requiring that labels be defined for those elements. Secondly, they can group together information that will be needed at the same time, thus reducing the amount of search required during problem solving. Thirdly, they support "perceptual inferences" by which information can be read directly from the diagram.

Bauer and Johnson-Laird (1993) have extended Larkin and Simon's analysis of geometric correspondences in diagrams. They demonstrated that subjects were faster and more accurate when answering a question based on a two-branch electrical circuit diagram than when answering a logically equivalent verbal question involving double disjunction. The geometric strategy used by subjects in this experiment is even more straightforward than that modeled by Larkin and Simon: subjects could use the diagram to answer the question simply by tracing (or imagining tracing) the lines of the circuit with a finger. Green (1982) has however noted the restrictions of this type of diagram - there are only a limited number of "mental fingers" that can be maintained when tracing flow through a complex diagram.

The perceptual inferences described by Larkin and Simon may simply involve low-level visual processing of boundaries (Ullman 1984) - either assessing three-dimensional shape (Hoffman & Richards 1984, Grossberg 1997) or two-dimensional figures (Palmer & Rock 1994, Shimaya 1997). They also enable impressive performance on computationally intensive tasks such as the "travelling salesman" optimisation problem, for which MacGregor and Ormerod (1996) demonstrated that untrained experimental subjects could produce solutions that were more optimal than the best available computational algorithms. In the case of diagrams, Lindsay (1988) has demonstrated that perceptual processes make explicit information that was only implicit in an original construction. Lindsay also observes that this kind of reasoning with spatial representations avoids the frame problem - knowing which aspects of a situation remain unchanged as the result of some action - because the scope of action is defined by spatial locality. This advantage also underlies the benefits of "direct manipulation", to which I attributed the success of graphical user interfaces in chapter 1.

Zhang (1997) has proposed a cognitive model of diagrammatic representations in problem solving that integrates the computational aspects observed by Larkin and Simon. He contrasts the perceptual operations afforded by external representations with the internal representations that support cognitive operations, including the retrieval of information from memory. Both internal and external representations provide different means of a) looking ahead to simulate future problem states, b) applying learned knowledge, or c) acting on the basis of pre-existing biases that apply in a particular modality (Gestalt principles of perception, for example, are a perceptual bias which reveals certain properties in an external representation). The interaction between internal and external representations has also been expressed in a computational model described by Tabachnek-Schijf, Leonardo and Simon (1997). This model constructs lines on a simulated blackboard, then inspects the blackboard to notice emergent properties, such as places where lines intersect. The graphical information is stored in a memory array representing visual working memory, but is also related to propositional knowledge about the meaning of the lines. The latter is stored in an approximate (non-phonological) model of verbal working memory.

Expert problem solving, such as that studied by Tabachnek-Schijf and Leonardo, is characterised by a repertoire of different diagrams and other representations, each of which may facilitate a different range of tasks (Sloman 1995). Cox and Brna (1995) have demonstrated the importance of teaching students how to select an appropriate diagram or other representation, in addition to teaching the skills required to construct a diagram and read information off from it. Whether the choice is successful or not depends on the extent to which the diagram constrains the possible interpretations (Wang, Lee & Zeevat 1995, Stenning & Oberlander 1995). The analysis of information transfer between multiple representations requires a sophisticated theory of information, as well as experimental evidence, however. The Hyperproof system (Barwise & Etchemendy 1990), successfully used to teach propositional logic, models logical relations both algebraically and in an imaginary three-dimensional world. A formal description of the relationship between the model, the real world, and the symbolic system depends on very fundamental issues in philosophy of semantics (Barwise & Perry 1983).

Diagrams as Tools: The case of graphs

Graphs constitute a class of diagram so conventionalised that a graph can stand alone without explanatory text. Gillan (1995) has demonstrated that after simple training in graph use, subjects can successfully interpret complex arithmetic relationships that otherwise require complex problem solving. Graphs are so widely used by experimental psychologists themselves that they have perhaps attracted an undue degree of research attention. Detailed studies have been made of the semiotic (Kosslyn 1989) and perceptual properties of graphs (Hollands & Spence 1992, Spence 1990, Pisan 1995), as well as of interpretative behaviour (Zacks & Tversky 1997, Stone & Yates 1997, Carpenter & Shah 1998) and cross-cultural analyses (Tversky, Kugelmass & Winter 1991). Some of these studies have provided practical advice about when to use graphs in presenting research results (Carswell & Ramzy 1997, Shah & Carpenter 1995).

Applied research tends to focus on the question of what notation will be most suitable in cases where a choice can be made. In the case of graphs, this has been a focus of attention for many years. Washburne (1927) made a classic comparison of numerical data presented as graphs and tables, showing that graphs allow more rapid judgements. Meyer (1997) has recently reinvestigated Washburne's data, however, showing that his conclusions were unjustified - they have simply not been questioned because they were unsurprising. Similar problems pervade this type of research. Tufte's (1983, 1990) books on the design of quantitative graphs and other diagrams have been hugely influential in software design. They are not unequivocally supported by empirical evidence, however (Spence 1990, Zacks et al. 1998). Tufte expresses various assumptions about readability and usability, but they amount largely to the personal (modernist) tastes of a practitioner. In recent years, these tastes are being supplanted by post-modern styles including pictures, tables and diagrams within the same frame (Wurman 1997). Although fashionable at the time of writing, post-modern information graphics have no more foundation in empirical research than Tufte's work. This is unlikely to prevent their increasing adaptation from American news media to applications in software design.

Figure 2. Examples of graphical presentation styles recommended by (a) Tufte and (b) Wurman.
[Sources: (a) from Tufte 1983, (b) from Wurman 1997]

Diagrams as Tools: The case of visual programming languages

This study originated in a commercial software product development project, designing a new visual programming language (Blackwell 1996d). Visual Programming Languages (VPLs) often resemble the diagrams used by computer programmers during the design process, but they are used directly by the computer - a VPL specifies program behaviour to a level of detail sufficient that the program can be executed with no further manual intervention. The development of VPLs can be traced to research by Sutherland (1963) and Smith (1977); the range of VPLs created since then has been surveyed by Myers (1986) and by Price, Baecker & Small (1993). Researchers generally draw a distinction between VPL research, and the range of programming environments marketed by Microsoft Corporation, including Visual Basic, Visual C++ and Visual J++. Although those products were presumably named to reflect the endeavours of VPL research, they differ from VPLs in that the program behaviour is specified using a conventional textual programming language rather than any kind of diagram.

Visual programming languages are an interesting topic of study in cognitive psychology, both because programming is a complex problem-solving activity, and because VPLs include a wide range of alternative diagrammatic representations (Blackwell, Whitley, Good & Petre, in press). Psychological research into the use of diagrams for programming predates the development of VPLs, in fact (Fitter & Green 1979). As commercial VPLs have become more widely available Green has, with various collaborators, published a substantial body of research investigating their cognitive implications (e.g. Green 1982, Gilmore & Green 1984a, Green, Petre & Bellamy 1991, Green & Petre 1992, Green & Blackwell 1996a).

Green's work has emphasised the nature of programming languages as information structures (Green 1990) - the question of whether the structure of the notation does or does not match the structure of the task is more important than the question of whether text or diagrams are used (Gilmore & Green 1984b). Green's analysis of information structures and the way they are used has been unified and extended in the Cognitive Dimensions of Notations framework (Green 1989, Green & Petre 1996). This approach to comparing the relative advantages of different programming languages is contrasted with the superlativist claims often associated with VPL research - that VPLs will be superior for all possible programming tasks (Green, Petre & Bellamy 1991). The contrast between empirical results and the superlativist position will be investigated in detail in chapter 3.

Diagrams as Tools: Empirical investigations

If graphs lie at one extreme of the diagrams that are studied experimentally, the other might be programming languages. Graphs are widely used, can often be interpreted independent of context or task, and might be considered a requirement of basic literacy. Programming languages, on the other hand, support complex problem solving and interaction between specialist users. Other diagram applications considered in human factors research, such as vehicle instrumentation or design of instructional material, generally fall between these extremes. Major themes in the empirical investigation of thinking with diagrams are often represented by experiments at each point along this continuum of complexity and context. As an example, Lohse (1997) has used gaze fixation analysis to identify the ways that layout conventions modify working memory requirements in graph interpretation. Chandler and Sweller (1996) have estimated working memory requirements (in the context of "cognitive load") that arise from the attempt to integrate text and diagrams in instructional material. Davies (1996) has investigated working memory requirements in programming by modifying the environment in which a program is written, thereby changing the extent to which experts can use the notation as an external representation to assist problem solving. A further example is the various investigations that have been made of structure in diagrams, and how it influences interpretation. Bennett and Flach (1992) have reviewed various perspectives on interpretative processes of information displays, such as Wickens and Carswell's (1995) proximity compatibility principle relating display location to function. Green's Cognitive Dimensions of Notations (Green 1989, Green & Petre 1996) describe the way that notational design can affect the tasks involved in constructing and modifying as well as interpreting programming languages and other notations.

Metaphor as a Tool

Metaphor is often thought of as a literary device; in the context of literature it is certainly a tool used deliberately to achieve specific effects. It is also intentionally applied as a tool to other communicative contexts - most notably to education. All education is a process of communicating new information to students in such a way that they can assimilate it and relate it to what they already know (Gallagher 1978). Metaphor is used by teachers to communicate novel concepts, but always brings the danger that students may over-extend the metaphor and draw inappropriate analogies (Nolder 1991). Ideally metaphors are used to develop new concepts by a process of triangulation (Petrie & Oshlag 1993) - students recognise anomalies between their existing knowledge and new information provided by the metaphor, and create new knowledge by correcting their model to accommodate both sources. Spiro et al. (1989) propose that sophisticated students can be assisted in this process if they are given multiple metaphors, each correcting invalid extensions that might have been based on a single one.

Where the intention is to communicate purely abstract concepts however, it may be unreasonable to expect that pure abstractions can be derived from physical examples. Pimm (1995) observes that it is unhelpful to consider mathematical concepts as being independent of their representations, and describes the goal of mathematics education as learning to manipulate representations. If the goal is specific to the representation, then the use of physical metaphors (common in mathematics education) may even be detrimental to the goal of learning to do symbolic mathematics. In less abstract domains - physics for example - physical metaphors may of course help to form a simplified mental model of the situation being described. Mayer (1993) describes an experiment in which recall of physical principles was improved when radar operation was described metaphorically.

Metaphor as a Tool in Human-Computer Interaction

The application of metaphor to user interfaces can also be justified on educational grounds. The main obstacle associated with user interfaces is often described as a "learning curve" - the quotes from user interface textbooks in chapter 1 make it clear that metaphor is expected to remove this obstacle by allowing users to build on their experience from other areas. A secondary advantage of metaphor in HCI may lie in support for problem solving. When users experience problems with the device, they can solve those problems by analogy to solutions that might be applied in the metaphor domain. There is a substantial literature describing this analogical approach to problem-solving, based on various theories of analogy (Gick & Holyoak 1983, Gentner 1983, Holland et al. 1986, Mitchell 1993, Keane 1997). The visual representations of a graphical user interface, besides introducing pictorial metaphor, can also help users to form appropriate analogies by matching the problem to the surface features of an appropriate source domain (Keane 1988, Heydenbluth & Hesse 1996). Beveridge and Parkins (1987) carried out an experiment in which subjects were more successful at forming analogies after seeing diagrammatic representations that depicted the required configuration. Schunn and Dunbar (1996) have claimed, in fact, that the value of analogy lies simply in priming of an appropriate solution - that no transfer of abstractions is involved.

In the HCI literature itself, justifications of metaphor in the user interface are usually made in terms of one of these two research perspectives; either the metaphor assists the user to learn the underlying abstractions of the computer system, or it provides a basis for problem-solving while performing a specific task. An early analysis by Carroll and Thomas (1982) said that the importance of metaphor implied a fundamental critique of the level at which psychology is applied to user interface design. Metaphor was an essential attribute of a good user interface, and this could only be appreciated in terms of psychological theories. Early textbooks and collections of readings on human-computer interaction always included some representation of this view (Carroll & Mack 1985, Carroll, Mack & Kellogg 1988) and detailed cognitive models have been proposed as a framework for evaluating metaphorical interfaces (Rieman et al. 1994).

Several attempts have been made to systematise the process of user interface design from metaphorical foundations. Carroll has provided several sets of guidelines for designers, in the texts listed above. Wozny (1989) advises the designer above all to make the metaphor explicit, so that it is accessible to the user. Madsen (1994) has written a practical "cookbook" instructing user interface designers on how to choose and apply a metaphor to their design. A European research project has defined the formal characteristics of usable metaphors (Smyth, Anderson & Alty 1995). A layered structure has also been proposed for the design of database user interfaces, in which the data model is situated at the bottom level of the hierarchy and the metaphor at the top level (Catarci, Costabile & Matera 1995).

There have also been critics of the metaphorical user interface. Halasz and Moran (1982) claimed that users need to develop an abstract conceptual model, and that metaphor was only of passing value in building that model (in the sense of Lakoff and Johnson (1980) - that all abstractions have some linguistic metaphorical basis). Halasz and Moran claimed that drawing new analogies from a user interface metaphor in order to solve problems was dangerous, because so many invalid conclusions might be derived. Simos and Blackwell (1998) have revised this argument in terms of Green's Cognitive Dimensions of Notations (Green 1989, Green & Petre 1996). As noted above, Gentner, Falkenhainer and Skorstad (1988) specifically discount the application of their structure mapping theory of metaphor and analogy to the analysis of user interfaces. Mohnkern (1997a) considers that a metaphor is useful only as a bundle of user interface affordances (Norman 1988) and that deeper systematic metaphors are likely to be misleading. It is certainly possible to find misguided applications of (mixed) metaphor, such as the (possibly disingenuous) observation by Akoumianakis and Stephanis (1997) that a pull-down menu is based on an underlying "restaurant" metaphor, or that the desktop metaphor is based on "sheets of paper called windows".

Empirical Investigations of Metaphor in HCI

The generally assumed theoretical benefits of user interface metaphor are supported by surprisingly little empirical evidence. Instead, one finds studies that appear to have set out with the goal of demonstrating the value of metaphor, but are eventually published with much weaker claims. Simpson and Pellegrino (1993), for example, carried out an experiment comparing a geographical metaphor of a file system to an unadorned flow chart. Despite participants' subjective preferences for the metaphor, no difference was observed in the performance of experts using either notation. Novices performed slightly better using the metaphor: the authors conclude only that direct comparison of the two forms is not justified because the tasks are not equivalent. It seems that a study which set out with the intention of demonstrating the benefit of metaphor failed to do so, and was published on other grounds.

Similar results are reported, with some surprise, by the human factors editor of IEEE Software. Potosnak (1988) reviews studies in which iconic interfaces performed poorly by comparison to command interfaces. She notes that these unexpected results are probably due to the fact that the iconic interfaces were poorly designed, and that the results do not necessarily cast doubt on the value of metaphor. Other studies have attributed unsatisfactory performance of metaphor to specific sub-groups within an experimental population; Rohr (1987) for example, reports complex interactions between personality characteristics and experimental task performance with graphical user interface metaphors. Those studies which have reported unambiguous benefits from metaphor use do not assume too much about the educational benefit of the metaphor. Schweiker and Muthig (1987), for example, describe the spatial metaphor as supporting "naive realism" - a concept apparently identical to direct manipulation. As mentioned at the start of this chapter, there is little doubt that direct manipulation is responsible for the success of graphical user interfaces; it is the more substantial claims about metaphor that give cause for doubt.

Origins of Metaphor in HCI

The idea that a user interface should be metaphorical is so widespread that it is dangerous to attribute it to a single source. Most general concepts in computing (the "bug", for example) far predate the invention of computers; anecdotal reports of their invention (Grace Hopper discovers short circuit caused by moth) are usually either apocryphal or epiphenomenal. Nevertheless, David Canfield Smith makes a strong claim (Smith 1996) to the invention of the "desktop" metaphor that has inspired all of the research described here. His Pygmalion system (Smith 1977) was developed in the Stanford AI Lab, providing the basis of the Xerox Star (Smith et al. 1982, Johnson et al. 1989), and subsequently the Apple Macintosh. Pygmalion's status as an AI project meant that it originally expressed a theory of cognition, and was never simply a software tool. It was not based on empirical studies of metaphor and analogy, however. Smith considered that Pygmalion would finally allow computers to be used for creative tasks, because its graphical nature corresponded directly to the mental imagery that forms the basis of creative thought. He based his argument on psychological theories of aesthetics (Arnheim 1970, Koestler 1964) rather than problem-solving. The "metaphor" in Pygmalion's graphical interface was there because "visual imagery is a productive metaphor for thought" (Smith 1977, p. 6). Smith's theory of creativity is seldom cited directly in the HCI literature, but it appears to have been influential in other areas of computer science, as will be seen in chapter 3.

Alternatives to Metaphor in HCI

This review has focused on the arguments that might be made for the value of diagrammatic metaphor in contexts such as HCI. It has not considered some fundamentally different approaches to the analysis of communication and representation. Blackwell and Engelhardt (1998) have made a more detailed study of the many different typologies that have been proposed for classifying and studying diagrammatic representations. Some of these are impressively detailed semiotic analyses of the potential space of graphical configurations (e.g. Twyman 1979). Alternative reviews have included cognitive historical analyses of the origins of graphical representations (Gregory 1970, chapters 8 and 9), naive classification of visual representations by experimental subjects (Lohse et al. 1994) and classifications of the interaction between media types and sensory modalities (Stenning, Inder & Neilson 1995).

Many studies of HCI place it within a broader communicative context, in which the effectiveness of supposed metaphors can be criticised on social grounds (Bødker 1991, Nardi 1993, Nardi & Zarmer 1993), or in terms of the user's conversational interaction with the interface (Payne 1990, Strothotte & Strothotte 1997, Bottoni et al. 1996). To use the word metaphor in a different sense, each of these analyses describes HCI in terms of some contextual metaphor: a conversation metaphor, a social interaction metaphor, or others. Reddy (1993) has analysed the implications of the "conduit" metaphor for communication between people, and shown how it influences communicative intent. There have been similar critiques of HCI. Laurel (1986) deplores the fact that most user interfaces insist that the user is manipulating a tool, when people do not want to manipulate tools: they want to play games or search databases. Hutchins (1989) has made a collection of the different metaphors that might be applied to HCI - not only using a tool, or holding a conversation, but making a declaration, acting in a model world or collaborating with an intermediary. The topic of this thesis addresses a metaphor that is far more central in HCI - the user-interface-metaphor metaphor. It relies on two assumptions: that graphical representations are metaphorical, and that metaphors are valuable as cognitive tools.


This thesis evaluates the benefits of diagrammatic metaphor as a cognitive tool. Diagrams share some structural properties of language, and many of these can be analysed to explain how their structural characteristics assist with certain types of reasoning task. They can also be interpreted pictorially, in which case interpretation is a metaphorical process. Metaphor is an important educational tool, but the claims made for the value of metaphor in graphical user interfaces are more contentious. This thesis aims to supplement the studies of structural characteristics of diagram use, some of which also provide sufficient explanation for the benefits of graphical user interfaces. The research described here explicitly manipulates the metaphorical content of diagrams, while leaving the structure unchanged.

These results are applicable to many classes of diagram, even though they address the specific claims made by the HCI community. The point at which those claims become most relevant to other diagrams is in the discussion of visual programming languages - complete and sophisticated diagrammatic tools with a clearly defined semantics that can be applied to a broad range of problem solving tasks. The instructional benefits of metaphor should be clearly apparent in this class of diagram.
