Apostrophe and acute accent confusion

Markus Kuhn

Summary: The unfortunate layout of some European PC keyboards (German, Dutch, etc.), combined with problematic keyboard driver semantics, causes many users to enter the acute accent instead of the apostrophe character. This page details the problem and proposes sound and elegant solutions for both X11 and Win32 platforms.

First some background information on character sets

A coded character set is the table according to which a computer represents characters internally as binary numbers. ISO 8859-1 (also known as Latin-1) was the commonly used 8-bit coded character set for West European languages during the late 20th century. Of the 189 characters that it covers, we are interested in the following three here:

0x27  APOSTROPHE    '
0x60  GRAVE ACCENT  `
0xB4  ACUTE ACCENT  ´

The apostrophe character 0x27 represents the single undirectional (vertical) quotation mark that serves on the common simple typewriter also as the apostrophe. The acute and grave accent characters represent the accents used in languages such as French, but without any associated base character below. They were probably included in the standard because European typewriters feature corresponding key labels, but these accent-only characters have no common use in European text.

The ISO 8859-1 character set has since been extended into and largely replaced by Unicode, also known as the Universal Character Set (UCS) or ISO 10646. It adds two directional single quotation marks:

U+2018  LEFT SINGLE QUOTATION MARK   ‘
U+2019  RIGHT SINGLE QUOTATION MARK  ’

In proper typography, U+2019 is not only the right single quotation mark, but also the preferred character for representing the apostrophe.
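The five characters discussed so far can be inspected with a few lines of Python. This is only an illustrative sketch: the helper name describe is hypothetical, and it relies on the standard unicodedata module for the official character names and on Python's "latin-1" codec to tell whether a character is part of ISO 8859-1.

```python
import unicodedata

# Code points of the five characters discussed in the text.
CODE_POINTS = [0x27, 0x60, 0xB4, 0x2018, 0x2019]


def describe(cp: int) -> str:
    """Return one line with the code point, its Unicode name and its
    ISO 8859-1 byte value (if the character exists in that set)."""
    ch = chr(cp)
    try:
        where = "ISO 8859-1: 0x%02X" % ch.encode("latin-1")[0]
    except UnicodeEncodeError:
        where = "not in ISO 8859-1"
    return "U+%04X  %s  (%s)" % (cp, unicodedata.name(ch), where)


if __name__ == "__main__":
    for cp in CODE_POINTS:
        print(describe(cp))
```

The two directional quotation marks are reported as absent from ISO 8859-1, which is why 0x27 had to serve as both quotation mark and apostrophe there.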

In older 7-bit ASCII fonts of US origin, the designers saw little use for a grave accent in 0x60 and shaped its glyph to be a compromise also usable as a left single quotation mark ("back quote"):

0x27  ’
0x60  ‛

For instance, the X11 core fonts followed this habit until recently.

European keyboard layouts

Though the following description refers to the standard German PC keyboard (DIN 2137-2:1988), the exact same problem is also present in the Dutch keyboard, and to a lesser degree perhaps also in the Spanish, Latin American, Portuguese, Danish, Finnish, Swedish, and other keyboards.

The standard German PC keyboard has key labels for all of the three ISO 8859-1 characters mentioned above:

[Photo: part of a German PC keyboard]

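We are interested here in two keys in particular: E12, which carries the acute accent (and, with Shift, the grave accent), and C12, which carries the number sign "#" (and, with Shift, the apostrophe).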

[Note: The new German keyboard standard DIN 2137-2:1995 has reduced the number of keys in the alphanumeric section from 48 to 46. It removed the C12 key ("#" and "'") in favour of a larger return key and the B00 key ("<" and ">") in favour of a larger left Shift key. In the new standard, the apostrophe is on E00 (left of the 1 key) and the number sign is on AltGr-N. The less-than and greater-than signs are now on AltGr-Y and AltGr-X. The circumflex is now on AltGr-' and the vertical bar on AltGr-1. In DIN 2137-2, not only the acute and grave accents are non-spacing, but also the tilde and circumflex. Due to software compatibility concerns, DIN 2137-2:1995 has not yet been widely implemented in PC-compatible keyboards, which still follow the old DIN 2137-2:1988 layout. The new standard, which is now starting to be commonly used on new electronic typewriters, would reduce the apostrophe problem, because it makes the apostrophe as easy to reach as the acute accent, without pressing Shift.]

The Microsoft Windows keyboard driver

Under Windows, the German C12 key with Shift enters the ISO 8859-1 apostrophe (0x27), as one would expect. The E12 key has been implemented as a non-spacing (dead) key. Pressing E12 alone enters no character, but modifies what happens when the next key is pressed. Pressing E12 (acute) followed by "e" leads to entry of "é" and pressing Shift+E12 (grave) followed by "e" leads to entry of "è". So far, so good.

The problem comes with inexperienced users who enter English text (with lots of apostrophes) on a German keyboard. The E12 label for the acute accent looks temptingly similar to a typographic apostrophe. In addition, the acute key can be reached without Shift, while the proper apostrophe requires that the Shift key is pressed. Ignorance and the tendency to follow the path of least resistance (don't use Shift) combine here and as a result numerous people type "it´s" instead of "it's". In some fonts, the difference is hardly noticeable on the screen, in others it looks horrible, but the printed output of "it´s" is a typographic disaster with most fonts.

The problem is that the Windows keyboard driver is a bit too liberal here. If some accent and base character combination is not available (s-acute in the above example), then it spits out two characters, namely the spacing accent followed by the base character ("´s"). Users therefore do not notice that they did something wrong, unless the on-screen font design and size make it very obvious that something other than an apostrophe was entered here.
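The liberal fallback described above can be modelled in a few lines of Python. This is a minimal sketch and not the actual Windows driver logic: the names DEAD_KEYS, available and liberal_dead_key are hypothetical, and the sketch assumes that "available" means "exists as a single precomposed character in ISO 8859-1", using Unicode NFC composition to find the precomposed form.

```python
import unicodedata

# Hypothetical table: dead key -> (combining mark, spacing fallback character).
DEAD_KEYS = {
    "dead_acute": ("\u0301", "\u00b4"),  # combining acute, spacing acute
    "dead_grave": ("\u0300", "\u0060"),  # combining grave, spacing grave
}


def available(text: str, charset: str = "latin-1") -> bool:
    """True if text is one precomposed character that charset can encode."""
    if len(text) != 1:
        return False
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


def liberal_dead_key(dead: str, base: str) -> str:
    """Model of the liberal behaviour: compose if possible, otherwise
    silently emit the spacing accent followed by the base character."""
    combining, spacing = DEAD_KEYS[dead]
    composed = unicodedata.normalize("NFC", base + combining)
    if available(composed):
        return composed
    return spacing + base  # the user gets no hint that anything went wrong


if __name__ == "__main__":
    typed = "it" + liberal_dead_key("dead_acute", "s")
    print(typed == "it\u00b4s")  # acute accent where an apostrophe was meant
```

Under these assumptions, E12 followed by "e" yields "é", while E12 followed by "s" quietly yields the two characters "´s".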

Solution: The problem could be fixed very easily. The keyboard driver should be changed such that, if an accented character (as requested by pressing a non-spacing key followed by a base character) is not available, it beeps and generates no character. This way, the user is made aware that the wrong key was used, and C12 becomes the easiest way of entering an apostrophe. The spacing acute (0xB4) and grave (0x60) characters can still be entered easily by pressing the space bar after the non-spacing accent key E12, just like on a typewriter. It might also help if font designers verified that the spacing acute and grave accents (U+00B4 and U+0060) are clearly distinguishable from an apostrophe (U+0027 and U+2019) in low-resolution fonts, for example by making the spacing accents rather flat and positioning them high.
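The proposed stricter behaviour might look roughly as follows in Python. Again this is only a sketch under stated assumptions: strict_dead_key and its helpers are hypothetical names, availability is approximated as "one precomposed character encodable in ISO 8859-1", and the beep is represented by a boolean flag rather than by an actual sound.

```python
import unicodedata
from typing import Tuple

# Hypothetical table: dead key -> (combining mark, spacing accent character).
DEAD_KEYS = {
    "dead_acute": ("\u0301", "\u00b4"),
    "dead_grave": ("\u0300", "\u0060"),
}


def available(text: str, charset: str = "latin-1") -> bool:
    """True if text is one precomposed character that charset can encode."""
    if len(text) != 1:
        return False
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


def strict_dead_key(dead: str, base: str) -> Tuple[str, bool]:
    """Return (generated text, beep) for a dead key followed by base."""
    combining, spacing = DEAD_KEYS[dead]
    if base == " ":
        # Dead key + space bar still yields the spacing accent.
        return spacing, False
    composed = unicodedata.normalize("NFC", base + combining)
    if available(composed):
        return composed, False
    # Unavailable combination: generate nothing and alert the user.
    return "", True


if __name__ == "__main__":
    for base in ("e", "s", " "):
        text, beep = strict_dead_key("dead_acute", base)
        print(repr(base), "->", ascii(text), "beep" if beep else "")
```

With this rule, the sequence E12 followed by "s" produces no output and a beep, so the mistake is noticed immediately, while accented letters and the spacing accents remain as easy to enter as before.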

The X11 keyboard driver

Under Unix and X11, the problem is essentially the same as described for the Windows driver, but things are a bit more complicated for historic reasons:

Both the "de(basic)" and "de(nodeadkeys)" mappings are rather unsatisfactory for Unix users:

Solution:

I would like to propose the following measures to solve this problem:

A) Reduce the need for having a spacing grave accent key:

With these few and simple measures implemented, there will remain no practical reason for having a spacing grave accent key on the keyboard. Certainly no reason good enough to justify preventing either the simple entry of French names or compatibility with typewriters and Windows.

B) Unify in /usr/X11R6/lib/X11/xkb/symbols/de the two mappings "de(basic)" and "de(nodeadkeys)" into a new mapping "de(deadgraveacute)", by adding:

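  partial alphanumeric_keys
  xkb_symbols "deadgraveacute" {
      include "de(basic)"
      // E00: spacing circumflex, degree; AltGr: not sign
      key <TLDE> {        [ asciicircum,  degree          ],
                          [     notsign                   ]       };
      // D12: plus, asterisk; AltGr: spacing tilde, dead macron
      key <AD12> {        [      plus,    asterisk        ],
                          [ asciitilde,   dead_macron     ]       };
      // C12: number sign, apostrophe; AltGr: spacing grave
      key <BKSL> {        [ numbersign,   apostrophe      ],
                          [ grave                         ]       };
  };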

This mapping makes E12 a non-spacing key (as in "de(basic)") and leaves the ^ and ~ keys spacing (as in "de(nodeadkeys)"). This single mapping serves the needs of users better than the previous choice between two unsatisfactory mappings.

With this mapping, the spacing grave accent can still be entered very easily in two ways: either press Shift+E12 followed by the space bar, or press AltGr+C12.

As a transition compromise, it might also be desirable for some users to have a mapping "de(deadacute)" with:

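  partial alphanumeric_keys
  xkb_symbols "deadacute" {
      include "de(deadgraveacute)"
      // E12: dead acute, spacing grave; AltGr: dead cedilla, dead ogonek
      key <AE12> {        [ dead_acute,   grave           ],
                          [ dead_cedilla, dead_ogonek     ]       };
      // C12: number sign, apostrophe; AltGr: dead grave
      key <BKSL> {        [ numbersign,   apostrophe      ],
                          [ dead_grave                    ]       };
  };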

This keeps the grave accent spacing for as long as the measures under A) are not yet implemented (so nothing changes for TeX and Emacs users), but it makes the acute accent non-spacing, to prevent its accidental use as an apostrophe.

References

Special thanks to Rainer Seitel for comments and references.

created 2001-05-04 – last modified 2001-05-07 – http://www.cl.cam.ac.uk/~mgk25/ucs/apostrophe.html