Apostrophe and acute accent confusion
Markus Kuhn
Summary: The unfortunate layout of some European PC
keyboards (German, Dutch, etc.), combined with problematic keyboard
driver semantics, causes many users to enter the acute accent instead
of the apostrophe character. This page details the problem and proposes
sound and elegant solutions for both X11 and Win32 platforms.
First some background information on character sets
A coded character set is the table according to which a computer
represents characters internally as binary
numbers. ISO
8859-1 (also known as Latin-1) was the commonly used 8-bit coded
character set for West European languages during the late 20th
century. Of the 189 characters that it covers, we are interested in
the following three here:
0x27 | APOSTROPHE | '
0x60 | GRAVE ACCENT | `
0xB4 | ACUTE ACCENT | ´
The apostrophe character 0x27 represents the single undirectional
(vertical) quotation mark that serves on the common simple typewriter
also as the apostrophe. The acute and grave accent characters
represent the accents used in languages such as French, but without
any associated base character below. They were probably included in
the standard because European typewriters feature corresponding key
labels, but these accent-only characters have no common use in
European text.
The ISO 8859-1 character set has since been extended into and
largely replaced by Unicode, which is also known as Universal
Character Set (UCS) or ISO 10646. It adds two directional single
quotation marks:
U+2018 | LEFT SINGLE QUOTATION MARK | ‘
U+2019 | RIGHT SINGLE QUOTATION MARK | ’
In proper typography, U+2019 is not only the right single quotation
mark, but also the preferred character for representing the apostrophe.
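The five characters discussed so far can be inspected with a short Python sketch. It only uses the standard `unicodedata` module; the helper name `describe` is made up for this illustration.

```python
import unicodedata

# Code points of the five characters discussed in this section.
CHARS = [0x27, 0x60, 0xB4, 0x2018, 0x2019]

def describe(code_point: int) -> str:
    """Return 'U+XXXX NAME' plus a note on whether ISO 8859-1 can encode it."""
    ch = chr(code_point)
    try:
        ch.encode("iso-8859-1")
        where = "in ISO 8859-1"
    except UnicodeEncodeError:
        where = "Unicode only"
    return f"U+{code_point:04X} {unicodedata.name(ch)} ({where})"

for cp in CHARS:
    print(describe(cp))
```

The first three lines of output are marked as being in ISO 8859-1, while the two directional quotation marks are reported as Unicode only.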
In older 7-bit ASCII fonts of US origin, the designers saw little
use for a grave accent in 0x60 and shaped its glyph to be a compromise
also usable as a left single quotation mark ("back quote"):
[Glyph images: 0x27 and 0x60 as drawn in such fonts]
For instance the X11 core fonts followed this habit until recently.
European keyboard layouts
Though the following description refers to the standard German PC
keyboard (DIN 2137-2:1988), the exact same problem is also present in
the Dutch keyboard, and to a lesser degree perhaps also in the
Spanish, Latin American, Portuguese, Danish, Finnish, Swedish, and
other keyboards.
The standard German PC keyboard has key labels for all of the three
ISO 8859-1 characters mentioned above:
We are interested here in two keys in particular:
- The key left of the backspace key is labeled with acute and grave
accent (the "E12" key in ISO 9995 coordinates). On German typewriters, this key is a
non-spacing key (DIN 2137). It does not advance the cursor but causes
the next character to appear below the accent, which is highly useful
for entering French words on German keyboards. [A circumflex accent
(^) can be entered on some typewriters by pressing the E12 key both
with and without Shift before the base character.]
- The key above Shift is labeled with # and ' (the "C12" key). This
is the key used to get the undirectional single quotation mark on
typewriters, which also serves as the apostrophe. Apostrophes occur in
German text only rarely, at least an order of magnitude less frequently
than in English text.
[Note: The new German keyboard standard DIN 2137-2:1995 has reduced
the number of keys in the alphanumeric section from 48 to 46. It
removed the C12 key ("#" and "'") in favour of a larger return key and
the B00 key ("<" and ">") in favour of a larger left Shift key.
The apostrophe is in the new standard on E00 (left of the 1 key) and
the number sign is on AltGr-N. The less-than and greater-than signs
are now on AltGr-Y and AltGr-X. The circumflex is now on AltGr-' and
the vertical bar on AltGr-1. In DIN 2137-2, not only the acute and
grave accent are non-spacing, but also the tilde and circumflex. Due
to software compatibility concerns, the new DIN 2137-2:1995 has not
yet been implemented commonly in PC compatible keyboards, which still
follow the old DIN 2137-2:1988 layout. The new standard, which is now
starting to be commonly used on new electronic typewriters, would reduce
the apostrophe problem, as the apostrophe is available in the new
standard layout as easily as the acute accent, without pressing Shift.]
The Microsoft Windows keyboard driver
Under Windows, the German C12 key with Shift enters the ISO 8859-1
apostrophe (0x27), as one would expect. The E12 key has been
implemented as a non-spacing (dead) key. Pressing E12 alone enters no
character, but modifies what happens when the next key is pressed.
Pressing E12 (acute) followed by "e" leads to entry of "é" and
pressing Shift+E12 (grave) followed by "e" leads to entry of
"è". So far very nice and no problem.
The problem comes with inexperienced users who enter English text
(with lots of apostrophes) on a German keyboard. The E12 label for the
acute accent looks temptingly similar to a typographic apostrophe. In
addition, the acute key can be reached without Shift, while the proper
apostrophe requires that the Shift key is pressed. Ignorance and the
tendency to follow the path of least resistance (don't use Shift)
combine here and as a result numerous people type "it´s" instead
of "it's". In some fonts, the difference is hardly noticeable on the
screen, in others it looks horrible, but the printed output of
"it´s" is a typographic disaster with most fonts.
The problem is that the Windows keyboard driver is a bit too
liberal here. If some accent and base character combination is not
available (s-acute in the above example), then it spits out two
characters, namely the spacing accent followed by the base character
("´s"). Users therefore do not notice that they did something
wrong, unless the on-screen font design and size make it very obvious
that something other than an apostrophe was entered here.
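The liberal behaviour described above can be modelled in a few lines of Python. This is only a sketch, not the actual Windows driver: the function name `liberal_dead_key` is hypothetical, and "available" is approximated here as "a precomposed letter exists in ISO 8859-1".

```python
import unicodedata

# Spacing accent produced by each dead key, mapped to the combining mark
# that is used here to look up a precomposed letter.
COMBINING = {
    "\u00b4": "\u0301",  # acute
    "\u0060": "\u0300",  # grave
}

def liberal_dead_key(accent: str, base: str) -> str:
    """Model of the liberal behaviour: never reject a key sequence."""
    if base == " ":
        # Dead key followed by the space bar yields the spacing accent.
        return accent
    composed = unicodedata.normalize("NFC", base + COMBINING[accent])
    if len(composed) == 1:
        try:
            # ASSUMPTION: "available" means encodable in ISO 8859-1.
            composed.encode("iso-8859-1")
            return composed
        except UnicodeEncodeError:
            pass
    # No accented letter available: emit spacing accent plus base character.
    return accent + base

print("it" + liberal_dead_key("\u00b4", "s"))
```

In this model, acute followed by "e" yields "é", while acute followed by "s" silently yields the two characters "´s", which is exactly how "it´s" ends up in a text.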
Solution: The problem could be fixed very easily. The
keyboard driver should be changed such that if an accented character
(as requested by pressing a non-spacing key followed by a base
character) is not available, a beep occurs and no character is
generated. This way, the user is made aware that the
wrong key was used. Using C12 then becomes the easiest way of entering
an apostrophe. The spacing acute (0xB4) and grave (0x60) characters
can still be entered easily by pressing the space bar after the
non-spacing accent key E12, just like on a typewriter. It might also
help if font designers could verify that the spacing acute and grave
accents (U+00B4 and U+0060) are clearly distinguishable in
low-resolution fonts from an apostrophe (U+0027 and U+2019), for
example by making the spacing accents rather flat and positioning them
high.
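The proposed driver behaviour can be sketched as follows. This is a model under stated assumptions, not driver code: the name `strict_dead_key` and the returned beep flag are hypothetical, and availability is again approximated as "encodable in ISO 8859-1".

```python
import unicodedata
from typing import Tuple

# Spacing accent of each dead key, mapped to its combining mark.
COMBINING = {
    "\u00b4": "\u0301",  # acute
    "\u0060": "\u0300",  # grave
}

def strict_dead_key(accent: str, base: str) -> Tuple[str, bool]:
    """Return (text to insert, whether to beep) for a dead key sequence."""
    if base == " ":
        # Space bar after the dead key still gives the spacing accent.
        return accent, False
    composed = unicodedata.normalize("NFC", base + COMBINING[accent])
    if len(composed) == 1:
        try:
            # ASSUMPTION: "available" means encodable in ISO 8859-1.
            composed.encode("iso-8859-1")
            return composed, False
        except UnicodeEncodeError:
            pass
    # Proposed change: generate nothing and alert the user.
    return "", True

print(strict_dead_key("\u00b4", "s"))
```

With this rule, acute followed by "s" inserts nothing and requests a beep, so a mistyped "it´s" can no longer go unnoticed, while "é" and the spacing accents remain reachable.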
The X11 keyboard driver
Under Unix and X11, the problem is essentially the same as
described for the Windows driver, but things are a bit more
complicated for historical reasons:
- Before ISO 8859-1 came around, there was no use for the acute
accent key on German keyboards, so it was simply mapped to the
apostrophe in keyboard drivers and people got used to it.
- Some keyboard drivers later placed the new acute accent on
Shift+C12, which was until then a redundant position (because E12
already gave the apostrophe). This way, the acute accent and apostrophe got
mixed up on some keyboard mappings, and the fact that old US fonts
didn't match the correct glyph shapes used on keyboards didn't help
either.
- More recent keyboard drivers placed the spacing acute accent on
E12 and the apostrophe on Shift+C12.
- Some US developers made use of the spacing grave accent (which
they called "backquote") in the syntax of various command languages.
The few instances where this happened are:
- Both TeX and troff use the grave accent and apostrophe in their
formatting languages to denote the left and right single quotation
mark.
- GNU Emacs comes with the default key bindings C-x ` for
next-error and ESC ` for tmm-menubar.
- The m4 macro processor uses the grave accent and apostrophe to
delimit strings.
- Bash etc. allow `...` as an alternative for
$(...).
- Perl allows `...` as an alternative for
qx/.../.
- X11 currently comes with two German keyboard mappings in
/usr/X11R6/lib/X11/xkb/symbols/de. In the "de(basic)"
mapping, the keys for acute (E12), grave (Shift+E12), tilde
(AltGr+D12), and circumflex (E00) are all non-spacing. In the variant
"de(nodeadkeys)", all non-spacing keys are replaced by their spacing
equivalents.
Both the "de(basic)" and "de(nodeadkeys)" mappings are rather
unsatisfactory for Unix users:
- In "de(basic)", the tilde and circumflex are only available as
non-spacing keys. However, the spacing versions of these accents are
today very widely used ASCII characters, for instance as C or Perl
operators (bitwise not, bitwise xor), in sh or URLs to denote home
directories, in TeX (no-break space, superscript), etc. [Tilde "~" and
circumflex "^" were originally not present on the standard German
typewriter keyboard and were only added with level 3 shift (AltGr)
for ASCII compatibility later, just like "[]{}@\|".]
- In "de(nodeadkeys)", the E12 key is not a non-spacing key. This has
the following problems:
- Entry of French words and names (like André) is not obvious
any more, as it was on typewriters.
- The spacing acute is not only completely useless, but also
dangerous as it is often accidentally entered instead of the
apostrophe.
- In addition, the spacing acute has been replaced in ISO 8859-15
(the ISO 8859-1 successor with support for Finnish and the euro sign)
with the letter Z-with-caron.
- The spacing grave accent is used only rarely (Emacs, TeX,
m4).
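The ISO 8859-15 point in the list above can be checked with Python's standard codecs. The helper name `name_in` is made up for this illustration.

```python
import unicodedata

def name_in(byte_value: int, charset: str) -> str:
    """Return the Unicode name of a single byte decoded in the given charset."""
    return unicodedata.name(bytes([byte_value]).decode(charset))

print(name_in(0xB4, "iso-8859-1"))   # ACUTE ACCENT
print(name_in(0xB4, "iso-8859-15"))  # LATIN CAPITAL LETTER Z WITH CARON
```

A spacing acute typed as an apostrophe therefore turns into a capital Z with caron as soon as the text is interpreted as ISO 8859-15.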
Solution:
I would like to propose the following measures to solve this
problem:
A) Reduce the need for having a spacing grave accent key:
- The default key bindings of Emacs for next-error and
tmm-menubar should be changed to a sequence that does not
include the grave accent.
- The TeX mode of Emacs already has a smart-quote algorithm that
turns " into either `` or '' depending on the surrounding
whitespace. The same algorithm should also be applied to the
apostrophe, such that ' is turned into ` at the start of a word. This
frees TeX users from the need of a spacing grave key.
- The m4 mode of Emacs should automatically turn an apostrophe into a
grave accent if the context suggests that this is the beginning of a
string.
- Documentation should encourage users to write $(...)
in bash and qx/.../ in Perl instead of
`...`.
With these few and simple measures implemented, there will remain
no practical reason for having a spacing grave accent key on the
keyboard. Certainly no reason good enough to justify preventing either
the simple entry of French names or the compatibility with typewriters
and Windows.
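The apostrophe rule proposed for the TeX mode under A) can be illustrated with a small Python sketch. It is not the Emacs implementation: the function name `open_single_quotes` and the exact notion of "start of a word" (start of text, or directly after whitespace or an opening bracket) are assumptions made for this example.

```python
import re

# One or more apostrophes at the start of the text, or directly after
# whitespace or an opening bracket, are taken to open a quotation.
_OPENING = re.compile(r"(^|[\s(\[{])('+)")

def open_single_quotes(text: str) -> str:
    """Turn word-initial apostrophes into grave accents, TeX style."""
    return _OPENING.sub(
        lambda m: m.group(1) + "`" * len(m.group(2)),
        text,
    )

print(open_single_quotes("'quoted' text, and it's fine"))
```

Apostrophes inside or at the end of a word are left alone, so "it's" is unchanged. A heuristic this simple would wrongly convert elided words such as 'tis, so an editor mode would also need a way to override it.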
B) Unify in /usr/X11R6/lib/X11/xkb/symbols/de
the two mappings "de(basic)" and "de(nodeadkeys)" into a new mapping
"de(deadgraveacute)", by adding:
    partial alphanumeric_keys
    xkb_symbols "deadgraveacute" {
        include "de(basic)"
        // E00: spacing circumflex
        key <TLDE> { [ asciicircum, degree ],
                     [ notsign ] };
        // D12: spacing tilde on AltGr
        key <AD12> { [ plus, asterisk ],
                     [ asciitilde, dead_macron ] };
        // C12: spacing grave on AltGr
        key <BKSL> { [ numbersign, apostrophe ],
                     [ grave ] };
    };
This mapping makes E12 a non-spacing key (like in "de(basic)") and
leaves the ^ and ~ keys spacing (like in "de(nodeadkeys)"). This one
single mapping serves the needs of users better than the previous
choice of two different unsatisfactory mappings.
With this mapping, the spacing grave accent can still be entered
very easily in two ways: either press Shift+E12 followed by the space
bar, or press AltGr+C12.
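The effect of the "de(deadgraveacute)" overrides can be checked with a small Python model of the affected keys. The `DE_BASIC_EXCERPT` table is an assumption reconstructed from the description of "de(basic)" in this text, not a copy of the real symbols file, and the helper names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

LEVELS = ("plain", "Shift", "AltGr", "Shift+AltGr")
Layout = Dict[str, Tuple[Optional[str], ...]]

# ASSUMPTION: excerpt of "de(basic)" as described in the text; None marks
# levels the text says nothing about.
DE_BASIC_EXCERPT: Layout = {
    "E00": ("dead_circumflex", "degree", None, None),
    "E12": ("dead_acute", "dead_grave", None, None),
    "D12": ("plus", "asterisk", "dead_tilde", None),
    "C12": ("numbersign", "apostrophe", None, None),
}

# The three keys redefined by "de(deadgraveacute)" (TLDE, AD12, BKSL).
DEADGRAVEACUTE: Layout = {
    "E00": ("asciicircum", "degree", "notsign", None),
    "D12": ("plus", "asterisk", "asciitilde", "dead_macron"),
    "C12": ("numbersign", "apostrophe", "grave", None),
}

def merge(base: Layout, overrides: Layout) -> Layout:
    """Apply key overrides on top of an included base mapping."""
    return {**base, **overrides}

def find(layout: Layout, keysym: str) -> List[Tuple[str, str]]:
    """List (key, level) positions that produce the given keysym."""
    return [(key, LEVELS[i])
            for key, syms in sorted(layout.items())
            for i, sym in enumerate(syms) if sym == keysym]

def dead_positions(layout: Layout) -> List[Tuple[str, str]]:
    """List (key, level) positions that hold a non-spacing keysym."""
    return [(key, LEVELS[i])
            for key, syms in sorted(layout.items())
            for i, sym in enumerate(syms)
            if sym is not None and sym.startswith("dead_")]

proposed = merge(DE_BASIC_EXCERPT, DEADGRAVEACUTE)
print(find(proposed, "grave"))
print(dead_positions(proposed))
```

In this model the spacing grave sits on AltGr+C12, the circumflex and tilde are spacing, and E12 stays non-spacing on both levels.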
As a transition compromise, it might also be desirable for some
users to have a mapping "de(deadacute)" with:
    partial alphanumeric_keys
    xkb_symbols "deadacute" {
        include "de(deadgraveacute)"
        // E12: non-spacing acute, spacing grave with Shift
        key <AE12> { [ dead_acute, grave ],
                     [ dead_cedilla, dead_ogonek ] };
        // C12: non-spacing grave on AltGr
        key <BKSL> { [ numbersign, apostrophe ],
                     [ dead_grave ] };
    };
This keeps the grave accent spacing for as long as the measures under
A) are not yet implemented (and so nothing changes for TeX and Emacs
users), but it makes the acute accent non-spacing, to prevent its
accidental use as an apostrophe.
References
- DIN 2137-2, Büro- und Datentechnik — Tastaturen — Teil 2:
Deutsche Tastatur für die Daten- und Textverarbeitung, Tastenanordnung
und Belegung mit Schriftzeichen. Deutsches Institut für Normung,
Berlin, October 1988 (old layout, PC keyboard) and July 1995 (new
layout).
- ISO/IEC 9995-1, Information technology — Keyboard layouts for
text and office systems — Part 1: General principles governing
keyboard layouts. International Organization for Standardization,
Geneva, 1994.
- ISO/IEC 9995-2, Information technology — Keyboard layouts for
text and office systems — Part 2: Alphanumeric section. International
Organization for Standardization, Geneva, 1994.
- ISO/IEC 9995-3, Information technology — Keyboard layouts for
text and office systems — Part 3: Complementary layouts of the
alphanumeric zone of the alphanumeric section. International
Organization for Standardization, Geneva, 1994.
- Keyboard layout collections from Microsoft, Macromedia, Mark Leisher, University
of Sussex Computing Service.
- ISO/IEC 10646-1, Information technology — Universal
Multiple-Octet Coded Character Set (UCS) — Part 1: Architecture and
Basic Multilingual Plane, International Organization for
Standardization, Geneva, 2000.
- Markus Kuhn: ASCII and Unicode quotation
marks. This page explains why the ASCII apostrophe and grave
accent should not be used as directional quotation marks, making clear
that ISO 10646 is required for typographically correct encoding of
English text.
- Michael Everson: On the apostrophe
and quotation mark, with a note on Egyptian transliteration
characters, Working Group Document ISO/IEC JTC1/SC2/WG2 N2043,
1999-07-24.
- Rote Liste bedrohter Arten: Der Apostroph is a German page about
the apostrophe/acute mixup.
Special thanks to Rainer Seitel for comments and references.