Unicode fonts and tools for X11

The classic X Window System bitmap fonts are now available in an ISO 10646-1/Unicode extension.

[Screenshot: UTF-8 xterm using 6x13.bdf]

We have extended all the "-misc-fixed-*" fonts:

  5x7     -Misc-Fixed-Medium-R-Normal--7-70-75-75-C-50-ISO10646-1
  5x8     -Misc-Fixed-Medium-R-Normal--8-80-75-75-C-50-ISO10646-1
  6x9     -Misc-Fixed-Medium-R-Normal--9-90-75-75-C-60-ISO10646-1
  6x10    -Misc-Fixed-Medium-R-Normal--10-100-75-75-C-60-ISO10646-1
  6x12    -Misc-Fixed-Medium-R-Semicondensed--12-110-75-75-C-60-ISO10646-1
  6x13    -Misc-Fixed-Medium-R-SemiCondensed--13-120-75-75-C-60-ISO10646-1
  6x13B   -Misc-Fixed-Bold-R-SemiCondensed--13-120-75-75-C-60-ISO10646-1
  7x13    -Misc-Fixed-Medium-R-Normal--13-120-75-75-C-70-ISO10646-1
  7x13B   -Misc-Fixed-Bold-R-Normal--13-120-75-75-C-70-ISO10646-1
  7x14    -Misc-Fixed-Medium-R-Normal--14-130-75-75-C-70-ISO10646-1
  7x14B   -Misc-Fixed-Bold-R-Normal--14-130-75-75-C-70-ISO10646-1
  8x13    -Misc-Fixed-Medium-R-Normal--13-120-75-75-C-80-ISO10646-1
  8x13B   -Misc-Fixed-Bold-R-Normal--13-120-75-75-C-80-ISO10646-1
  9x15    -Misc-Fixed-Medium-R-Normal--15-140-75-75-C-90-ISO10646-1
  9x15B   -Misc-Fixed-Bold-R-Normal--15-140-75-75-C-90-ISO10646-1
  10x20   -Misc-Fixed-Medium-R-Normal--20-200-75-75-C-100-ISO10646-1
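
Each long name in the table is an X Logical Font Description (XLFD) made up of fourteen hyphen-separated fields. As a reading aid, the following minimal Python sketch splits such a name into its fields. The function name parse_xlfd and the lower-case dictionary keys are illustrative choices, not part of the font package.

```python
# Field order as defined by the XLFD conventions.
XLFD_FIELDS = (
    "foundry", "family_name", "weight_name", "slant", "setwidth_name",
    "add_style_name", "pixel_size", "point_size", "resolution_x",
    "resolution_y", "spacing", "average_width", "charset_registry",
    "charset_encoding",
)


def parse_xlfd(name):
    """Split a fully specified XLFD name into a dict of its 14 fields."""
    if not name.startswith("-"):
        raise ValueError("an XLFD name starts with '-'")
    parts = name[1:].split("-")
    if len(parts) != len(XLFD_FIELDS):
        raise ValueError("expected %d fields, got %d"
                         % (len(XLFD_FIELDS), len(parts)))
    return dict(zip(XLFD_FIELDS, parts))


fields = parse_xlfd(
    "-Misc-Fixed-Medium-R-SemiCondensed--13-120-75-75-C-60-ISO10646-1")
# POINT_SIZE is given in tenths of a point, AVERAGE_WIDTH in tenths of a pixel.
print(fields["pixel_size"],
      int(fields["point_size"]) / 10,
      int(fields["average_width"]) / 10)   # prints: 13 12.0 6.0
```

For 6x13 this gives a 13-pixel-high, 12-point font with 6-pixel-wide character cells, which matches the short name in the first column.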

Coverage

These fonts now contain all characters found in the following character sets:

In addition, the 6x13, 8x13, 9x15, 9x18, and 10x20 fonts cover a much larger repertoire, which includes the comprehensive CEN MES-3A European Unicode 3.2 subset, the International Phonetic Alphabet, Armenian, Georgian, Thai, Yiddish, all Latin, Greek, and Cyrillic characters, all mathematical symbols (including the entire TeX repertoire), APL, Braille, Runes, and much more. 9x15 and 10x20 also cover Ethiopian.

Newly added fonts

The following new "-misc-fixed-*" fonts were added:

  6x13O   -Misc-Fixed-Medium-O-SemiCondensed--13-120-75-75-C-60-ISO10646-1
  7x13O   -Misc-Fixed-Medium-O-Normal--13-120-75-75-C-70-ISO10646-1
  8x13O   -Misc-Fixed-Medium-O-Normal--13-120-75-75-C-80-ISO10646-1
  9x18    -Misc-Fixed-Medium-R-Normal--18-120-100-100-C-90-ISO10646-1
  9x18B   -Misc-Fixed-Bold-R-Normal--18-120-100-100-C-90-ISO10646-1
  12x13ja -Misc-Fixed-Medium-R-Normal-ja-13-120-75-75-C-120-ISO10646-1
  18x18ja -Misc-Fixed-Medium-R-Normal-ja-18-120-100-100-C-180-ISO10646-1
  18x18ko -Misc-Fixed-Medium-R-Normal-ko-18-120-100-100-C-180-ISO10646-1

6x13O, 7x13O and 8x13O are oblique/italic versions of 6x13, 7x13 and 8x13. 9x18 is an improved version of 9x15 that has more space above and below the base characters, both to increase readability and to allow overstriking combining characters to work properly. 18x18ja and 18x18ko provide Japanese and Korean double-width ideograms for 9x18. 12x13ja provides Japanese double-width ideograms for 6x13.

Adobe BDF fonts

I have also created revised ISO10646-1 versions of all the Adobe and B&H pixel fonts that come with X11R6.4. The old ISO8859-1 BDF files contained about 30 additional PostScript characters (roughly the CP1252 repertoire) that were present in the files but not encoded, and therefore not accessible to X clients. The revised ISO10646-1 versions contain not only these but also many more automatically generated accented Latin characters (e.g., all characters from ISO 8859 parts 1-4, 9-10, 13-15), and they also fix a few long-standing bugs in the old fonts (missing NBSP, exchanged multiplication/division sign, etc.).

Status

The fonts are now complete and currently implement version 3.2 of the Unicode Standard (ISO 10646-1/Amd.1:2002). I will maintain them to fix bugs and to satisfy any newly reported user requirements. Note that the new fonts fix a problem with the Latin-1 quotation mark and accents.

Download

The fonts are freely available with installation instructions and example UTF-8 text files.

The "-misc-fixed-*" font package:
http://www.cl.cam.ac.uk/~mgk25/download/ucs-fonts.tar.gz
CJK ideographic wide character supplement (unpack into the same subdirectory as the above):
http://www.cl.cam.ac.uk/~mgk25/download/ucs-fonts-asian.tar.gz
The Adobe and B&H font package:
http://www.cl.cam.ac.uk/~mgk25/download/ucs-fonts-75dpi100dpi.tar.gz

There is also a change log file for the "-misc-fixed-*" fonts.

Other character sets

The font packages include the ucs2any.pl Perl script, which converts ISO 10646-1 fonts into any other encoding for which there is a Unicode mapping table available. This way, you can quickly generate ISO 8859-* versions from the above fonts automatically, for the benefit of older software that cannot yet handle ISO 10646-1 fonts directly.
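
The idea behind such a conversion can be sketched in a few lines of Python. This is not the ucs2any.pl implementation, only an illustration under stated assumptions: the mapping table uses the common "0xXX 0xYYYY # name" layout (target code first, Unicode value second), and the function names parse_mapping and remap_glyphs are hypothetical. The sketch handles only the glyph blocks of a BDF file. A real converter also has to rewrite the font header, for example the CHARS count and the CHARSET_REGISTRY and CHARSET_ENCODING properties.

```python
def parse_mapping(text):
    """Return {unicode_value: target_code} from a mapping table.

    Assumed line layout: '0xA4  0x20AC  # EURO SIGN'.
    """
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments
        cols = line.split()
        if len(cols) < 2:
            continue                           # blank or undefined position
        target, ucs = int(cols[0], 16), int(cols[1], 16)
        table[ucs] = target
    return table


def remap_glyphs(bdf_text, table):
    """Return {target_code: glyph_lines} for all glyphs found in the table.

    Each glyph block (STARTCHAR ... ENDCHAR) is copied unchanged except
    for its ENCODING line, which receives the target code.
    """
    result = {}
    glyph, code = None, None
    for line in bdf_text.splitlines():
        if line.startswith("STARTCHAR"):
            glyph, code = [line], None
        elif glyph is not None:
            if line.startswith("ENCODING"):
                code = int(line.split()[1])
                line = "ENCODING %d" % table.get(code, -1)
            glyph.append(line)
            if line.startswith("ENDCHAR"):
                if code in table:              # keep only mapped glyphs
                    result[table[code]] = glyph
                glyph = None
    return result
```

Glyphs whose Unicode value has no entry in the table are simply dropped, which is why the converted font is a subset of the ISO 10646-1 original.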

Distribution

I periodically contribute a recent snapshot of all of the above fonts to XFree86, and they have been shipping as part of the XFree86 releases since XFree86 4.1. I have also made them available to X.Org for inclusion in one of the next official X11 distributions as a replacement for the current ISO 8859-1 BDF fonts (hopefully they will be in X11R6.7). The copyright status of these fonts remains the same as for the original fonts in the X11 distribution. X11 server vendors are therefore welcome to include them in their products without payment of royalties.

Related information and links

Other information relevant to Unicode font projects

Why are there no Indic or Syriac glyphs in the ucs-fonts package?

In European and East Asian scripts, each Unicode character can be represented by a single graphical shape ("glyph"). The X11 font system is built entirely around the idea that there is a one-to-one relationship between characters and glyphs, which works fine for Latin, Greek, Cyrillic, Hebrew, Han, Hiragana, Katakana, Hangul, etc. However, things are far more complicated for handwritten cursive scripts such as Arabic, Syriac and the various Indic scripts (Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, etc.). For these scripts, the sequence of values ("characters") encoded in a Unicode string (which usually corresponds to the sequence of keystrokes during entry and the sequence of phonemes when speaking) first has to be converted into a sequence of graphical symbols ("glyphs") as they are found in a font, before the string can be displayed.

In a given Latin font style, the same glyph is always used to represent a character on the screen. In an Arabic or Indic font, the shape of the glyph depends not only on the character that it represents, but also on its neighbouring characters. Sometimes different glyphs have to be used depending on whether the character appears at the beginning, in the middle, or at the end of a word, and often entire sequences of characters have to be represented by a special ligature glyph. A very simple form of this is found in Latin fine typography in the "fi" and "fl" ligatures, but in Indic scripts the situation is far more extreme, and the number of glyphs is often several times the number of characters. For details and examples, read Chapter 9 and Chapter 8 as well as the relevant code charts of The Unicode Standard.

The Unicode Standard does contain encoding ranges for a simple scheme of Arabic glyphs, the "Arabic Presentation Forms". This was possible because for Arabic there is a reasonably good consensus among font designers on how many glyphs are actually necessary for proper rendering of Arabic text, even though some argue that the Unicode collection of Arabic presentation forms is not sufficient for really high-quality typesetting. For Indic scripts, on the other hand, there seems to be no consensus among font designers on which glyphs are actually necessary, as this can vary significantly across different font styles. Therefore, an Indic font is always a proprietary, non-standardized collection of glyphs together with a mapping table that defines how sequences of standard Unicode characters have to be transformed into sequences of non-standard Indic glyphs from this particular font before the text can be displayed.
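
To make the character-to-glyph step concrete, here is a deliberately naive Python sketch that picks the isolated, initial, medial or final presentation form for each Arabic letter in a word. It is an illustration only, under several assumptions: it derives its table from the compatibility decompositions of the Arabic Presentation Forms-B block (U+FE70 to U+FEFF) as exposed by Python's unicodedata module, it ignores ligatures and combining marks, and the names build_form_table and shape are hypothetical. Real shaping engines are considerably more involved.

```python
import unicodedata

POSITIONS = ("<isolated>", "<final>", "<initial>", "<medial>")


def build_form_table():
    """Map each base letter to its available presentation forms."""
    forms = {}
    for cp in range(0xFE70, 0xFF00):
        decomp = unicodedata.decomposition(chr(cp)).split()
        # Keep only forms of a single base letter, e.g. '<initial> 0628'.
        if len(decomp) == 2 and decomp[0] in POSITIONS:
            base = chr(int(decomp[1], 16))
            forms.setdefault(base, {})[decomp[0].strip("<>")] = chr(cp)
    return forms


FORMS = build_form_table()


def shape(word, forms=FORMS):
    """Replace each letter by the presentation form for its position."""
    out = []
    for i, ch in enumerate(word):
        f = forms.get(ch)
        if f is None:                 # not an Arabic letter we know about
            out.append(ch)
            continue
        prev = forms.get(word[i - 1]) if i > 0 else None
        nxt = forms.get(word[i + 1]) if i + 1 < len(word) else None
        # A letter connects forward only if it has an initial form, and
        # backward only if it has a final form.
        joins_prev = prev is not None and "initial" in prev and "final" in f
        joins_next = nxt is not None and "initial" in f and "final" in nxt
        if joins_prev and joins_next:
            key = "medial"
        elif joins_prev:
            key = "final"
        elif joins_next:
            key = "initial"
        else:
            key = "isolated"
        out.append(f.get(key, ch))
    return "".join(out)


# BEH ALEF BEH: three characters, but two different glyphs for BEH.
glyphs = shape("\u0628\u0627\u0628")
print([hex(ord(g)) for g in glyphs])   # ['0xfe91', '0xfe8e', '0xfe8f']
```

The same character U+0628 ends up as two different glyphs depending on its neighbours, which is exactly the kind of mapping that a one-character-one-glyph font system cannot express.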

The OpenType font format developed by Microsoft and Adobe is an outline font format that does include such character/glyph mapping tables. The BDF format used by X11 pixel fonts has no standardized way of including a character/glyph mapping table, and neither current BDF editors such as xmbdfed nor X servers support one. The Pango rendering library developed for the Gnome project can make use of BDF glyph fonts, but it requires the corresponding character/glyph mapping table in a separate client-side file. The X11 standards currently provide no support for transmitting such mapping tables over the X11 protocol. Roman Czyborra’s GNU Unifont does contain a naive representation of the Indic glyphs shown in the Unicode Standard code charts, but that is of no use in practice for displaying Indic strings properly.

Summary: X11 was never designed for Arabic, Syriac, or Indic scripts, and special libraries such as Pango have to be used for them. If you want to help get Indic scripts supported under X11, you have to extend the X11 standards to fix this problem and provide a font mechanism that understands that some scripts need to map characters into glyphs. Unfortunately, the solution is not as easy as drawing a few glyphs with a font editor. Otherwise we would have added the Indic scripts to the ucs-fonts package long ago.

Markus Kuhn

created 1998-09-22 – last modified 2002-11-15 – http://www.cl.cam.ac.uk/~mgk25/ucs-fonts.html