Effective scientific electronic publishing

Markus G. Kuhn, Computer Laboratory, University of Cambridge

This is a brief list of recommendations for authors of scientific papers who make their work available online. It focuses in particular on producing high-quality PDF files with LaTeX and covers some other technical and typographic pitfalls.


Be consistent with how you write your name

Choose an exact spelling of your name at the start of your scientific career and use that, and only that, on all your publications. Do not change any part of your name. If you have a middle initial, then either always use it (preferred) or never use it, but avoid switching between the two. Otherwise, bibliographic databases (Science Citation Index, etc.) will file your work under different entries, such as J DOE and JA DOE, which makes it more difficult to locate.
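To illustrate how the split arises, here is a small Python sketch that builds "INITIALS SURNAME" index keys of the kind quoted above. It assumes, as a simplification, that the last word of a byline is the surname and that every other word contributes one initial; real databases use more elaborate rules. The functions `index_key` and `group_by_key` are hypothetical helpers for illustration only.

```python
from collections import defaultdict


def index_key(byline: str) -> str:
    """Index key in the style 'INITIALS SURNAME', e.g. 'John A. Doe' -> 'JA DOE'.

    Simplifying assumption: the last word is the surname and every
    other word contributes its first letter as an initial.
    """
    *given_names, surname = byline.split()
    initials = "".join(name[0].upper() for name in given_names)
    return f"{initials} {surname.upper()}"


def group_by_key(bylines):
    """Group author bylines by the index key a database would file them under."""
    groups = defaultdict(list)
    for byline in bylines:
        groups[index_key(byline)].append(byline)
    return dict(groups)


# The same author, written four different ways on four papers:
for key, names in group_by_key(["John Doe", "John A. Doe", "J. A. Doe", "J. Doe"]).items():
    print(key, names)
```

The four bylines end up under two separate keys, J DOE and JA DOE, so anyone searching under one of them misses half of the papers.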

Use the LaTeX styles suggested by the conference organizers

Make sure your online version has page numbers and reference information

Camera-ready submission formats required by publishers often lack page numbers or any indication of where the paper was published, because the publishers want to add this information themselves. If you put this camera-ready version on the Web, people will print it out and forget where they downloaded it. If they then cannot find the reference information on the paper, they will not be able to cite it properly.

Therefore, your own online version should differ from the submitted camera-ready copy in two respects: the page numbers should be switched on, and the precise bibliographic reference of your paper should be included. Preferably, put the reference information at the bottom of the first page, in a way that does not change the page breaks compared with the submitted camera-ready copy.

Update your online copy once you receive all the precise metadata (page numbers, ISBN, publication date, etc.) of the final published paper version.

If you make a paper that you submitted to a publisher available online, then read the publisher’s copyright conditions carefully. Most scientific publishers now allow you to have your paper on your Web page, but some require you to add a special copyright notice.

The first printed page should be page number 1

International Standard ISO 7144 (“Documentation — Presentation of theses and similar documents”):

“The numbering of pages shall run consecutively, including blank pages, also if a thesis is published in several volumes, in arabic numerals, beginning on the recto of the first printed leaf. The title-leaves are counted but not numbered.”

If your document comes with a table of contents or index and the printed version is usually bound separately (e.g., a thesis, technical report, manual, book), then it is very convenient if the page numbers printed in the document match exactly the page numbers displayed by an electronic document viewer, such as Adobe Reader or ghostview. This is most easily achieved if, starting from the title page (the front of the first page that comes out of the printer), all pages are numbered consecutively in Arabic numerals (1, 2, 3, ...). The LaTeX “article” and “report” styles do this. Avoid separate Roman numerals for front matter (the LaTeX “book” style uses these by default). Should the thesis presentation regulations of your institution disagree, you may want to make those who wrote them aware of ISO 7144.
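A short Python sketch can make the mismatch concrete. It models a document whose first `front_matter` physical pages carry Roman numerals and whose main text restarts at 1, and lists every viewer page whose printed label differs. The function names are hypothetical, and the model ignores that ISO 7144 leaves the title page itself unnumbered.

```python
def roman(n: int) -> str:
    """Lowercase Roman numeral for 1 <= n < 4000."""
    values = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
              (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
              (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i")]
    out = []
    for value, symbol in values:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


def printed_labels(total_pages: int, front_matter: int) -> list:
    """Labels printed on physical pages 1..total_pages when the first
    `front_matter` pages use Roman numerals and the rest restart at 1."""
    return [roman(i) if i <= front_matter else str(i - front_matter)
            for i in range(1, total_pages + 1)]


def mismatches(labels: list) -> list:
    """Viewer page numbers (1-based) whose printed label differs from them."""
    return [i for i, label in enumerate(labels, start=1) if label != str(i)]


print(mismatches(printed_labels(10, front_matter=6)))  # every page differs
print(mismatches(printed_labels(10, front_matter=0)))  # consecutive numbering: none
```

With six pages of Roman-numbered front matter, "page 1" of the main text is the viewer's page 7, and every entry in the table of contents is off by six; with consecutive Arabic numbering the list of mismatches is empty.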

Use PDF as the distribution format for your online version

Adobe’s Portable Document Format is today clearly the preferred format for publishing formatted documents. PDF has several advantages over the more traditional Adobe PostScript format:

PDF files can be created, for example, with the “ps2pdf” tool included with ghostscript, or with Adobe’s Acrobat Distiller, by converting PostScript files into PDF.

Ghostscript versions before 6.5 lacked full Type1 font support for PDF. All non-standard Type1 fonts were transformed into 720 dpi pixel fonts when writing a PDF file. Make sure you use a recent version.

Please do not package PDF files into ZIP files. They are already compressed. Put them directly on your web server. Applying PKZIP in addition will not reduce the size significantly, but it will render the convenient PDF plug-ins of web browsers useless. Make sure that your web server serves PDF files with the line “Content-Type: application/pdf” in the HTTP header.
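One way to check the Content-Type header is sketched below in Python, using only the standard library. `served_content_type` sends an HTTP HEAD request, and `is_pdf_content_type` ignores case and any parameters in the header value. Both are hypothetical helper names, and the sketch assumes that your server answers HEAD requests the same way it answers GET.

```python
import mimetypes
import urllib.request
from email.message import Message


def is_pdf_content_type(header_value: str) -> bool:
    """True if a Content-Type header value denotes PDF (case and parameters ignored)."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_type() == "application/pdf"


def served_content_type(url: str) -> str:
    """Ask the server for `url` with a HEAD request and return its Content-Type."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.headers.get("Content-Type", "")


# MIME type that Python's own table (used e.g. by http.server) assigns to .pdf files:
print(mimetypes.guess_type("paper.pdf")[0])
```

Usage: `is_pdf_content_type(served_content_type("https://your.server/paper.pdf"))` should be true for every PDF you publish (the URL here is a placeholder).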

PdfTeX is a version of TeX that can produce both DVI and PDF output. It understands a number of additional commands to control the PDF output (adding URLs, embedding graphics, etc.). Unlike dvips, pdfTeX cannot embed EPS files, but EPS files can be converted into PDF using Ghostscript and the epstopdf script that comes with teTeX. The usual way of including diagrams in TeX documents is to use xfig or, for more complicated cases, MetaPost to generate an Encapsulated PostScript file. Both tools allow you to include mathematical formulae in diagrams that will be typeset by LaTeX (in xfig, export to "Combined PS/LaTeX (both parts)" to get a pair of pstex/pstex_t files).

Below, I discuss some of the more important issues of generating PDF with TeX.

Some related information can be found in:

Use Type1 vector fonts for generating PDF files with TeX

TeX (and LaTeX) traditionally used raster-graphic fonts produced by Metafont for a specific device resolution. Dvips originally produced PostScript files containing 300 or 600 dpi raster fonts, and so did the PDF files converted from them by ps2pdf or Acrobat Distiller. PDF viewers usually do a rather poor job of displaying such device-dependent “Type3” raster fonts: text in raster fonts is displayed slowly on the screen, with no or suboptimal anti-alias filtering. Also, the Type3 raster fonts inserted by dvips lack information about which character each glyph represents, which interferes badly with full-text search and copy-and-paste.

You can check whether the output of dvips contains any Type3 raster/bitmap fonts under Unix with the command

  grep '%DVIPSBitmapFont:' file.ps
which should produce no output if there are no bitmap fonts. Instead of Type3 (raster-graphic) fonts, make sure that any PostScript file you produce for conversion into PDF uses only resolution-independent Type1 vector fonts.
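Where grep is not available, or the check should be part of a larger build script, the same test can be written in a few lines of Python. This is a minimal sketch: `bitmap_font_lines` and `check_file` are hypothetical names, and the file is read as Latin-1 only so that binary sections in the PostScript cannot cause decoding errors.

```python
MARKER = "%DVIPSBitmapFont:"


def bitmap_font_lines(ps_text: str) -> list:
    """Return the lines of PostScript text that contain the dvips bitmap-font comment."""
    return [line for line in ps_text.splitlines() if MARKER in line]


def check_file(path: str) -> bool:
    """Print any bitmap-font comments in `path`; return True if there are none."""
    with open(path, encoding="latin-1") as f:
        found = bitmap_font_lines(f.read())
    for line in found:
        print(f"{path}: {line.strip()}")
    return not found
```

Usage: `check_file("paper.ps")` returns `False` and reports the offending lines if dvips embedded any raster fonts, just as the grep command above produces output in that case.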

Fortunately, a consortium of AMS, SIAM, IBM, Springer, Elsevier, BlueSky Research, and Y&Y Inc. arranged to make commercial high-quality PostScript Type1 versions of both the Computer Modern fonts and AMS fonts for TeX freely available under the copyright of AMS.

Dvips has used these resolution-independent TeX fonts by default for a few years now. If you still use some pre-2005 version of dvips, you may have to use special command-line options such as

  dvips -Ppdf -G0 ...
to get the desired Type1 fonts. Better still, upgrade to a more recent TeX distribution. [The -G0 option was a workaround for an old dvips bug that caused ligatures to disappear in some fonts; this bug has also been fixed in more recent versions.]

Make sure you set the distiller's “Subset fonts below 100%” option. A font is then embedded completely only if the document uses all of its characters; for every other font, the distiller removes the data for unused characters from your PDF file. This keeps your PDF files small.

To convert historic PostScript files that were produced with Computer Modern bitmap fonts into PDF, try the pkfix tool, which replaces these fonts in the PostScript file with their Type1 equivalents.

Set the information fields of the PDF file

In PDF files, you can store the title, authors, and keywords of a paper in special information fields. This information can help search engines to locate and present your paper more accurately. There are several ways to set this information:
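When the PDF file is produced from PostScript by distiller or ps2pdf, one route is Adobe's pdfmark operator with the /DOCINFO name. The Python sketch below generates such a line, together with the usual guard that makes pdfmark harmless on ordinary PostScript printers, for insertion into the PostScript file (e.g., after its prolog). The function names are hypothetical, and the sketch assumes ASCII-only metadata to sidestep PDF text-string encoding issues.

```python
# Defines pdfmark as a no-op on PostScript interpreters that lack it.
PDFMARK_GUARD = "/pdfmark where {pop} {userdict /pdfmark /cleartomark load put} ifelse"


def ps_string(text: str) -> str:
    """Quote `text` as a PostScript string literal (ASCII only in this sketch)."""
    if not text.isascii():
        raise ValueError("this sketch only handles ASCII metadata")
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def docinfo_pdfmark(title, author, keywords=(), subject=None) -> str:
    """Build a '[ /Title (...) ... /DOCINFO pdfmark' line for the PDF information fields."""
    fields = [("/Title", title), ("/Author", author)]
    if subject:
        fields.append(("/Subject", subject))
    if keywords:
        fields.append(("/Keywords", ", ".join(keywords)))
    body = " ".join(f"{key} {ps_string(value)}" for key, value in fields)
    return f"[ {body} /DOCINFO pdfmark"


print(PDFMARK_GUARD)
print(docinfo_pdfmark("A (sample) title", "J. Doe", keywords=["PDF", "TeX"]))
```

Parentheses and backslashes in the metadata must be escaped because they delimit PostScript strings; the sketch does that in `ps_string`.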

Use the paper format of the printed version in the PDF file

PDF files look best on screen if the specified paper size matches the one for which the layout was designed. Therefore, use the actual physical paper size of the published document in the PDF file. When a PDF file is printed, the page will be centered, and if the “Shrink oversized pages to paper size” or “Expand small pages to paper size” print option is used, it is also guaranteed to fit the output paper.

For users of the LNCS style: the paper size is 152 mm × 235 mm, and correct alignment of the output relative to the upper-left corner can be achieved by instructing distiller or ps2pdf to use the CropBox [92 112 523 778] (coordinates in PostScript points of 1/72 inch).
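The CropBox numbers follow from the paper size by simple unit conversion (1 inch = 25.4 mm = 72 PostScript points). The Python sketch below checks this for the LNCS values and can produce a box for other page sizes. The lower-left corner (92, 112) is taken from the LNCS values above; for other styles it is an assumption you would have to determine yourself, and the helper names are hypothetical.

```python
MM_PER_INCH = 25.4
PT_PER_INCH = 72  # PostScript points


def mm_to_pt(mm: float) -> float:
    return mm / MM_PER_INCH * PT_PER_INCH


def cropbox(llx_pt: int, lly_pt: int, width_mm: float, height_mm: float) -> list:
    """CropBox [llx lly urx ury] for a page of the given size whose lower-left
    corner sits at (llx_pt, lly_pt); width and height rounded to whole points."""
    return [llx_pt, lly_pt,
            llx_pt + round(mm_to_pt(width_mm)),
            lly_pt + round(mm_to_pt(height_mm))]


def cropbox_size_mm(box: list) -> tuple:
    """Page size in millimetres described by a CropBox."""
    llx, lly, urx, ury = box
    return ((urx - llx) / PT_PER_INCH * MM_PER_INCH,
            (ury - lly) / PT_PER_INCH * MM_PER_INCH)


print(cropbox(92, 112, 152, 235))
w, h = cropbox_size_mm([92, 112, 523, 778])
print(f"{w:.0f} mm x {h:.0f} mm")
```

The box is 431 pt × 666 pt, which is 152 mm × 235 mm after rounding to whole points.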

British Standard BS 1413 defines a book page size of 156 mm × 234 mm called “Metric royal octavo”. This is the best clue I have found so far on where the LNCS format might have come from.

There are several ways to achieve this:

Use appropriate graphics formats for figures

Choose appropriate formats for included graphics. In particular:

Whenever possible, use resolution-independent vector graphic formats (e.g., EPS, WMF) for diagrams and line drawings such as

Use pixel-based raster-graphic formats (PNG, JPEG, TIFF, GIF) only for figures in which the original source information has a fixed resolution, such as

Before considering formats that discard information, such as JPEG (lossy compression) or GIF (limited to a palette of 256 colours), for presenting scientific data, make sure you understand exactly what information their encoders throw away.

In particular:

Warning: Normally, distiller and ps2pdf apply DCT (JPEG) compression to every colour and grayscale raster image that they encounter in the input PostScript file. In many scientific publications, especially those related to image processing and compression, this JPEG compression can introduce unacceptable artifacts that distort the meaning of the image. You can avoid this by processing the output of pnmtops with my sed script nojpeg.sed, which adds a setdistillerparams command to the generated EPS file that deactivates JPEG compression in the distiller for this image only.

To use nojpeg.sed in the Makefile described in the next section, simply use the replacement macro

  PNMTOPS=pnmtops -rle -noturn -nosetpage | sed -f nojpeg.sed
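For readers who prefer to see what such a filter does, here is a hypothetical Python re-implementation of the idea, not the actual contents of nojpeg.sed. It inserts a setdistillerparams call after the %%EndComments line of an EPS file that switches colour and grayscale images from automatic (JPEG) filtering to lossless Flate compression. The parameter names are standard distiller parameters; the exact text and insertion point used by nojpeg.sed may differ. The `systemdict ... known` test keeps the file printable on PostScript devices that lack setdistillerparams.

```python
# Distiller parameters that replace automatic (JPEG) image filtering with lossless Flate.
NOJPEG_PARAMS = """\
systemdict /setdistillerparams known {
  << /AutoFilterColorImages false /ColorImageFilter /FlateEncode
     /AutoFilterGrayImages false /GrayImageFilter /FlateEncode >> setdistillerparams
} if
"""


def add_nojpeg(eps_text: str) -> str:
    """Insert NOJPEG_PARAMS right after the %%EndComments line of an EPS file."""
    lines = eps_text.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if line.startswith("%%EndComments"):
            head = "".join(lines[:i + 1])
            if not head.endswith("\n"):
                head += "\n"  # keep the inserted code on its own line
            return head + NOJPEG_PARAMS + "".join(lines[i + 1:])
    raise ValueError("no %%EndComments line found; not a DSC-conforming EPS file?")
```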

Use good software engineering for the document sources

Ensure that your document preparation becomes a traceable and repeatable process, just as you should have learned to do with software (think ISO 9000).

Use filenames that are meaningful in a broader context

It is a good idea to include in the filename an indication of where the paper is published (an abbreviation for the conference or journal) and the most significant word of its title. For instance, ih98-tempest.tex is much more useful than just paper.tex. Plan your filenames such that you and all your local colleagues can keep them together in a single public directory. No filename should be longer than 25 characters; preferably keep them under 15 characters. Use only lowercase US-ASCII letters, digits, hyphens, and a dot (only before the extension).
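The mechanical part of these rules can be checked automatically, for example before files are copied into a shared directory. In this Python sketch (the function name is hypothetical), the length limits are applied to the name without its extension; that is an assumption, made because the example ih98-tempest.tex itself has 16 characters in total. Whether a name is meaningful, as opposed to paper.tex, still needs a human.

```python
import re

# Lowercase ASCII letters, digits and hyphens, optionally one dot before the extension.
NAME_RE = re.compile(r"[a-z0-9-]+(\.[a-z0-9]+)?")


def filename_problems(name: str) -> list:
    """List violations of the filename conventions (empty list if none)."""
    problems = []
    if not NAME_RE.fullmatch(name):
        problems.append("use only a-z, 0-9, '-' and a single '.' before the extension")
    stem = name.rsplit(".", 1)[0]  # assumption: limits apply to the name without extension
    if len(stem) > 25:
        problems.append(f"name without extension has {len(stem)} characters (limit: 25)")
    elif len(stem) >= 15:
        problems.append(f"name without extension has {len(stem)} characters (preferably under 15)")
    return problems


for candidate in ["ih98-tempest.tex", "IH98_Tempest.tex", "my.paper.tex"]:
    print(candidate, filename_problems(candidate))
```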

Validate your HTML files

If you publish an HTML version of your paper, then check not only whether it displays nicely in your current browser, but also run it through an SGML parser that validates your HTML syntax against the HTML 4.01 document type definition. A validation service is available, for instance, from the W3C, or you can easily install your own using nsgmls. Also perform a link check from time to time, as URLs are unfortunately not very stable.
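The link check, unlike DTD validation (for which you still need nsgmls or the W3C service), is easy to script. The Python sketch below collects link targets with the standard `html.parser` module and reports absolute HTTP(S) URLs that fail a HEAD request. The names are hypothetical, relative links and mailto: addresses are simply skipped, and some servers reject HEAD requests, so a reported failure deserves a manual look before you edit the page.

```python
import urllib.request
from html.parser import HTMLParser

# Which attribute holds the link target for each tag we care about.
LINK_ATTRS = {"a": "href", "img": "src", "link": "href"}


class LinkCollector(HTMLParser):
    """Collect the link targets of <a>, <img> and <link> elements."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        wanted = LINK_ATTRS.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                self.links.append(value)


def extract_links(html_text: str) -> list:
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


def broken_links(urls, timeout=10) -> list:
    """Return (url, error) pairs for absolute HTTP(S) URLs that fail a HEAD request."""
    broken = []
    for url in urls:
        if not url.startswith(("http://", "https://")):
            continue  # relative links, mailto: etc. are not checked in this sketch
        try:
            request = urllib.request.Request(url, method="HEAD")
            with urllib.request.urlopen(request, timeout=timeout):
                pass
        except (OSError, ValueError) as err:  # URLError and HTTPError are OSErrors
            broken.append((url, str(err)))
    return broken
```

Usage: `broken_links(extract_links(open("paper.html").read()))`.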

Typographic conventions

Professional typesetting works slightly differently from typing on a typewriter or writing ASCII email. Make sure that you are familiar with these conventions. Lamport’s LaTeX User’s Guide provides a very brief introduction in section 2.2.1. In particular, make sure you are aware of

When using BibTeX, be aware that many bibliography styles change the capitalization of titles to lowercase, except for words protected by surrounding braces {}. Therefore, protect all proper nouns (names) and abbreviations in this way in your BibTeX file.
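Much of this protection can be automated. The Python sketch below (hypothetical function names) wraps in braces every word that contains an uppercase letter after its first character, which catches abbreviations such as LCD or pdfTeX, plus any words you list as proper nouns. It assumes the title contains no braces yet, and it cannot recognise ordinary capitalized names like Cambridge by itself, so review its output.

```python
import re

WORD_RE = re.compile(r"[A-Za-z][A-Za-z0-9-]*")


def needs_protection(word: str, proper_nouns) -> bool:
    """Abbreviations and mixed-case words, or explicitly listed proper nouns."""
    return word in proper_nouns or any(c.isupper() for c in word[1:])


def protect_title(title: str, proper_nouns=frozenset()) -> str:
    """Wrap words that BibTeX styles must not lowercase in {...}.

    Assumes the title contains no braces yet (otherwise words would be wrapped twice).
    """
    def wrap(match):
        word = match.group(0)
        return "{" + word + "}" if needs_protection(word, proper_nouns) else word

    return WORD_RE.sub(wrap, title)


print(protect_title("Compromising emanations of LCD TV sets"))
print(protect_title("Embedding Type1 fonts with pdfTeX", {"Type1"}))
```

The first title becomes "Compromising emanations of {LCD} {TV} sets"; the first word needs no braces because styles that lowercase titles normally keep the title's initial capital.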

Here are some more typographic conventions that you may want to consider:


Special thanks to Robin Fairbairns and Lars Engebretsen for useful suggestions.

Further suggestions for this text are very welcome! Just mail me.

This work is licensed under a Creative Commons Attribution 4.0 International License.

Markus Kuhn

created 1998-05-01 – last modified 2008-06-17 – http://www.cl.cam.ac.uk/~mgk25/publ-tips/