Computer Laboratory Technical Report archive
--------------------------------------------

Markus Kuhn -- http://www.cl.cam.ac.uk/~mgk25/


1  Introduction

This subdirectory is accessible on Computer Laboratory Linux machines
under

  /anfs/www/html/techreports/

and it is world-readable via the URL

  http://www.cl.cam.ac.uk/techreports/
  (formerly http://www.cl.cam.ac.uk/TechReports/)

It contains all digitally available information related to the
University of Cambridge Computer Laboratory's Technical Report series:
both the documents themselves and the tools and databases used to
administer them.

Some additional background information, perhaps of historic interest,
written up when the current TR regime was established, is available
at

  http://www.cl.cam.ac.uk/~mgk25/techreports/

The scripts found in this subdirectory have been tested to run on
Ubuntu Linux 18.04 or 20.04, with the departmental filer automounted
via NFS, and with the following packages installed:

$ apt-get install texlive-latex-base cl-texlive-extras \
  pdftk-java poppler-utils # texlive-fonts-extra


2  Instructions for publishing a new technical report

Technical reports are currently published by Markus Kuhn and Nicholas Cuttler.
This document briefly summarises the established process, for the
benefit of whoever takes over that job next. Instructions for authors
who want to submit reports are at

  http://www.cl.cam.ac.uk/techreports/submission.html

Publishing a technical report involves roughly the following
steps:

1) The submitting author carefully reads

     http://www.cl.cam.ac.uk/techreports/submission.html

   and sends the data requested there to tech-reports@cl.

2) Edit the file tr-database.txt to add a new last line for the
   new report, based on the information supplied by the author.
   This step involves assigning the technical report number <nnn>,
   which is usually simply the last assigned number plus one.

3) Edit the file tr-abstracts.txt to add the author-supplied
   text of the abstract.

   The syntax of both files is specified in header comments.

   Note that both tr-database.txt and tr-abstracts.txt are UTF-8
   files. Non-typographic ASCII punctuation and TeX sequences such
   as ', `, ", and -- should *not* be used in these files. Use
   a UTF-8 editor (recent emacs, vim), a UTF-8 locale setting,
   a Unicode font, and .Xmodmap entries such as

     keycode 34 = bracketleft braceleft leftsinglequotemark leftdoublequotemark
     keycode 35 = bracketright braceright rightsinglequotemark rightdoublequotemark
     keysym m = m NoSymbol emdash     mu
     keysym n = n NoSymbol endash     NoSymbol

   to enter the proper Unicode quotation marks (‘’“”), en/em dashes (–—)
   and Greek letters (π).
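   Leftover ASCII punctuation can be caught with a grep over both
   files before publishing. The shell function below is a minimal
   sketch, not part of the existing Makefile; its name check_typography
   is hypothetical. It assumes GNU grep and glibc iconv (as on Ubuntu),
   reports every line containing ', `, " or --, and also fails if a
   file is not valid UTF-8. Header comments may legitimately contain
   such characters, so treat its output as a list of lines to review
   rather than as definite errors.

```shell
#!/bin/bash
# Hypothetical helper (not part of the existing Makefile).
# Usage: check_typography FILE...
# Prints offending lines as FILE:LINE:TEXT and returns non-zero if any
# file contains ASCII quotes/backquotes or the TeX dash "--", or is not
# valid UTF-8.
check_typography() {
  local f status=0
  for f in "$@"; do
    # glibc iconv exits non-zero on byte sequences that are not valid UTF-8
    if ! iconv -f UTF-8 -t UTF-8 "$f" >/dev/null 2>&1; then
      echo "$f: not valid UTF-8" >&2
      status=1
    fi
    # Byte-wise search (LC_ALL=C) for ' ` " and "--"; typographic
    # quotes and dashes are multi-byte UTF-8 and never match these bytes
    if LC_ALL=C grep -HnE -e "['\`\"]|--" "$f"; then
      status=1
    fi
  done
  return "$status"
}
```

   For example: check_typography tr-database.txt tr-abstracts.txt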

4) Copy the author-supplied PostScript or PDF file to orig/<nnn>.ps
   or orig/<nnn>.pdf, respectively.

   (There is also a subdirectory orig-2 for files where the first
   two pages need to be removed before the new title page can be
   added. The Makefile looks in both and is meant to do the right
   thing automatically, based on the subdirectory name and the
   extension of the file found there.)

5) PostScript files should be compressed: gzip -9 orig/<nnn>.ps
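   Steps 4 and 5 can be combined in a small helper. This is a
   hypothetical sketch, not an existing tool: the function name
   stage_submission and its optional third argument (for choosing
   orig-2 instead of orig) are illustrative assumptions. It picks the
   target name from the file extension, refuses to overwrite an
   existing submission, and compresses PostScript files with gzip -9.

```shell
#!/bin/bash
# Hypothetical helper: stage_submission FILE NNN [DIR]
# Copies FILE to DIR/NNN.pdf or DIR/NNN.ps (DIR defaults to "orig")
# and gzip-compresses PostScript files, as in steps 4 and 5.
stage_submission() {
  local src="$1" nnn="$2" dir="${3:-orig}" ext dst
  case "$src" in
    *.pdf|*.PDF) ext=pdf ;;
    *.ps|*.PS)   ext=ps ;;
    *) echo "unsupported file type: $src" >&2; return 1 ;;
  esac
  dst="$dir/$nnn.$ext"
  # Never silently replace an earlier submission for the same number
  if [ -e "$dst" ] || [ -e "$dst.gz" ]; then
    echo "refusing to overwrite $dst" >&2
    return 1
  fi
  cp -- "$src" "$dst" || return 1
  # PostScript is stored compressed: DIR/NNN.ps becomes DIR/NNN.ps.gz
  if [ "$ext" = ps ]; then
    gzip -9 -- "$dst"
  fi
}
```

   For example, "stage_submission submission.ps <nnn> orig-2" would
   leave the compressed file in orig-2/<nnn>.ps.gz.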

6) make UCAM-CL-TR-<nnn>.pdf

   This will produce the title page, concatenate it with the
   content of the orig/<nnn>.* source file, and save the result as
   UCAM-CL-TR-<nnn>.pdf.

   If the submission is a PostScript file, the above command feeds
   three things into 'gs': a prefix with setdistillerparams, a
   PostScript title page, and finally the submitted file. The
   pdfwrite driver distils all three into a single PDF.

   If the submission is a PDF file, the above command by default
   (TROPT=) uses 'pdftk' to concatenate the pages. Should this cause
   problems, there are several alternative techniques to try:

   a) make TROPT=gs UCAM-CL-TR-<nnn>.pdf

      Use the 'gs' tool (ghostscript) to concatenate a PDF title page
      with the PDF submission.

      Processing a submitted PDF through ghostscript has several
      advantages and disadvantages:

        + Submitted PDFs may contain the same font multiple times,
          from included images. Ghostscript deduplicates fonts,
          leading to a smaller file size.

        + Submitted PDFs often contain the Computer Modern fonts in
          the old encrypted Type 1 format, which compresses poorly.
          Ghostscript converts these into the upwards-compatible CFF
          (Type 1C) font format, which compresses much better.

        - Ghostscript internally uses a vector-graphics model that
          lacks some of the facilities of PDF. It may therefore
          replace some PDF graphics primitives with others, e.g. a PDF
          rectangle may become a longer path description.
          Ghostscript's pdfwrite driver was not meant to modify an
          existing PDF: it instead creates a new PDF from scratch,
          trying to keep the resulting output PDF visually
          indistinguishable from the input PDF.

          https://www.ghostscript.com/doc/9.20/VectorDevices.htm

      [It appears pdftk-java (as of Ubuntu 20.04) now also
      deduplicates fonts, and converts Type 1 to Type 1C.]

   b) make TROPT=gsps UCAM-CL-TR-<nnn>.pdf

      Like TROPT=gs above, but instead concatenate a PostScript title
      page with setdistillerparams with the PDF submission, i.e. use
      essentially the same processing pipeline as is used for
      PostScript submissions.

   c) make TROPT=acrobat UCAM-CL-TR-<nnn>.pdf

      This prepares a PDF title page and then prints instructions on
      how to append the submitted PDF manually using Adobe Acrobat
      (under Windows). Make sure that you load the title page and
      insert the submitted PDF document after its last page (and not
      vice versa), so that the document metadata of the title page
      (as shown by pdfinfo) survives. Acrobat can sometimes achieve
      smaller file sizes by merging fonts that were included multiple
      times via figures.

   These alternative techniques can produce PDF files of
   significantly different sizes.

   If you used any TROPT option other than the default to produce
   the file, please record this target-specific variable in the
   Makefile by adding a line such as

      UCAM-CL-TR-<nnn>.pdf: TROPT=gs

7) Inspect the newly generated UCAM-CL-TR-<nnn>.pdf with Adobe Reader.
   In particular, check for

     - no pixel (bitmap) fonts
     - correct A4 page size (210 mm × 297 mm)
     - correct page numbers
     - abstract on page 3
     - suitable margins
     - a reasonable file size
     - no Adobe Reader error messages

   If you find flaws during the inspection, comment out the relevant
   line in tr-database.txt (so that the draft does not become visible
   at the next "make") and contact the author.
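   The page-size check can be partly automated with pdfinfo from
   poppler-utils, which reports page sizes in PostScript points
   (1 pt = 25.4/72 mm, so A4 is about 595.276 pt × 841.890 pt). The
   function check_a4 below is a hypothetical sketch, not an existing
   tool: it reads pdfinfo output on stdin, flags any page that deviates
   from A4 by more than 1 pt, and fails if it finds no page size at
   all. It does not replace the visual inspection above.

```shell
#!/bin/bash
# Hypothetical helper: pipe pdfinfo output into check_a4.
# Accepts both "Page size: W x H pts" (pdfinfo default, first page only)
# and "Page    N size: W x H pts" (pdfinfo -f/-l, one line per page).
# A4 = 210 mm x 297 mm = 595.276 pt x 841.890 pt; tolerance 1 pt.
check_a4() {
  awk '
    function abs(x) { return x < 0 ? -x : x }
    $1 == "Page" && /size:/ {
      # width follows "size:", height comes after the "x"
      for (i = 2; i < NF; i++)
        if ($i == "size:") { w = $(i + 1); h = $(i + 3) }
      n++
      if (abs(w - 595.276) > 1 || abs(h - 841.890) > 1) {
        print "not A4: " $0
        bad++
      }
    }
    # fail if no size line was seen or any page was not A4
    END { exit (n == 0 || bad > 0) }
  '
}
```

   pdfinfo reports only the first page by default; to check every
   page, pass the page range explicitly:

     n=$(pdfinfo UCAM-CL-TR-<nnn>.pdf | awk '/^Pages:/ {print $2}')
     pdfinfo -f 1 -l "$n" UCAM-CL-TR-<nnn>.pdf | check_a4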

8) make

   This will generate a file UCAM-CL-TR-<nnn>.html for every report
   that does not have one yet, or whose tr-database.txt entry has
   changed since the HTML file was generated (tracked via timestamps
   and MD5 checksums in tr-database-times.txt). It will also
   regenerate all the index and catalogue files that we publish.
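   The idea behind the MD5-based change detection can be illustrated
   with a short sketch. It is not the Makefile's actual implementation
   (which also uses timestamps), and the hash-list format used here,
   one hex digest per line, is a hypothetical stand-in for whatever
   tr-database-times.txt really contains. The function names
   save_hashes and changed_entries are likewise illustrative: entries
   whose digest is missing from the saved list are exactly those that
   are new or have been edited.

```shell
#!/bin/bash
# Illustration only, NOT the Makefile's logic; hypothetical file format.
# save_hashes DB HASHES: write the MD5 digest of every non-empty line
# of DB to HASHES, one hex digest per line.
save_hashes() {
  local db="$1" hashes="$2" line
  while IFS= read -r line || [ -n "$line" ]; do
    if [ -z "$line" ]; then continue; fi
    printf '%s' "$line" | md5sum | cut -d' ' -f1
  done < "$db" > "$hashes"
}

# changed_entries DB HASHES: print every non-empty line of DB whose
# digest is not in HASHES, i.e. entries that are new or were edited.
changed_entries() {
  local db="$1" hashes="$2" line digest
  while IFS= read -r line || [ -n "$line" ]; do
    if [ -z "$line" ]; then continue; fi
    digest=$(printf '%s' "$line" | md5sum | cut -d' ' -f1)
    if ! grep -qxF -- "$digest" "$hashes"; then
      printf '%s\n' "$line"
    fi
  done < "$db"
}
```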

9) Have a look at the generated UCAM-CL-TR-<nnn>.html and the
   updated index page.

10) Register the Digital Object Identifier with "./tr-doi publish<nnn>"
    and check that the DOI link shown on the abstract page redirects
    correctly.

    (Redo this after any changes to the metadata or abstract, to keep
    https://search.datacite.org/works/10.48456/tr-... up to date.)

11) Reply to the author with the message that "make UCAM-CL-TR-<nnn>.pdf"
    printed at the end. (The command "make reply<nnn>" outputs the
    same message.)


Long-term maintenance of the archive:

  - The entire /anfs/www/html/techreports/ archive (both the PDFs and
    the original PostScript files) should be written to a CD-R or
    DVD-R about once a year and deposited alternately in the library
    or at the home of a Computer Lab member. The "make cdrom"
    command prepares the necessary ISO image.

  - Keep track of new search engines and metadata standards and
    consider adding support for them. Check how visible our TRs are
    in these search engines.

  - Update the instructions for authors based on past
    misunderstandings, user feedback and changes in technology.


Some tips and tricks:

  - If the submitted document was formatted for US Letter (216 mm ×
    279 mm) and the author is unable to change the textheight and
    offsets in the source documents to something more suitable for A4
    (210 mm × 297 mm), then shifting the text 3 mm to the left and
    9 mm upwards using

      pstops '0(-0.3cm,0.9cm)'

    may help to restore symmetric margins on A4.
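    These offsets follow from re-centring the text block: half the
    difference between the paper widths, (210 - 216)/2 = -3 mm, and
    half the difference between the heights, (297 - 279)/2 = +9 mm.
    The hypothetical helper a4_shift_spec below is a sketch that
    computes the same pstops page specification for other source paper
    sizes, assuming the text is centred on the original page and that
    pstops offsets are measured from the bottom-left corner.

```shell
#!/bin/bash
# Hypothetical helper: a4_shift_spec WIDTH_MM HEIGHT_MM
# Prints a pstops page spec that shifts a page laid out for the given
# paper size so that its (centred) text is centred on A4 (210 x 297 mm).
# Shift = half the size difference per axis; 1 cm = 10 mm, hence /20.
a4_shift_spec() {
  local w="$1" h="$2"
  awk -v w="$w" -v h="$h" \
    'BEGIN { printf "0(%gcm,%gcm)\n", (210 - w) / 20, (297 - h) / 20 }'
}
```

    For example, "a4_shift_spec 216 279" prints 0(-0.3cm,0.9cm), the
    specification quoted above.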
