Computer Laboratory Technical Report archive
--------------------------------------------

Markus Kuhn -- http://www.cl.cam.ac.uk/~mgk25/


1 Introduction

This subdirectory is accessible on Computer Laboratory Linux machines
under

  /anfs/www/html/techreports/

and it is world-readable via the URL

  http://www.cl.cam.ac.uk/techreports/
  (formerly http://www.cl.cam.ac.uk/TechReports/)

It contains all digitally available information related to the
University of Cambridge Computer Laboratory's Technical Report series:
both the documents and the tools and databases used to administer
them.

Some additional background information, perhaps of historic interest,
written up during the establishment of the current TR regime, is
available at

  http://www.cl.cam.ac.uk/~mgk25/techreports/

The scripts found in this subdirectory have been tested on Ubuntu
Linux 18.04 and 20.04, with the departmental filer automounted via
NFS, and with the following packages installed:

  $ apt-get install texlive-latex-base cl-texlive-extras \
      pdftk-java poppler-utils # texlive-fonts-extra


2 Instructions for publishing a new technical report

Technical reports are currently published by Markus Kuhn and Nicholas
Cuttler. This document briefly summarises the established process,
for the benefit of whoever takes over that job next.

Instructions for authors who want to submit reports are at

  http://www.cl.cam.ac.uk/techreports/submission.html

The publication of a technical report roughly follows these steps:

  1) The submitting author carefully reads

       http://www.cl.cam.ac.uk/techreports/submission.html

     and sends the data requested there to tech-reports@cl.

  2) Edit the file tr-database.txt, to add a new last line for the
     new report, based on the information supplied by the author.
     This step involves assigning the technical report number, which
     is usually simply the last assigned number plus one.

  3) Edit the file tr-abstracts.txt, to add the author-supplied text
     of the abstract.
     The syntax of both files is specified in header comments.

     Note that both tr-database.txt and tr-abstracts.txt are UTF-8
     files. Non-typographic ASCII punctuation and TeX sequences such
     as ', `, ", and -- should *not* be used in these files. Use a
     UTF-8 editor (recent emacs, vim), a UTF-8 locale setting, a
     Unicode font, and .Xmodmap entries such as

       keycode 34 = bracketleft braceleft leftsinglequotemark leftdoublequotemark
       keycode 35 = bracketright braceright rightsinglequotemark rightdoublequotemark
       keysym m = m NoSymbol emdash mu
       keysym n = n NoSymbol endash NoSymbol

     to enter the proper Unicode quotation marks (‘’“”), en/em dashes
     (–—) and Greek letters (π).

  4) Copy the author-supplied PostScript or PDF file to orig/.ps or
     orig/.pdf, respectively.

     (There is also a subdirectory orig-2 for files where the first
     two pages need to be removed before the new title page can be
     added. The Makefile looks into both subdirectories and is meant
     to do the right thing automatically, based on the subdirectory
     name and the extension of the files in there.)

  5) PostScript files should be compressed:

       gzip -9 orig/.ps

  6) make UCAM-CL-TR-.pdf

     This will produce the title page, concatenate it with the
     content of the orig/.* source file, and save the result as
     UCAM-CL-TR-.pdf.

     If the submission is a PostScript file, the above command feeds
     three things into 'gs': a prefix with setdistillerparams, a
     PostScript title page, and finally the submitted file. These are
     all distilled into one PDF using the pdfwrite driver.

     If the submission is a PDF file, the above command uses 'pdftk'
     by default to concatenate the pages (TROPT=). Should this cause
     problems, there are several alternative techniques to try:

     a) make TROPT=gs UCAM-CL-TR-.pdf

        Use the 'gs' tool (ghostscript) to concatenate a PDF title
        page with the PDF submission. Processing a submitted PDF
        through ghostscript has several advantages and disadvantages:

        + Submitted PDFs may contain the same font multiple times,
          from included images.
          Ghostscript deduplicates fonts, leading to a smaller file
          size.

        + Submitted PDFs often contain the Computer Modern fonts in
          the old encrypted Type 1 format, which does not compress.
          Ghostscript converts these into the upwards-compatible CFF
          (Type 1C) font format, which compresses much better.

        - Ghostscript internally uses a vector-graphics model that
          lacks some of the facilities of PDF. It may therefore
          replace some PDF graphics primitives with others, e.g. a
          PDF rectangle may become a longer path description.
          Ghostscript's pdfwrite driver was not meant to modify an
          existing PDF: it instead creates a new PDF from scratch,
          trying to keep the resulting output PDF visually
          indistinguishable from the input PDF.
          https://www.ghostscript.com/doc/9.20/VectorDevices.htm

        [It appears pdftk-java (as of Ubuntu 20.04) now also
        deduplicates fonts, and converts Type 1 to Type 1C.]

     b) make TROPT=gsps UCAM-CL-TR-.pdf

        Like TROPT=gs above, but instead concatenate a PostScript
        title page with setdistillerparams with the PDF submission,
        i.e. use essentially the same processing pipeline as is used
        for PostScript submissions.

     c) make TROPT=acrobat UCAM-CL-TR-.pdf

        This prepares a PDF title page and then prints instructions
        on how to append the submitted PDF manually using Adobe
        Acrobat (under Windows). Make sure that you load the title
        page and insert the submitted PDF document after the last
        page (and not vice versa), such that the pdfinfo metadata
        from the title page survives. Acrobat can sometimes achieve
        smaller file sizes, by merging fonts that were included
        multiple times via figures.

     These alternative techniques can lead to significant differences
     in the size of the resulting PDF file.

     If you used any TROPT option (other than the default) to produce
     the file, please record this target-specific variable in the
     Makefile, by adding a line such as

       UCAM-CL-TR-.pdf: TROPT=gs

  7) Inspect the newly generated UCAM-CL-TR-.pdf with Adobe Reader.
     In particular, check for:

       - no use of pixel fonts
       - correct A4 page size (210 × 297 mm)
       - correct page numbers
       - page with abstract is page 3
       - suitable margins
       - no unreasonable file size
       - no Adobe Reader error messages

     If during the inspection you find flaws, comment out the
     relevant line in tr-database.txt (to avoid the draft becoming
     visible at the next "make") and contact the author.

  8) make

     This will generate a file UCAM-CL-TR-.html for each report for
     which there isn't one yet, or for which the tr-database.txt
     entry is newer (tracked via timestamps and MD5 checksums in
     tr-database-times.txt). It will also regenerate all the
     different index and catalogue files that we publish.

  9) Have a look at the generated UCAM-CL-TR-.html and the updated
     index page.

  10) Register the Digital Object Identifier with “./tr-doi publish”
      and test that the redirect in the DOI shown on the abstracts
      page works. (Redo this after any changes to the metadata or
      abstract, to keep
      https://search.datacite.org/works/10.48456/tr-... up to date.)

  11) Reply to the author with the message that
      "make UCAM-CL-TR-.pdf" produced at the end. (The command
      "make reply" will output the same message.)


Long-term maintenance of the archive:

  - The entire /anfs/www/html/techreports/ archive (both the PDFs and
    the original PostScript files) should be put onto a CD-R or DVD-R
    about once every year, and deposited alternately in the library
    or at the home of some Computer Lab member. The "make cdrom"
    command prepares the necessary ISO image.

  - Monitor what new search engines and metadata standards are out
    there and consider adding support for them. Check how visible our
    TRs are in these search engines.

  - Update the authors' instructions based on past misunderstandings,
    user feedback and changes in technology.
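
Parts of the step 7 inspection can be pre-screened from the command
line with pdfinfo and pdffonts from poppler-utils (one of the packages
listed in the introduction). The following is only a sketch under
stated assumptions: the helper names is_a4 and count_type3_fonts are
made up for illustration and are not part of the archive's scripts,
and Type 3 fonts are treated as a likely sign of pixel fonts, which is
common but not guaranteed. It does not replace the visual inspection.

```shell
#!/bin/sh
# Sketch only: is_a4 and count_type3_fonts are hypothetical helpers,
# not part of the techreports Makefile or scripts.

# Read pdfinfo output on stdin; succeed if the reported "Page size:"
# is within 1 pt of A4 portrait (595.28 x 841.89 pt = 210 x 297 mm).
is_a4() {
    awk '/^Page size:/ {
             found = 1
             dw = $3 - 595.28; if (dw < 0) dw = -dw
             dh = $5 - 841.89; if (dh < 0) dh = -dh
             ok = (dw <= 1 && dh <= 1)
             exit
         }
         END { if (found && ok) exit 0; exit 1 }'
}

# Read pdffonts output on stdin; print the number of Type 3 fonts.
# The first two lines of pdffonts output are the table header.
count_type3_fonts() {
    awk 'NR > 2 && / Type 3 / { n++ } END { print n + 0 }'
}

# Example use, with $pdf standing for the generated report file:
#   pdfinfo "$pdf" | is_a4 || echo "page size is not A4"
#   pdffonts "$pdf" | count_type3_fonts
```

Both helpers read the tool output from standard input, so they can be
tried on saved pdfinfo or pdffonts output without touching the
archive.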


Some tips and tricks:

  - If the submitted document was formatted for US Letter
    (216 mm × 279 mm) and the author is unable to change the
    textheight and offsets in the source documents to something more
    suitable for A4 (210 mm × 297 mm), then shifting the text
    horizontally by 3 mm and vertically by 9 mm using

      pstops '0(-0.3cm,0.9cm)'

    may help to restore symmetric margins on A4.


References:

  - Johan van der Knijff: PDF processing and analysis with
    open-source tools
    https://www.bitsgalore.org/2021/09/06/pdf-processing-and-analysis-with-open-source-tools
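
The 3 mm and 9 mm offsets in the US Letter tip above are half the
difference between the two paper sizes in each direction. The sketch
below redoes that arithmetic and prints the resulting pstops page
specification. The function name letter_to_a4_shift is hypothetical,
and the sketch assumes the text was centred on the US Letter page.

```shell
#!/bin/sh
# Sketch only: letter_to_a4_shift is a hypothetical helper name.
# Prints a pstops page specification that re-centres a layout made
# for US Letter on an A4 page.
letter_to_a4_shift() {
    awk 'BEGIN {
        letter_w = 216; letter_h = 279   # US Letter, in mm
        a4_w = 210;     a4_h = 297       # A4, in mm
        dx = (a4_w - letter_w) / 2       # -3 mm: A4 is narrower
        dy = (a4_h - letter_h) / 2       #  9 mm: A4 is taller
        printf "0(%.1fcm,%.1fcm)\n", dx / 10, dy / 10
    }'
}

# Example use, with in.ps and out.ps as placeholder file names:
#   pstops "$(letter_to_a4_shift)" in.ps out.ps
```

The function prints 0(-0.3cm,0.9cm), the same specification as in the
tip.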