Computer Laboratory

Editing the Computer Laboratory main web site via Subversion

This is a brief guide on how to edit pages of the Computer Laboratory main website using the Subversion version control system. It is aimed at technically experienced users; for more detailed tutorials, try the Subversion beginners pages.

The infrastructure described here is primarily aimed at administrative staff who want to edit the main web site. It can also be set up for use by research groups to manage their own shared web space. While the mechanism described here is used on the main website with the ucampas page formatting tool, and has special support for that, it can also be used on its own, to coordinate management of arbitrary collections of files, including HTML files formatted with other tools.

How does it work?

We no longer give people direct write access to many of the subdirectories under /anfs/www/html, where our departmental web server finds the HTML pages that make up our main web site.

Instead, these web pages are kept in a “Subversion repository”, a special file system that provides database-like features (atomicity, logging, event triggers, conflict handling, etc.) and preserves old versions of the files that it stores.

The only way to edit these files is by using Subversion client software to talk to the Subversion server that guards and manages this repository. Whenever you commit a change to a web page back into the repository, this will not only update the repository itself, but it will also update the /anfs/www/html directory where the departmental web server will then find updated files with your changes.

This approach may seem a little more complicated than directly editing the HTML files involved, but it has several big advantages:

  • Every change will be recorded and monitored. Every change that you make will cause an email to be sent to pagemaster (or whoever else has editorial responsibility for a part of the web site). If they do not like your change, they will usually spot and fix or undo your changes quickly. This gives the contributor peace of mind when making the edits and it gives pagemaster peace of mind when granting wider write access to pages.
  • Every change can be undone easily. Subversion never forgets old versions. This gives contributors peace of mind, because they cannot destroy any information. If a change created a mess, it will be easy to fix it.
  • It solves Windows/Linux incompatibility problems. Windows and Linux handle file access permissions in radically different ways. Sending all changes via the Subversion server prevents the same file from being modified by both Linux and Windows users, which avoids many practical problems with permission bits and access control lists.
  • It preserves a historic record. The repository will make it easy to recreate the web site in the future as it looked a few years ago, in case that ever becomes of interest.

How do I get access?

This is a quick summary of how to become a contributor to the web site.

  1. Create an ssh key pair, if you do not already have one. You will need it to authenticate yourself to the Subversion repository server.
  2. Contact pagemaster@cl and provide the following information:
    • Which parts of the web site would you like to be able to edit?
    • For which parts of the web site would you like to be notified about any updates (automatically emailed diffs)?
    • Which ssh public key(s) would you like to use to authenticate yourself to the Subversion server?
      (Pagemaster can simply retrieve these from your ~/.ssh directory on the filer, but please make sure they are readable, with:
      $ chmod a+rx ~/.ssh ; chmod a+r ~/.ssh/authorized_keys ~/.ssh/id_*.pub
      
      Alternatively, email your public key(s) in authorized_keys format.)
    • Do you require any help with the checkout of your own Subversion working directory on your computer? If so, which operating system do you use?

    Note for pagemaster: To enable a new public key accessing the repository:

    1. Add the crsid and public key(s) of the new contributor to /usr/groups/linux/extra-packages/svnserve/sshkey/www
    2. Call “cl-onserver --fixsvn && cl-onserver --fixsshcache” on ramsey (to update /usr/groups/wwwsvn/home/.ssh/authorized_keys accordingly).
    3. If the new contributor is interested in receiving email notifications of Subversion commits by others, then update the arguments of the commit-email.pl invocation in /usr/groups/wwwsvn/repositories/vh-cl/hooks/post-commit accordingly (via RCS).
  3. When pagemaster has confirmed that everything has been set up for you, you can check out your working directory. The root URL of the main web site is
      svn+ssh://wwwsvn@svn-www.cl.cam.ac.uk/vh-cl/trunk/html
    
    and you have to decide whether you want to check out the complete site (a full working directory currently requires about 250 MB of disk space), or only a selected subdirectory.

A similar setup – but using separate Subversion repositories – is used for the web pages of some research groups, as well as the source files of the exam questions archive:

  svn+ssh://wwwsvn@svn-www.cl.cam.ac.uk/rainbow
  svn+ssh://wwwsvn@svn-www.cl.cam.ac.uk/security
  svn+ssh://wwwsvn@svn-www.cl.cam.ac.uk/tripos-papers/all

Quick example for Linux users

Using the checkout wizard

Simply type

  /anfs/www/tools/bin/cl-web-checkout

This interactive script will ask you to confirm

  • what svn+ssh URL you want to check out, and
  • where you would like your working directory.

It will then set up everything for you: it runs svn checkout, creates the symbolic link that ucampas needs in a partial checkout to see the wider context of the site, and finally formats everything with ucampas.

Alternatively:

Full checkout

If you would like to be able to work on all HTML parts of the website, then prepare your personal working copy under Linux using the commands

  mkdir ~/public_html/cl-preview/
  svn co svn+ssh://wwwsvn@svn-www.cl.cam.ac.uk/vh-cl/trunk/html@ \
    ~/public_html/cl-preview/html
  ucampas -r ~/public_html/cl-preview/html

This will create a working directory inside your personal web space, where you can easily preview any changes that you make before committing them back.

Now edit some “*-b.html” file in your working directory. Then call ucampas to reformat it and preview the resulting file. When you are happy with your change, call “svn commit” to apply it to the repository.

Before you make an edit next time, remember to first call “svn update”, to make sure you work on the very latest version of the files.
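
The edit cycle described above can be sketched as a small shell function. This is only an illustration, not part of the Lab's tooling: the function names, the WC variable and the DRY_RUN switch are assumptions, and ucampas is invoked with the same “-r” form used in the checkout commands above.

```shell
#!/bin/sh
# Sketch of one edit cycle in a working copy created as shown above.
# WC, run, edit_cycle and DRY_RUN are illustrative assumptions.
WC="${WC:-$HOME/public_html/cl-preview/html}"

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

edit_cycle() {
    run svn update "$WC" || return 1        # start from the latest revision
    run "${EDITOR:-vi}" "$1" || return 1    # edit a *-b.html source file
    run ucampas -r "$WC" || return 1        # regenerate the formatted pages
    run svn diff "$WC" || return 1          # review the uncommitted changes
    run svn commit "$WC"                    # opens your editor for a log message
}

# Example (prints the commands without running them):
#   DRY_RUN=1; edit_cycle "$WC/index-b.html"
```

Setting DRY_RUN=1 first is a convenient way to see which commands would run before touching the repository.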

Partial checkout

If, for example, you are only interested in the system-administration pages, it is sufficient to check out only

  svn+ssh://wwwsvn@svn-www.cl.cam.ac.uk/vh-cl/trunk/html/local/sys

Under Linux you can do this with the commands

  mkdir ~/public_html/cl-preview/
  svn co svn+ssh://wwwsvn@svn-www.cl.cam.ac.uk/vh-cl/trunk/html/local/sys@ \
    ~/public_html/cl-preview/sys
  ln -s /anfs/www/html/local/sys ~/public_html/cl-preview/sys/uorigin

The “uorigin” symbolic link created in the above example helps ucampas to find the navigation structure of the rest of the site if you did not check out the whole site (this works only on Lab-administered machines).

Quick Subversion cheat sheet

  svn update                 update working directory to latest repository changes
  svn commit                 commit your changes back to the repository
  svn diff                   which changes have you not yet committed back?
  svn status -q              which changes have you not yet committed back?
  svn add -N file            add a new file or directory to the repository (at next commit)
  svn mv old_file new_file   use instead of mv for files under version control
  svn cp src_file dst_file   use instead of cp for files under version control
  svn rm file                use instead of rm for files under version control
  svn help command           more cheat sheets

What files belong in the repository?

When you commit a change to some *-b.html file into the repository, the server will automatically update the /anfs/www/html working directory of the web server and then call ucampas there. So there is no need to keep any ucampas-generated *.html files in the repository. This way, ucampas-formatted pages can be updated even from computers where ucampas is not installed.

Files that belong in the repository include

  • *-b.html files
  • uconfig.txt files
  • small image files referenced by any of the above
  • small pdf or doc files referenced by any of the above (where version control seems appropriate, e.g. important departmental forms)

Caution should be exercised when adding huge files (e.g., binary software distributions, high-res image collections), in particular things that are unlikely ever to be edited collaboratively. We want to avoid repository bloat and hope to keep full working directories well below 300 MB in the long run. Placing large binary files directly into /anfs/www/html may sometimes be more sensible than going via the repository.
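
Before adding a directory, it can help to list the files in it that are large enough to deserve a second thought. The following is a minimal sketch; the function name and the default threshold of 1024 KiB are assumptions chosen for illustration, and the “k” size suffix relies on GNU or BSD find.

```shell
#!/bin/sh
# List files above a size limit that are candidates for keeping out of
# the repository. The function name and the default limit are
# illustrative assumptions.
large_files() {
    dir="${1:-.}"
    limit_kib="${2:-1024}"
    # Skip Subversion's own metadata directories.
    find "$dir" -name .svn -prune -o -type f -size +"${limit_kib}k" -print
}

# Example: large_files ~/public_html/cl-preview/html 2048
```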

What about Windows?

Windows users can access Subversion repositories conveniently using TortoiseSVN, an easy-to-use extension to the Windows Explorer GUI shell.

Note that TortoiseSVN will need to access your ssh key via PuTTY. For this to work, save your PuTTYgen-generated private key into your StartUp folder, from where Pageant will load it automatically and then provide it to TortoiseSVN.

Ucampas is not yet available for Windows (except via a hack), so Windows users have to preview the undecorated *-b.html files locally and then check after the commit what ucampas made of them on the main site.

Subversion is able to perform CRLF↔LF conversion if files have been marked appropriately, to take care of the end-of-line differences between Windows and Unix. In spite of this, it is still a good idea to avoid Notepad and WordPad under Windows and instead use a proper HTML editor that

  • is able to handle LF-terminated lines
  • uses the UTF-8 character set
  • does not prefix UTF-8 files with a byte-order-mark (BOM, U+FEFF)

Notepad++ is a suitable free, HTML-aware plain-text editor for Windows.
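
Whether an editor has left a byte-order mark or CRLF line endings in a file can be checked from a Unix shell before committing. The sketch below is an assumption-laden illustration (the function names are made up, and “head -c” is a common but non-POSIX option); it only inspects the file and changes nothing.

```shell
#!/bin/sh
# Detect a UTF-8 byte-order mark (bytes EF BB BF) at the start of a
# file, and CR characters at line ends. Function names are illustrative.
has_bom() {
    # Print the first three bytes as hex and compare them with the BOM.
    [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}

has_crlf() {
    cr=$(printf '\r')
    # Succeeds if any line ends in a carriage return.
    grep -q "${cr}\$" "$1"
}

# Example:
#   has_bom index-b.html && echo "index-b.html starts with a BOM"
#   has_crlf index-b.html && echo "index-b.html has CRLF line endings"
```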

Anything else to consider?

  • Avoid checking symbolic links into the repository. There are several reasons for this:
    • Symbolic links have no equivalent under Windows (Explorer “shortcuts” are always absolute; the symlinks that Vista added are only for administrators).
    • Symbolic links cause the same page to appear under two URLs, which irritates search-engine users.
    • End users cannot see which of several URLs for the same page does not go via a symlink, and therefore cannot tell which URL is meant to be the official one.
    • Ucampas infers the URL from the absolute pathname of the file and generates relative URLs in its navigation information accordingly. Relative URLs can break if the page is accessed through an alias-URL via a symbolic link.

    Instead of using a symlink, add to the .htaccess file at the old location a “Redirect permanent old_path new_URL” entry, such as

      Redirect permanent /UoCCL/az.html http://www.cl.cam.ac.uk/az/
    

    This way, anyone accessing an old URL will still get instantly to the new page, but will see the new location in the browser address field and bookmark it accordingly.

  • Avoid filenames that differ only in case. Windows’ case-insensitive filesystem cannot handle these. Preferably use only lower-case letters, digits, hyphens, underscores and dots in filenames. Other characters (including spaces) can lead to ugly URLs.
  • Tell Subversion to automatically assign end-of-line semantics or a MIME type to newly added files, based on their extension. This helps to avoid the CRLF problem under Windows.

    Under Linux, put the lines

      [miscellany]
      enable-auto-props = yes
      [auto-props]
      .htaccess = svn:eol-style=native
      *.html = svn:eol-style=native
      *.css = svn:eol-style=native
      *.txt = svn:eol-style=native
      *.pdf = svn:mime-type=application/pdf
      *.doc = svn:mime-type=application/msword
      *.gif = svn:mime-type=image/gif
      *.png = svn:mime-type=image/png
      *.jpg = svn:mime-type=image/jpeg
    
    into ~/.subversion/config. This is no longer needed if you use Subversion 1.8 or newer, which now automatically receives equivalent configuration settings from the repository.
  • Use relative links to files in the same repository. This way, these links will work no matter where the repository is checked out.

    Use absolute URLs to anything outside the repository (e.g., “http://www.cl.cam.ac.uk/...” or “//www.cl.cam.ac.uk/...” or “/...”, the latter alternatives working better with https).

  • Use the UTF-8 encoding (without BOM) in plain-text files. The ucampas tool assumes that all its input and output files are in UTF-8. Under Linux, operate your editor and xterm under the locale setting LANG=en_GB.UTF-8, which is now the default anyway.
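
The filename advice in the list above can be checked mechanically before a commit. The following is a rough sketch rather than an existing tool; both function names are assumptions. The first function lists names containing characters outside the recommended set, the second lists names that would collide on a case-insensitive filesystem.

```shell
#!/bin/sh
# Check file names in a tree against the naming advice above.
# Function names are illustrative assumptions.

# Print paths whose final component contains anything other than
# lower-case letters, digits, hyphen, underscore, or dot.
odd_names() {
    (cd "${1:-.}" && LC_ALL=C find . -name .svn -prune -o \
        -name '*[!a-z0-9._-]*' -print)
}

# Print lower-cased paths that occur more than once, i.e. names that
# differ only in case.
case_collisions() {
    (cd "${1:-.}" && find . -name .svn -prune -o -print) |
        tr '[:upper:]' '[:lower:]' | sort | uniq -d
}

# Example:
#   odd_names ~/public_html/cl-preview/html
#   case_collisions ~/public_html/cl-preview/html
```

Both functions print paths relative to the directory given, and print nothing if no problems are found.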

Dealing with binary files (PDF, etc.)

It is, in principle at least, preferable to keep in the Subversion repository only human-editable plain-text source files (e.g., HTML, LaTeX), and to exclude generated large binary files, such as PDFs, MS-Office documents, tarballs, ISO images, for the following reasons:

  • Such files can easily become quite large and can therefore become a burden to users who want to check out a substantial part of the tree.

    Note: PDFs become large if they include fonts and bitmap images. PDFs without images that use only standard PostScript fonts such as Helvetica or Times, without embedding them, can remain quite small.

  • Many binary files are not the originally edited source file, but are automatically derived from another file. In other words, they are redundant and of no use for producing an updated version of the document.
  • The diff between two versions of a binary file is rarely useful.

We have several alternative ways to deal with PDFs that logically belong to parts of the web site whose HTML files are in the repository:

  1. Just add the PDF via Subversion. This can be acceptable for occasional small and non-growing collections of PDFs, as long as they do not become a substantial fraction of the size of a typical working directory. If there are associated simple LaTeX source files, they may go into the same directory. Example: expense claims form, formal notices about exams.
  2. Keep the LaTeX sources in the repository and cause make install to copy the resulting PDFs into the appropriate place under the server's working directory /anfs/www/html/. We do that at the moment for a few major departmental documents with complex source file structure, such as the Computer Science Tripos syllabus and a few other teaching documents that are maintained collaboratively. In some cases, the LaTeX sources live under
      svn+ssh://wwwsvn@svn-www.cl.cam.ac.uk/vh-cl/trunk/latex
    
    and are, therefore, not visible to the web server (example: Blue Book, Pink Book, Teaching Handbook). In other cases, the LaTeX sources live in the HTML tree (example: Diploma project examples). In both cases, make install can only be called by people for whom write access to the relevant part of the server's working directory has been arranged (via suitably chosen Unix group membership of the destination directory).
  3. Create a separate subdirectory for PDFs, which is not committed to the Subversion repository.

    There are some existing places for such directories:

    http://www.cl.cam.ac.uk/downloads/ – software packages, PDF archives, etc.
    http://www.cl.cam.ac.uk/manuals/ – local copies of equipment manuals, etc.

    Where the PDF directory is not under the server's working directory at /anfs/www/html, a symbolic link to it has to be created there. This is how we now handle most committee minutes and papers: /anfs/www/html/local/committees/*/minutes is a symbolic link to something like /usr/groups/deptadmin/Committees/*/public/minutes, which is near where the secretaries keep their corresponding Microsoft Office source files.

    Ucampas provides various (not yet fully documented) means for automatically inserting into a page a list of filenames, especially if the filename is an ISO 8601 date (YYYY-MM.pdf or YYYY-MM-DD.pdf), as is now common for committee minutes. These can be used to arrange that PDF files added to any directory are automatically linked from a Ucampas-formatted web page. (Ask Markus Kuhn for details.)

  4. Put PDFs into a separate repository tree
      svn+ssh://wwwsvn@svn-www.cl.cam.ac.uk/vh-cl/trunk/pdf
    
    that the web server accesses via suitably placed symbolic links in the HTML tree. This grants easy write access for Subversion users, without burdening anyone who just checks out the adjacent html subtree. (This option is not currently used at the Computer Laboratory, but Wolfson College used to handle committee minutes this way.)
  5. Avoid producing PDFs where HTML would also do. This is especially advisable where we rarely expect the document to be printed out and where good on-screen usability is more important than good paper typography. A good reason for using PDF is if the paper incarnation is more important than the online version (example: PhD thesis, paper form); a bad reason is if the originator of the document simply is more familiar with producing PDF than with producing HTML.
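
For option 3, where PDFs are named after ISO 8601 dates, a quick check of the name's shape can catch files that do not follow the YYYY-MM.pdf or YYYY-MM-DD.pdf pattern. The sketch below is not how ucampas recognises such files (that is not documented here); the function name is an assumption, and only the shape of the name is tested, not whether the date exists.

```shell
#!/bin/sh
# Test whether a file name has the form YYYY-MM.pdf or YYYY-MM-DD.pdf.
# Only the shape of the name is checked; the function name is an
# illustrative assumption.
is_dated_pdf() {
    case "$(basename "$1")" in
        [0-9][0-9][0-9][0-9]-[01][0-9].pdf) return 0 ;;
        [0-9][0-9][0-9][0-9]-[01][0-9]-[0-3][0-9].pdf) return 0 ;;
        *) return 1 ;;
    esac
}

# Example:
#   for f in minutes/*.pdf; do
#       is_dated_pdf "$f" || echo "not date-named: $f"
#   done
```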

So there are many possible alternative ways in which PDFs and similar large binary objects can be added without cluttering HTML working directories with big files. To avoid user confusion, it may be a good idea to limit the number of different solutions actually used.