Department of Computer Science and Technology

Editing the main web site

Editing the Computer Laboratory main web site via Subversion

This is a brief guide on how to edit pages of the Computer Laboratory main website using the Subversion version control system. It is aimed at technically experienced users; for more detailed tutorials, try the Subversion beginners pages.

The infrastructure described here is primarily aimed at administrative staff who want to edit the main web site. It can also be set up for use by research groups to manage their own shared web space. While the mechanism described here is used on the main website with the ucampas page formatting tool, and has special support for that, it can also be used on its own, to coordinate management of arbitrary collections of files, including HTML files formatted with other tools.

How does it work?

We no longer give people direct write access to many of the subdirectories under /anfs/www/html, where our departmental web server finds the HTML pages that make up our main web site.

Instead, these web pages are kept in a “Subversion repository”, a special file system that provides database-like features (atomicity, logging, event triggers, conflict handling, etc.) and preserves old versions of the files that it stores.

The only way to edit these files is by using Subversion client software to talk to the Subversion server that guards and manages this repository. Whenever you commit a change to a web page back into the repository, this will not only update the repository itself, but it will also update the /anfs/www/html directory where the departmental web server will then find updated files with your changes.

This approach may seem a little more complicated than directly editing the HTML files involved, but it has several big advantages:

  • Every change will be recorded and monitored. Every change that you make will cause an email to be sent to pagemaster (or whoever else has editorial responsibility for a part of the web site). If they do not like your change, they will usually spot and fix or undo your changes quickly. This gives the contributor peace of mind when making the edits and it gives pagemaster peace of mind when granting wider write access to pages.
  • Every change can be undone easily. Subversion never forgets old versions. This gives contributors peace of mind, because they cannot destroy any information. If a change created a mess, it will be easy to fix it.
  • It solves Windows/Linux incompatibility problems. Windows and Linux handle file access-control permissions in radically different ways. Sending all changes via the Subversion server ensures that no file is modified directly by both Linux and Windows users, which avoids a lot of practical problems with permission bits and access control lists.
  • It preserves a historic record. The repository will make it easy to recreate the web site in the future as it looked a few years ago, in case that ever becomes of interest.

How do I get access?

This is a quick summary of how to become a contributor to the filer-hosted departmental web site.

  1. Create an SSH key pair, if you do not already have one on your computer. You will need it to authenticate yourself to the Subversion repository server.
  2. Go to our SSH key management page and upload the SSH public key(s) that you would like to use to edit the website. These are usually found in your home directory in .ssh/id_*.pub or .ssh/authorized_keys. You can upload several public keys, one per line, e.g. if you want access from several devices. It may take up to an hour until updates to your keys become active.
  3. Contact pagemaster@cl (referring to this page) to let us know:
    • Did you already upload your SSH public key?
    • Which parts of the web site would you like to edit?
    • For which parts of the web site would you like to be notified about any updates (automatically emailed diffs)?
    • Do you require any help with the checkout of your own Subversion working directory on your computer? If so, which operating system are you using?

    Note for pagemaster:

    1. The SSH public-key upload CGI script stores the public keys in /anfs/www-uploads/sshkeys/wwwsvn/, in a separate file in authorized_keys format for each user.
    2. A cron job on mgk25@ely calls /anfs/www-uploads/sshkeys/Makefile to transfer these keys to /usr/groups/linux/extra-packages/svnserve/sshkey/www
    3. That also calls “cl-onserver --fixsvn && cl-onserver --fixsshcache” (to update /usr/groups/wwwsvn/home/.ssh/authorized_keys and /var/spool/ssh/wwwsvn accordingly).
    4. If the new contributor is interested in receiving email notifications of Subversion commits by others, then update the arguments of the commit-email.pl invocation in /usr/groups/wwwsvn/repositories/vh-cl/hooks/post-commit accordingly (via RCS).
  4. Once your keys have been installed, you can check out your working directory. The root URL of the main web site is
    svn+ssh:[Javascript required]/vh-cl/trunk/html
    
    and you have to decide whether you want to check out the complete site (a full working directory currently requires about 250 MB of disk space), or only a selected subdirectory.
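
Before uploading keys in step 2, it can help to check that the file really contains one public key per line. The following is a minimal sketch, not part of the department's tooling: the function name check_pubkeys is made up for illustration, and it only accepts plain lines of the form "type base64 [comment]" for a few common OpenSSH key types, so lines copied from an authorized_keys file that carry leading options will be reported.

```shell
#!/bin/sh
# check_pubkeys FILE: report lines that do not look like a plain OpenSSH
# public key ("<type> <base64> [comment]"); exit status 1 if any are found.
# Hypothetical helper; the list of key types is deliberately short.
check_pubkeys() {
    awk '
        /^[ \t]*$/ { next }    # ignore blank lines
        $1 ~ /^(ssh-rsa|ssh-ed25519|ecdsa-sha2-nistp(256|384|521))$/ &&
        $2 ~ /^[A-Za-z0-9+\/]+=*$/ { good++; next }
        { printf "line %d does not look like a public key\n", NR; bad++ }
        END { printf "%d key(s) found\n", good + 0; exit (bad > 0) }
    ' "$1"
}
```

For example, `check_pubkeys ~/.ssh/id_ed25519.pub` prints how many keys were recognised and names any line that needs a closer look.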

A similar setup – but using separate Subversion repositories – is used for the web pages of some research groups, as well as the source files of the exam questions archive:

svn+ssh:[Javascript required]/rainbow
svn+ssh:[Javascript required]/security
svn+ssh:[Javascript required]/tripos-papers/all

Quick example for Linux users

Using the checkout wizard

Simply type

/anfs/www/tools/bin/cl-web-checkout

This interactive script will ask you to confirm

  • what svn+ssh URL you want to check out, and
  • where you would like your working directory.

It will then set up everything for you: svn checkout, creating the symbolic link needed in a partial checkout for ucampas to see the wider context of the site, and finally format everything with ucampas.

Alternatively:

Full checkout

If you would like to be able to work on all HTML parts of the website, then prepare your personal working copy under Linux using the commands

mkdir ~/public_html/cl-preview/
svn co svn+ssh:[Javascript required]/vh-cl/trunk/html@ \
  ~/public_html/cl-preview/html
ucampas -r ~/public_html/cl-preview/html

This will create a working directory inside your personal web space, where you can easily preview any changes that you make before committing them back.

Now edit some “*-b.html” file in your working directory. Then call ucampas to reformat it and preview the resulting file. When you are happy with your change, call “svn commit” to apply it to the repository.

Before you make an edit next time, remember to first call “svn update”, to make sure you work on the very latest version of the files.
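
The edit cycle described above can be sketched as a small shell function. This is only an illustration under stated assumptions: the names run, edit_page and DRY_RUN are invented here, the working directory and page path are placeholders, and ucampas is invoked as “ucampas -r” on the whole working directory because that is the only form shown on this page.

```shell
#!/bin/sh
# run CMD...: execute CMD, or merely print it when DRY_RUN=1 (hypothetical helper).
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# edit_page WORKDIR PAGE: one update/edit/preview/commit cycle.
#   WORKDIR  your checkout, e.g. ~/public_html/cl-preview/html
#   PAGE     path of a *-b.html file relative to WORKDIR
edit_page() {
    workdir=$1
    page=$2
    run svn update "$workdir" &&              # start from the latest version
    run "${EDITOR:-vi}" "$workdir/$page" &&   # make your change
    run ucampas -r "$workdir" &&              # reformat for local preview
    run svn diff "$workdir" &&                # review what will be committed
    run svn commit "$workdir"                 # opens an editor for the log message
}
```

Setting DRY_RUN=1 before calling edit_page prints the five commands without touching the repository, which is a safe way to try the sketch.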

Partial checkout

If, for example, you are only interested in the system-administration pages, it is sufficient to check out only

svn+ssh:[Javascript required]/vh-cl/trunk/html/local/sys

Under Linux you can do this with the commands

mkdir ~/public_html/cl-preview/
svn co svn+ssh:[Javascript required]/vh-cl/trunk/html/local/sys@ \
  ~/public_html/cl-preview/sys
ln -s /anfs/www/html/local/sys ~/public_html/cl-preview/sys/.u

The “.u” symbolic link created in the above example helps ucampas to find the navigation structure of the rest of the site in case you did not check out the whole thing (this works only on machines where /anfs/www/ is automounted, e.g. on lab-managed Linux).

Using git instead

Some people prefer the git version-control system over Subversion, for example because it provides better facilities for reviewing the revision history (gitk) or for preparing single-issue commits (git add -p). Fortunately, git is also able to act as a Subversion client (man git-svn). Example:

git svn clone svn+ssh:[Javascript required]/vh-cl/trunk cl-preview
cd cl-preview
git svn rebase       # update to latest Subversion version
git svn show-ignore >.git/info/exclude   # apply svn:ignore attributes
[edit]
git status
git diff
git commit           # create local commit
git svn dcommit      # push local commits to Subversion server

Quick Subversion cheat sheet

svn update                 update working directory to latest repository changes
svn commit                 commit your changes back to the repository
svn diff                   which changes have you not yet committed back?
svn status -q              which changes have you not yet committed back?
svn add -N file            add a new file or directory to the repository (at next commit)
svn mv old_file new_file   use instead of mv for files under version control
svn cp src_file dst_file   use instead of cp for files under version control
svn rm file                use instead of rm for files under version control
svn help command           more cheat sheets

What files belong in the repository?

When you commit a change to some *-b.html file into the repository, the server will automatically update the /anfs/www/html working directory of the web server and then call ucampas there. So there is no need to keep any ucampas-generated *.html files in the repository. This way, ucampas-formatted pages can be updated even from computers where ucampas is not installed.

Files that belong in the repository include

  • *-b.html files
  • uconfig.txt files
  • small image files referenced by any of the above
  • small pdf or doc files referenced by any of the above (where version control seems appropriate, e.g. important departmental forms)

Caution should be exercised with adding huge files (e.g., binary software distributions, high-res image collections), in particular things that are unlikely to ever be edited in a collaborative way. We want to avoid repository bloat and hope to keep full working directories well below 300 MB in the long run. Placing large binary files directly into /anfs/www/html may sometimes be more sensible than going via the repository.
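
To spot candidates for repository bloat before an “svn add”, a working directory can be scanned for large files. The sketch below is an illustration only: the function name list_large_files is made up, and the default threshold of 1 MiB is an arbitrary choice, not a departmental limit.

```shell
#!/bin/sh
# list_large_files DIR [BLOCKS]: print files under DIR that are larger than
# BLOCKS 512-byte blocks (default 2048 blocks = 1 MiB, chosen arbitrarily).
# Subversion's own .svn metadata directories are skipped.
list_large_files() {
    find "$1" -name .svn -prune -o -type f -size +"${2:-2048}" -print
}
```

For example, `list_large_files ~/public_html/cl-preview/html` lists everything above 1 MiB, and a second argument of 20480 raises the threshold to 10 MiB.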

What about Windows?

Windows users can access Subversion repositories conveniently using TortoiseSVN, an easy-to-use extension to Windows File Explorer.

Note that TortoiseSVN will need to access your ssh key via PuTTY. For this to work, save your PuTTYgen-generated private key into your StartUp folder, from where Pageant will load it automatically and then provide it to TortoiseSVN.

Ucampas is currently not yet available for Windows (except via a hack). Windows users can preview the undecorated *-b.html files locally, by right-clicking on the *-b.html file to open it with their web browser. Once that looks fine, they can commit the file, and then use their browser again to check what ucampas made out of the file on the main site.

Some tips for Windows users

Use the UTF-8 character encoding

  • With Notepad++: in the “Encoding” menu, select “Encode in UTF-8 without BOM”

Background: Our web site uses the UTF-8 character encoding. This is different from the traditional Windows character encoding CP-1252. The difference matters only for the following characters:

€‚ƒ„…†‡ˆ‰Š‹ŒŽ
‘’“”•–—˜™š›œžŸ
 ¡¢£¤¥¦§¨©ª«¬­®¯
°±²³´µ¶·¸¹º»¼½¾¿
ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏ
ÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞß
àáâãäåæçèéêëìíîï
ðñòóôõö÷øùúûüýþÿ

Of these, only the pound and euro signs are easily available on a UK keyboard, but the curly quotation marks and dashes commonly appear in text pasted from word-processing files. If you accidentally commit one of these characters in the CP-1252 encoding, Ucampas will complain with an error about an invalid UTF-8 sequence. Notepad++ can help you to fix this problem with its “Convert to UTF-8 without BOM” function.

Ideally, your editor should also not prefix UTF-8 files with a byte order mark (BOM, U+FEFF). Notepad++ is a good, free, HTML-aware plain-text editor for Windows that can do all that. Microsoft’s Notepad finally also gained “UTF-8 without BOM” support under Windows 10 in 2019, but WordPad still does not.
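
On a Linux machine, the same problems can be detected and repaired from the command line before committing. The following sketch uses the standard iconv, head and od tools; the function names are invented for this example. Note that cp1252_to_utf8 assumes the whole file is CP-1252, so it should only be applied to a file for which is_utf8 fails, otherwise already-correct UTF-8 text would be encoded twice.

```shell
#!/bin/sh
# is_utf8 FILE: succeed if FILE is a valid UTF-8 byte sequence.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# has_bom FILE: succeed if FILE starts with the UTF-8 byte order mark EF BB BF.
has_bom() {
    [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}

# cp1252_to_utf8 FILE: rewrite FILE in place, assuming it is entirely CP-1252.
cp1252_to_utf8() {
    iconv -f CP1252 -t UTF-8 "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}
```

A typical use would be `is_utf8 page-b.html || cp1252_to_utf8 page-b.html`, followed by a look at the result with “svn diff” before committing.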

Filenames are case sensitive

Windows does not distinguish between uppercase and lowercase characters in filenames. Our web server does. Unlike Windows, it considers “image.jpg” and “image.JPG” to be two different files. Windows users often fall into that trap, in particular because Windows Explorer by default does not display known filename extensions, such as “.jpg” or “.HTML”. Preferably keep filenames lowercase.

Recommendation: Click Start | Control Panel | Appearance and Personalization | Folder Options | View. There, make sure the option “Hide extensions for known file types” is not checked. This ensures that Windows Explorer always shows you the full filename.

Automatic SVN Update

The most common mistake made by TortoiseSVN users is to forget the "SVN Update" before editing any file, therefore editing an out-of-date version, resulting in a conflict at the next "SVN Commit". The risk of this happening can be reduced by automatically calling "SVN Update" at each login.

Recommendation: Go to Start | All Programs and right-click onto the "Startup" folder to select "Explore". In this folder, create a new file "svn-update.bat" that contains the following line:

start TortoiseProc /command:update /path:"C:\Users\%username%\Desktop\cl-web" /closeonend:3

Change C:\Users\%username%\Desktop\cl-web to the location of your TortoiseSVN working folder. This will call "SVN Update" automatically each time you log in.

You can also configure the "TortoiseSVN Project Monitor" (with the repository URL svn+ssh:[Javascript required]/vh-cl) to receive a notification each time there is something to update.

What happens during and after commit

When you use the “svn commit” command to send your changes to the Subversion repository, two “hook scripts” jump into action (in /usr/groups/wwwsvn/repositories/*/hooks/).

The pre-commit script checks whether you are trying to add any files with names that are undesirable in URLs (in particular ones containing whitespace or punctuation characters other than .-_) and aborts the commit if this is the case. You can then use “svn move” to rename your file and try the commit again. This script also restricts to pagemasters the ability to create or edit files called “Makefile” or marked executable. (You can still create or edit files called “makefile” with lower-case m, which will not be executed by the post-commit script.)
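
To avoid a rejected commit, filenames can be checked locally first. The sketch below approximates the rule described above by accepting only ASCII letters, digits, dot, hyphen and underscore in each name; it is not the actual pre-commit hook, which may differ in detail, and the function names are made up for illustration.

```shell
#!/bin/sh
# url_safe_name NAME: succeed if NAME is non-empty and consists only of
# ASCII letters, digits, dot, hyphen and underscore (an approximation of
# the hook's rule). Runs in a subshell so that LC_ALL=C stays local.
url_safe_name() (
    LC_ALL=C
    case $1 in
        '' | *[!A-Za-z0-9._-]*) exit 1 ;;
    esac
    exit 0
)

# check_names DIR: print every path under DIR whose last component fails
# the test above; .svn metadata directories are skipped.
check_names() {
    find "$1" -name .svn -prune -o -print | while IFS= read -r path; do
        url_safe_name "$(basename "$path")" || printf '%s\n' "$path"
    done
}
```

Running `check_names .` at the top of a working directory before “svn add” lists the files that would need an “svn move” (or a plain rename, if not yet under version control).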

The post-commit script does several things:

  • It sends email notifications of any commit to pagemaster, and also to some other people if “their” part of the site is affected.
  • It then changes its effective user-ID to “wwwupdate”.
  • It checks whether any *-b.html files have been deleted and then also removes the corresponding ucampas-generated files from the website.
  • It calls “svn update” on the website, to update there any files that you have changed in the repository.
  • If a directory in which you changed a file also contains a “Makefile”, then this file will be executed with “make -f Makefile”. This mechanism can be used to update algorithmically generated content, such as web pages that are the result of database queries (example: People). Such makefiles may in addition also be executed regularly via cron and /anfs/www/VH-cl/scripts/regular-updates.
  • For any *-b.html file that you have changed, it calls ucampas immediately to update the corresponding *.html file.
  • Finally, it starts a “ucampas -r” background process to rebuild the entire site. This process finishes a few minutes after the “svn commit” command.
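
The relationship between a deleted source file and the generated file that has to be removed with it can be illustrated as follows. This is a sketch only, assuming that the generated page for foo-b.html is foo.html in the same directory; the function names are invented and this is not the code of the actual post-commit script.

```shell
#!/bin/sh
# generated_for SRC: print the name of the generated page for a *-b.html
# source file (assumption: "foo-b.html" is formatted into "foo.html").
generated_for() {
    printf '%s\n' "${1%-b.html}.html"
}

# outputs_to_remove: read the names of deleted files on stdin and print the
# generated pages that would become orphaned; other files are ignored.
outputs_to_remove() {
    while IFS= read -r src; do
        case $src in
            *-b.html) generated_for "$src" ;;
        esac
    done
}
```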

If you edited a *-b.html file, then the post-commit script will call ucampas immediately, to make sure that the corresponding web page has been updated by the time your “svn commit” has finished. However, some edits (in particular to uconfig.txt files and to web-page titles) can affect many other pages, in particular their navigation links. As a result, the entire website has to be rebuilt after each commit. As this can take 5–10 minutes, the rebuild continues in the background after “svn commit” has returned. Therefore, changes to navigation links on pages other than those you edited may take a few minutes to appear.

Anything else to consider?

  • Avoid checking symbolic links into the repository. There are several reasons for this:
    • Symbolic links have no equivalent under Windows (Explorer “shortcuts” are always absolute; the symlinks that Vista added are only for administrators)
    • Symbolic links cause the same page to appear under two URLs, which irritates search-engine users.
    • End users cannot tell which of several URLs for the same page does not go via a symlink, and therefore will not know which URL is meant to be the official one.
    • Ucampas infers the URL from the absolute pathname of the file and generates relative URLs in its navigation information accordingly. Relative URLs can break if the page is accessed through an alias-URL via a symbolic link.

    Instead of using a symlink, add to the .htaccess file at the old location a “Redirect permanent old_path new_URL” entry, such as

    Redirect permanent /UoCCL/az.html https://www.cl.cam.ac.uk/az/
    

    This way, anyone accessing an old URL will still get instantly to the new page, but will see the new location in the browser address field and bookmark it accordingly.

  • Avoid filenames that differ only in case. The case-insensitive filesystems used by Windows and macOS cannot handle multiple files in the same folder whose names differ only in letter case. Preferably use in filenames only lower-case letters, digits, hyphens, underscores and dots (portable filename character set). Other characters (including space) can lead to ugly URLs.
  • Use relative links to files in the same repository. This way, these links will work no matter where the repository is checked out.

    Use absolute URLs to anything outside the repository (e.g., “https://www.cl.cam.ac.uk/...” or “/...”).

  • Use the UTF-8 encoding (without BOM) in plain-text files. The ucampas tool assumes that all its input and output files are in UTF-8. Under Linux, operate your editor and xterm under the locale setting LANG=en_GB.UTF-8, which is now the default anyway.
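
Filenames that differ only in letter case are easy to overlook on Linux, where they coexist without complaint. A minimal sketch for finding them (the function name case_collisions is made up for this example) reads path names on standard input and prints, in lower case, each name that occurs more than once when case is ignored:

```shell
#!/bin/sh
# case_collisions: read path names on stdin, one per line, and print the
# lower-cased form of every name that appears more than once when letter
# case is ignored.
case_collisions() {
    tr '[:upper:]' '[:lower:]' | sort | uniq -d
}

# Typical use from the top of a working directory:
#   find . -name .svn -prune -o -print | case_collisions
```

No output means that the working directory can be checked out safely on Windows and macOS as far as letter case is concerned.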

Dealing with binary files (PDF, etc.)

It is, in principle at least, preferable to keep in the Subversion repository only human-editable plain-text source files (e.g., HTML, LaTeX), and to exclude generated large binary files, such as PDFs, MS-Office documents, tarballs, ISO images, for the following reasons:

  • such files can easily become quite large and can therefore become a burden to users who want to check out a substantial part of the tree.

    Note: PDFs become large if they include fonts and bitmap images. PDFs without images that use only standard PostScript fonts such as Helvetica or Times, without embedding them, can remain quite small.

  • many binary files are not the originally edited source file, but are automatically derived from another file. In other words, they are redundant and not useful to produce an updated version of the document.
  • The diff between two versions of a binary file is rarely useful.

We have several alternative ways to deal with PDFs that logically belong to parts of the web site whose HTML files are in the repository:

  1. Just add the PDF via Subversion. This can be acceptable for occasional small and non-growing collections of PDFs, as long as they do not become a substantial fraction of the size of a typical working directory. If there are associated simple LaTeX source files, they may go into the same directory. Examples: structure of exam papers, concise timetable.
  2. Keep the LaTeX sources in the repository and cause make install to copy the resulting PDFs into the appropriate place under the server's working directory /anfs/www/html/. We used to do that for a few major departmental documents with complex source file structure, such as the Computer Science Tripos syllabus and a few other teaching documents that are maintained collaboratively. In some cases, the LaTeX sources live under
      svn+ssh:[Javascript required]/vh-cl/trunk/latex
    

    These are not visible to the web server (example: Blue Book, Pink Book, Teaching Handbook). In other cases, the LaTeX sources live in the HTML tree (example: Masters project report template source files). In the former case, make install could only be called by people for whom write access to the relevant part of the server's working directory has been arranged (via suitably chosen Unix group membership of the destination directory); in the latter case, make is called on the server by the post-commit script.

  3. Create a separate subdirectory for PDFs, which is not committed to the Subversion repository.

    There are some existing places for such directories:

    Where the PDF directory is not under the server’s working directory at /anfs/www/html, a symbolic link to it has to be created there. This is how we now handle most committee minutes and papers: /anfs/www/html/local/committees/*/minutes is a symbolic link to something like /usr/groups/deptadmin/Committees/*/public/minutes, which is near where the secretaries keep their corresponding Microsoft Office source files.

    Ucampas provides various (not yet fully documented) means for automatically inserting into a page a list of filenames, especially if the filename is an ISO 8601 date (YYYY-MM.pdf or YYYY-MM-DD.pdf), as is now common for committee minutes. These can be used to arrange that PDF files added to any directory are automatically linked from a Ucampas-formatted web page. (Ask Markus Kuhn for details.)

  4. Put PDFs into a separate repository tree
    svn+ssh:[Javascript required]/vh-cl/trunk/pdf
    
    that the web server accesses via suitably placed symbolic links in the HTML tree. This grants easy write access for Subversion users, without burdening anyone who just checks out the adjacent html subtree. (This option is not currently used at the Computer Laboratory, but Wolfson College used to handle committee minutes this way.)
  5. Avoid producing PDFs where HTML would also do. This is especially advisable where we rarely expect the document to be printed out and where good on-screen usability is more important than good paper typography. A good reason for using PDF is if the paper incarnation is more important than the online version (example: PhD thesis, paper form); a bad reason is if the originator of the document simply is more familiar with producing PDF than with producing HTML.
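
For option 3 above, where PDFs named after an ISO 8601 date can be listed automatically, it may be useful to check that a filename has the expected shape before dropping it into such a directory. The sketch below tests only the pattern YYYY-MM.pdf or YYYY-MM-DD.pdf, not whether the date itself is valid; the function name is_dated_pdf is invented for illustration and is not part of ucampas.

```shell
#!/bin/sh
# is_dated_pdf NAME: succeed if NAME has the form YYYY-MM.pdf or
# YYYY-MM-DD.pdf (digits only; the date itself is not validated).
is_dated_pdf() {
    case $1 in
        [0-9][0-9][0-9][0-9]-[0-9][0-9].pdf) return 0 ;;
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9].pdf) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `is_dated_pdf 2019-10-07.pdf` succeeds, whereas `is_dated_pdf minutes-oct.pdf` fails.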

So there are many possible alternative ways in which PDFs and similar large binary objects can be added without cluttering HTML working directories with big files. To avoid user confusion, it may be a good idea to limit the number of different solutions actually used.