Editing the Computer Laboratory main web site via Subversion
This is a brief guide on how to edit pages of the Computer Laboratory main website using the Subversion version control system. It is aimed at technically experienced users; for more detailed tutorials, try the Subversion beginners pages.
The infrastructure described here is primarily aimed at administrative staff who want to edit the main web site. It can also be set up for use by research groups to manage their own shared web space. While the mechanism described here is used on the main website with the ucampas page formatting tool, and has special support for that, it can also be used on its own, to coordinate management of arbitrary collections of files, including HTML files formatted with other tools.
How does it work?
We no longer give people direct write access to many of the subdirectories under /anfs/www/html, where our departmental web server finds the HTML pages that make up our main web site.
Instead, these web pages are kept in a “Subversion repository”, a special file system that provides database-like features (atomicity, logging, event triggers, conflict handling, etc.) and preserves old versions of the files that it stores.
The only way to edit these files is by using Subversion client software to talk to the Subversion server that guards and manages this repository. Whenever you commit a change to a web page back into the repository, this will not only update the repository itself, but it will also update the /anfs/www/html directory where the departmental web server will then find updated files with your changes.
This approach may seem a little more complicated than directly editing the HTML files involved, but it has several big advantages:
- Every change will be recorded and monitored. Every change that you make will cause an email to be sent to pagemaster (or whoever else has editorial responsibility for a part of the web site). If they do not like your change, they will usually spot and fix or undo your changes quickly. This gives the contributor peace of mind when making the edits and it gives pagemaster peace of mind when granting wider write access to pages.
- Every change can be undone easily. Subversion never forgets old versions. This gives contributors peace of mind, because they cannot destroy any information. If a change created a mess, it will be easy to fix it.
- It solves Windows/Linux incompatibility problems. The way Windows and Linux handle access control permission to files differs radically. Sending all changes via the Subversion server ensures that the same file is never modified directly by both Linux and Windows users, which avoids a lot of practical problems with permission bits and access control lists.
- It preserves a historic record. The repository will make it easy to recreate the web site in the future as it looked a few years ago, in case that ever becomes of interest.
How do I get access?
This is a quick summary of how to become a contributor to the web site.
- Create an ssh key pair, if you do not already have one. You will need it to authenticate yourself to the Subversion repository server.
- Contact pagemaster@cl and provide the following information:
- Which parts of the web site would you like to be able to edit?
- For which parts of the web site would you like to be notified about any updates (automatically emailed diffs)?
- Which ssh public key(s) would you like to use to authenticate yourself to the Subversion server?
(Pagemaster can simply retrieve these from your ~/.ssh directory on the filer, but please make sure they are readable, with:
$ chmod a+rx ~/.ssh ; chmod a+r ~/.ssh/authorized_keys ~/.ssh/id_*.pub
Alternatively, email your public key(s) in authorized_keys format.)
- Do you require any help with the checkout of your own Subversion working directory on your computer? If so, which operating system do you use?
Note for pagemaster: To enable a new public key accessing the repository:
- Add the crsid and public key(s) of the new contributor to /usr/groups/linux/extra-packages/svnserve/sshkey/www
- Call “cl-onserver --fixsvn && cl-onserver --fixsshcache” on sandy (to update /usr/groups/wwwsvn/home/.ssh/authorized_keys and /var/spool/sshd/wwwsvn accordingly).
- If the new contributor is interested in receiving email notifications of Subversion commits by others, then update the arguments of the commit-email.pl invocation in /usr/groups/wwwsvn/repositories/vh-cl/hooks/post-commit accordingly (via RCS).
- When pagemaster has confirmed that everything has been set up for
you, you can check out your working directory. The root URL of the
main web site is
A similar setup – but using separate Subversion repositories – is used for the web pages of some research groups, as well as the source files of the exam questions archive:
Quick example for Linux users
Using the checkout wizard
This interactive script will ask you to confirm
- what svn+ssh URL you want to check out, and
- where you would like your working directory.
It will then set up everything for you: svn checkout, creating the symbolic link needed in a partial checkout for ucampas to see the wider context of the site, and finally format everything with ucampas.
If you would like to be able to work on all HTML parts of the website, then prepare your personal working copy under Linux using the commands
This will create a working directory inside your personal web space, where you can easily preview any changes that you make before committing them back.
Now edit some “*-b.html” file in your working directory. Then call ucampas to reformat it and preview the resulting file. When you are happy with your change, call “svn commit” to apply it to the repository.
Before you make an edit next time, remember to first call “svn update”, to make sure you work on the very latest version of the files.
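The edit cycle described above can be sketched as a pair of small shell functions. This is only an illustration, not a script provided by the Lab: the function names, the DRY_RUN switch and the example file name index-b.html are hypothetical, and calling ucampas with the name of the *-b.html file as its argument is an assumption about its command line. The preview step between the two functions remains manual.

```shell
#!/bin/sh
# Sketch of one edit cycle inside an existing working directory.
# Set DRY_RUN to a non-empty value to print commands instead of running them.

run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# prepare_page FILE
# Update first, then edit FILE, reformat it and show the pending changes.
prepare_page() {
    file=$1
    run svn update || return 1              # work on the latest version
    run "${EDITOR:-vi}" "$file" || return 1 # edit the *-b.html source
    run ucampas "$file" || return 1         # assumed invocation of ucampas
    run svn diff "$file"                    # review what will be committed
}

# commit_page FILE MESSAGE
# Call this once the preview of the generated page looks right.
commit_page() {
    run svn commit -m "$2" "$1"
}

# Example (hypothetical file name):
#   prepare_page index-b.html
#   commit_page index-b.html "Fix typo in opening hours"
```

Splitting the cycle in two keeps the commit as a separate, deliberate step after the preview.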
If, for example, you are only interested in the system-administration pages, it is sufficient to check out only
Under Linux you can do this with the commands
The “uorigin” symbolic link created in the above example helps ucampas to find the navigation structure of the rest of the site in case you did not check out the whole thing (this works only on Lab administered machines).
Quick Subversion cheat sheet
| svn update | update working directory to latest repository changes |
| svn commit | commit your changes back to the repository |
| svn diff | show line by line the changes that you have not yet committed back |
| svn status -q | list the files with changes that you have not yet committed back |
| svn add -N file | add a new file or directory to the repository (at next commit) |
| svn mv old_file new_file | use instead of mv for files under version control |
| svn cp src_file dst_file | use instead of cp for files under version control |
| svn rm file | use instead of rm for files under version control |
| svn help command | more cheat sheets |
What files belong into the repository?
When you commit a change to some *-b.html file into the repository, the server will automatically update the /anfs/www/html working directory of the web server and then call ucampas there. So there is no need to keep any ucampas-generated *.html files in the repository. This way, ucampas-formatted pages can be updated even from computers where ucampas is not installed.
Files that belong into the repository include
- *-b.html files
- uconfig.txt files
- small image files referenced by any of the above
- small pdf or doc files referenced by any of the above (where version control seems appropriate, e.g. important departmental forms)
Caution should be exercised with adding huge files (e.g., binary software distributions, high-res image collections), in particular things that are unlikely to ever be edited in a collaborative way. We want to avoid repository bloat and hope to keep full working directories well below 300 MB in the long run. Placing large binary files directly into /anfs/www/html may sometimes be more sensible than going via the repository.
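Before adding a directory, it can help to look for unusually large files in it. The following is a minimal sketch; the function name and the 1024 KB threshold in the example are arbitrary illustrations, not Lab policy, and the “k” size suffix assumes GNU or BSD find.

```shell
#!/bin/sh
# list_large_files DIR LIMIT_KB
# Print all files under DIR that are larger than LIMIT_KB kilobytes,
# skipping Subversion's own .svn administrative directories.
list_large_files() {
    dir=$1
    limit_kb=$2
    find "$dir" -name .svn -prune -o -type f -size +"${limit_kb}"k -print
}

# Example: list everything above 1 MB in the current working directory
#   list_large_files . 1024
```

Calling “du -sk .” in the top-level working directory gives the total size in kilobytes for comparison with the 300 MB target.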
What about Windows?
Note that TortoiseSVN will need to access your ssh key via PuTTY. For this to work, save your PuTTYgen-generated private key into your StartUp folder, from where Pageant will load it automatically and then provide it to TortoiseSVN.
Ucampas is currently not yet available for Windows (except via a hack). Windows users can preview the undecorated *-b.html files locally, by right-clicking on the *-b.html file to open it with their web browser. Once that looks fine, they can commit the file, and then use their browser again to check what ucampas made out of the file on the main site.
Some tips for Windows users
Use the UTF-8 character encoding
- With Notepad++: in the “Encoding” menu, select “Encode in UTF-8 without BOM”
€‚ƒ„…†‡ˆ‰Š‹ŒŽ ‘’“”•–—˜™š›œžŸ ¡¢£¤¥¦§¨©ª«¬®¯ °±²³´µ¶·¸¹º»¼½¾¿ ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏ ÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞß àáâãäåæçèéêëìíîï ðñòóôõö÷øùúûüýþÿ
Of these, only the pound and euro signs are easily available on a UK keyboard, but the curly quotation marks and dashes commonly appear in text pasted from word-processing files. If you accidentally commit one of these characters in the CP-1252 encoding, Ucampas will complain with an error about an invalid UTF-8 sequence. Notepad++ can help you to fix this problem with its “Convert to UTF-8 without BOM” function.
Ideally, your editor also should not prefix UTF-8 files with a byte order mark (BOM, U+FEFF). Notepad++ is a suitable, good, free, HTML-aware plain-text editor for Windows that can do all that. Please avoid Microsoft's ancient Notepad or Wordpad tools.
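On a Linux machine, the same encoding problems can be detected and repaired from the command line. The sketch below assumes that the standard iconv, head and od utilities are available; the function names are hypothetical, and the conversion is only correct if the file really is in CP-1252.

```shell
#!/bin/sh
# is_utf8 FILE: succeed if FILE is a valid UTF-8 byte sequence.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# has_bom FILE: succeed if FILE starts with the UTF-8 encoded
# byte order mark U+FEFF (bytes EF BB BF).
has_bom() {
    [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}

# cp1252_to_utf8 FILE: rewrite FILE in place, assuming it is in CP-1252.
cp1252_to_utf8() {
    tmp=$(mktemp) || return 1
    iconv -f CP1252 -t UTF-8 "$1" >"$tmp" && cat "$tmp" >"$1"
    status=$?
    rm -f "$tmp"
    return $status
}

# Example (hypothetical file name):
#   is_utf8 index-b.html || cp1252_to_utf8 index-b.html
```

Check with is_utf8 first: converting a file that is already in UTF-8 would garble its non-ASCII characters.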
Filenames are case sensitive
Windows does not distinguish between uppercase and lowercase characters in filenames. Our web server does. Unlike Windows, it considers “image.jpg” and “image.JPG” to be two different files. Windows users often fall into that trap, in particular because Windows Explorer by default does not display known filename extensions, such as “.jpg” or “.HTML”. Preferably keep filenames lowercase.
Recommendation: Click Start | Control Panel | Appearance and Personalization | Folder Options | View. There, make sure the option “Hide extensions for known file types” is not checked. This ensures that Windows Explorer always shows you the full filename.
Automatic SVN Update
The most common mistake made by TortoiseSVN users is to forget the "SVN Update" before editing any file, therefore editing an out-of-date version, resulting in a conflict at the next "SVN Commit". The risk of this happening can be reduced by automatically calling "SVN Update" at each login.
Recommendation: Go to Start | All Programs and right-click onto the "Startup" folder to select "Explore". In this folder, create a new file "svn-update.bat" that contains the following line:
start TortoiseProc /command:update /path:"C:\Users\%username%\Desktop\cl-web" /closeonend:3
Change C:\Users\%username%\Desktop\cl-web to the location of your TortoiseSVN working folder. This will call "SVN Update" automatically each time you log in.
What happens during and after commit
When you use the “svn commit” command to send your changes to the Subversion repository, two “hook scripts” jump into action:
The pre-commit script checks whether you are trying to add any files with names that are undesirable in URLs (in particular ones containing whitespace or punctuation characters other than a few permitted ones) and aborts the commit if that is the case.
You can then use “svn move” to rename your file and try the commit again.
This script also restricts to pagemasters the ability to create or edit files called “Makefile” or marked executable.
(You can still create or edit files called “makefile” with lower-case m, which will not be executed by the post-commit script.)
The post-commit script does several things:
- It sends email notifications of any commit to pagemaster, and also to some other people if “their” part of the site is affected.
- It then changes its effective user-ID to “wwwupdate”
- It checks whether any *-b.html files have been deleted and then also removes the corresponding ucampas-generated files from the website.
- It calls “svn update” on the website, to update there any files that you have changed in the repository.
- If a directory in which you changed a file also contains a “Makefile”, then this file will be executed with “make -f Makefile”. This mechanism can be used to update algorithmically generated content, such as web pages that are the result of database queries (example: People). Such makefiles may in addition also be executed regularly via cron and
- For any *-b.html file that you have changed, it calls ucampas immediately to update the corresponding *.html file.
- Finally, it starts a “ucampas -r” background process to rebuild the entire site. This process finishes a few minutes after the “svn commit” command.
If you edited a *-b.html file, then the post-commit script will call ucampas immediately, to make sure that the corresponding web page has been updated by the time your “svn commit” has finished. However, some edits (in particular to uconfig.txt files and to web-page titles) can affect many other pages, in particular their navigation links. As a result, the entire website has to be rebuilt after each commit. As this can take 5–10 minutes, the rebuild runs in the background and completes after “svn commit” has returned. Therefore, changes to navigation links on pages other than those you edited may take a few minutes to appear.
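To avoid an aborted commit, the filename rule enforced by the pre-commit script can be approximated locally before committing. The sketch below is not the actual hook script: the permitted character set (letters, digits, dot, underscore, hyphen) is an assumption made for illustration, and the function names are hypothetical.

```shell
#!/bin/sh
LC_ALL=C; export LC_ALL   # make the a-z and A-Z ranges mean ASCII letters only

# name_ok PATH: succeed if the last component of PATH contains only
# characters from the assumed permitted set: A-Z a-z 0-9 . _ -
name_ok() {
    base=${1##*/}
    case $base in
        ''|*[!A-Za-z0-9._-]*) return 1 ;;
        *) return 0 ;;
    esac
}

# check_names: read one path per line from standard input, report every
# path that fails name_ok, and fail if there was at least one.
check_names() {
    bad=0
    while IFS= read -r path; do
        if ! name_ok "$path"; then
            echo "undesirable name: $path"
            bad=1
        fi
    done
    return $bad
}

# Example: check every file in the current working directory
#   find . -name .svn -prune -o -type f -print | check_names
```

Any file reported here can be renamed with “svn move” before the commit is attempted.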
Anything else to consider?
- Avoid checking symbolic links into the repository. There are several reasons for this:
- Symbolic links have no equivalent under Windows (Explorer “shortcuts” are always absolute; the symlinks that Vista added are only for administrators)
- Symbolic links cause the same page to appear under two URLs, which irritates search-engine users.
- End users cannot tell which of several URLs for the same page does not go via a symlink, and therefore will not know which URL is meant to be the official one.
- Ucampas infers the URL from the absolute pathname of the file and generates relative URLs in its navigation information accordingly. Relative URLs can break if the page is accessed through an alias-URL via a symbolic link.
Instead of using a symlink, add to the .htaccess file at the old location a “Redirect permanent old_path new_URL” entry, such as
Redirect permanent /UoCCL/az.html http://www.cl.cam.ac.uk/az/
This way, anyone accessing an old URL will still get instantly to the new page, but will see the new location in the browser address field and bookmark it accordingly.
- Avoid filenames that differ only in case. Windows’ case-invariant filesystem cannot handle these. Preferably use only lower-case letters, digits, hyphens, underscores and dots in filenames. Other characters (including space) can lead to ugly URLs.
- Use relative links to files in the same repository. This way, these links will work no matter where the repository is checked out.
Use absolute URLs to anything outside the repository (e.g., “http://www.cl.cam.ac.uk/...” or “//www.cl.cam.ac.uk/...” or “/...”, the latter alternatives working better with https).
- Use the UTF-8 encoding (without BOM) in plain-text files. The ucampas tool assumes that all its input and output files are in UTF-8. Under Linux, operate your editor and xterm under the locale setting LANG=en_GB.UTF-8, which is now the default anyway.
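Names that differ only in case, as warned against in the list above, are easy to overlook on Linux, where they coexist without complaint. A minimal sketch for finding them follows; the function name is hypothetical.

```shell
#!/bin/sh
# case_collisions: read one path per line from standard input and print,
# in lower case, every path that occurs more than once when case is ignored.
case_collisions() {
    tr '[:upper:]' '[:lower:]' | sort | uniq -d
}

# Example: check the current working directory, ignoring .svn directories
#   find . -name .svn -prune -o -print | case_collisions
```

An empty output means that no two paths in the tree would clash on a case-invariant filesystem.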
Dealing with binary files (PDF, etc.)
It is, in principle at least, preferable to keep in the Subversion repository only human-editable plain-text source files (e.g., HTML, LaTeX), and to exclude generated large binary files, such as PDFs, MS-Office documents, tarballs, ISO images, for the following reasons:
- Such files can easily become quite large and can therefore become a burden to users who want to check out a substantial part of the tree.
Note: PDFs become large if they include fonts and bitmap images. PDFs without images that use only standard PostScript fonts such as Helvetica or Times, without embedding them, can remain quite small.
- Many binary files are not the originally edited source file, but are automatically derived from another file. In other words, they are redundant and not useful to produce an updated version of the document.
- The diff between two versions of a binary file is rarely useful.
We have several alternative ways to deal with PDFs that logically belong to parts of the web site whose HTML files are in the repository:
- Just add the PDF via Subversion. This can be acceptable for occasional small and non-growing collections of PDFs, as long as they do not become a substantial fraction of the size of a typical working directory. If there are associated simple LaTeX source files, they may go into the same directory. Example: expense claims form, formal notices about exams.
- Keep the LaTeX sources in the repository and cause “make install” to copy the resulting PDFs into the appropriate place under the server's working directory /anfs/www/html/. We do that at the moment for a few major departmental documents with complex source file structure, such as Science Tripos syllabus and a few other teaching documents that are maintained collaboratively.
In some cases, the LaTeX sources live under
- Create a separate subdirectory for PDFs, which is not committed to the Subversion repository.
There are some existing places for such directories:
Where the PDF directory is not under the server’s working directory at /anfs/www/html, a symbolic link to it has to be created there. This is how we now handle most committee minutes and papers: /anfs/www/html/local/committees/*/minutes is a symbolic link to something like /usr/groups/deptadmin/Committees/*/public/minutes, that is near where the secretaries keep their corresponding Microsoft Office source files.
Ucampas provides various (not yet fully documented) means for automatically inserting into a page a list of filenames, especially if the filename is an ISO 8601 date (YYYY-MM.pdf or YYYY-MM-DD.pdf), as is now common for committee minutes. These can be used to arrange that PDF files added to any directory are automatically linked from a Ucampas-formatted web page. (Ask Markus Kuhn for details.)
- Put PDFs into a separate repository tree
- Avoid producing PDFs where HTML would also do. This is especially advisable where we rarely expect the document to be printed out and where good on-screen usability is more important than good paper typography. A good reason for using PDF is if the paper incarnation is more important than the online version (example: PhD thesis, paper form); a bad reason is if the originator of the document simply is more familiar with producing PDF than with producing HTML.
So there are many possible alternative ways in which PDFs and similar large binary objects can be added without cluttering HTML working directories with big files. To avoid user confusion, it may be a good idea to limit the number of different solutions actually used.
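To check that the files in a minutes directory follow the date-based naming convention mentioned above (YYYY-MM.pdf or YYYY-MM-DD.pdf), a listing along the following lines may help. This is a sketch only and does not reproduce how Ucampas builds its file lists; the function name and the example path are hypothetical.

```shell
#!/bin/sh
# list_dated_pdfs DIR
# Print, newest first, the names of all PDF files in DIR whose names
# have the form YYYY-MM.pdf or YYYY-MM-DD.pdf. Other files are ignored.
list_dated_pdfs() {
    for path in "$1"/*.pdf; do
        [ -e "$path" ] || continue          # no PDF files at all
        printf '%s\n' "${path##*/}"
    done |
        grep -E '^[0-9]{4}-[0-9]{2}(-[0-9]{2})?\.pdf$' |
        LC_ALL=C sort -r
}

# Example (hypothetical directory):
#   list_dated_pdfs /path/to/minutes
```

Because ISO 8601 dates sort chronologically as plain text, a reverse text sort is enough to put the most recent minutes first.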