Editing the Computer Laboratory main web site via Subversion

This is a brief guide on how to edit pages of the Computer Laboratory main web site using the Subversion version control system. It is aimed at technically experienced users; for more detailed tutorials, visit the Subversion beginners pages.

The infrastructure described here is primarily aimed at administrative staff who want to edit the main web site. It could also be extended for use by research groups to manage their own shared web space. While the mechanism described here is used on the main web site with the ucampas page formatting tool, and has special support for that, it can also be used on its own to coordinate the management of arbitrary collections of files, including HTML files formatted with other tools.

How does it work?

We no longer give people direct write access to many of the subdirectories under /anfs/www/html, where our departmental web server finds the HTML pages that make up our main web site.

Instead, these web pages are kept in a "Subversion repository", a special file system that provides database-like features (atomicity, logging, event triggers, collision handling, etc.) and preserves old versions of the files that it stores.

The only way to edit files is by using Subversion client software to talk to the Subversion server that guards and manages this repository. Whenever you commit a change to a web page back into the repository, this will not only update the repository itself, but it will also update the /anfs/www/html directory where the departmental web server will then find a new file with your changes.

This approach may seem a little more complicated than directly editing the HTML files involved, but it has several big advantages:

  • Every change will be recorded and monitored. Every change that you make will cause an email to be sent to pagemaster (or whoever else has editorial responsibility for a part of the web site). If they do not like your change, they will usually spot and fix or undo your changes quickly. This gives the contributor peace of mind when making the edits and it gives pagemaster peace of mind when granting wider write access to pages.
  • Every change can be undone easily. Subversion never forgets an old version. This gives contributors peace of mind, because they cannot destroy any information. If a change creates a mess, it will be easy to fix.
  • It solves Windows/Linux incompatibility problems. The way Windows and Linux handle access-control permissions on files differs radically. Sending all changes via the Subversion server prevents the same file from being modified directly by both Linux and Windows users, which avoids a lot of practical problems with permission bits and access control lists.
  • It preserves a historic record. The repository will make it easy to recreate the web site in the future as it looked a few years ago, in case that ever becomes of interest.
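
As a minimal sketch of how the last three points look in practice, the following shell functions wrap standard Subversion commands for inspecting the log, undoing a committed revision, and recreating an old state of the site. The function names are hypothetical and not part of any Lab tooling; the first two functions are meant to be run inside a working directory.

```shell
# Show the most recent log entries (who changed what, and when) for a path.
show_history() {
    svn log --limit 10 "${1:-.}"
}

# Undo the changes made in one committed revision by reverse-merging it into
# the working directory. Nothing reaches the web site until you review the
# result and commit it.
undo_revision() {
    svn merge -c "-$1" .
}

# Check out the site as it looked on a given date (YYYY-MM-DD) into a
# separate directory, leaving your normal working directory untouched.
checkout_as_of() {
    svn checkout -r "{$1}" \
        svn+ssh://wwwsvn@svn-www.cl.cam.ac.uk/vh-cl/trunk/html "$2"
}
```

For example, "undo_revision 1234" followed by "svn diff" and "svn commit" would revert a hypothetical revision 1234. A dated checkout of the whole site is as large as a normal one, so appending a subdirectory to the URL may be preferable.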

How do I get access?

This is a quick summary of how to become a contributor to the web site. More detailed tutorials for less experienced users are under preparation.

  1. Create an ssh key pair, if you do not already have one. You will need it to authenticate yourself to the Subversion repository server.
  2. Contact pagemaster@cl and provide the following information:
    • Which parts of the web site would you like to be able to edit?
    • For which parts of the web site would you like to be notified about any updates (automatically emailed diffs)?
    • Which ssh public key do you want to use to authenticate yourself to the Subversion server?
    • Do you require any help with the checkout of your own Subversion working directory on your computer? If so, which operating system do you use?

    Note for pagemaster: Add the crsid and public key of a new contributor to the file /usr/groups/linux/extra-packages/svnserve/sshkey/www and then call "cl-onserver --fixsvn" on some Fedora Core machine to update /usr/groups/wwwsvn/home/.ssh/authorized_keys accordingly. Then review /usr/groups/wwwsvn/repositories/vh-cl/hooks/post-commit to ensure that automatic updates and email notifications are triggered as desired as a result of Subversion commits by the new contributor. Finally, check in the web server working directory under /anfs/www/html that any file and directory that the contributor wants to update is already owned by the pseudo-user "wwwupdate", which is a prerequisite for automatic updates to happen there.

  3. When pagemaster has confirmed that everything has been set up for you, you can check out your working directory. The root URL of the web site is
      svn+ssh://wwwsvn@svn-www.cl.cam.ac.uk/vh-cl/trunk/html
    
    and you have to decide whether you want to check out the complete site (this will require about 166 MB of disk space currently), or only a selected subdirectory.
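
For step 1 above, a key pair can be created with OpenSSH's ssh-keygen. The sketch below is only one way of doing it: the function name is hypothetical, and the ed25519 key type is an assumption that should be checked with pagemaster, since the server may expect a different type.

```shell
# Create a key pair unless one already exists at the given path, then print
# the name of the public-key file. ssh-keygen asks interactively for a
# passphrase to protect the private key.
# Assumption: ed25519 keys are accepted by the repository server.
make_www_key() {
    key="${1:-$HOME/.ssh/id_ed25519}"
    if [ ! -f "$key" ]; then
        ssh-keygen -t ed25519 -f "$key" -C "$(id -un)@$(hostname)" || return 1
    fi
    # Only the .pub file goes to pagemaster; the private key never leaves
    # your machine.
    printf '%s\n' "$key.pub"
}
```

Calling "make_www_key" without arguments uses the default OpenSSH file name, so ssh (and therefore svn+ssh) will find the key without further configuration.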

Quick example for Linux users

If, for example, you are only interested in the system-administration pages, it is sufficient to check out only

  svn+ssh://wwwsvn@svn-www.cl.cam.ac.uk/vh-cl/trunk/html/local/sys

Under Linux you can do this with the commands

  mkdir ~/public_html/cl-preview/
  svn co svn+ssh://wwwsvn@svn-www.cl.cam.ac.uk/vh-cl/trunk/html/local/sys \
    ~/public_html/cl-preview/sys
  ln -s /anfs/www/html/local/sys ~/public_html/cl-preview/sys/uorigin

This will create a working directory inside your personal web space, where you can easily preview any changes that you make before committing them back. If you want to check out a large part of the site, consider moving your working directory onto /anfs/bigdisc or /local/scratch to save filer quota. The “uorigin” symbolic link created in the above example helps ucampas to find the navigation structure of the rest of the site in case you did not check out the whole thing (this works only on Lab administered machines).

Now edit some "*-b.html" file in your working directory. Then call ucampas to reformat it and preview the resulting file. When you are happy with your change, call "svn commit" to apply it to the repository. Remember to call "svn update" before you next make an edit, to make sure that you work on the very latest version of the files.
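
The edit cycle described above can be sketched as two small shell functions, to be run inside the working directory. The function names are hypothetical; the ucampas step is deliberately left out, because its exact invocation is not covered here.

```shell
# Start of an editing session: bring the working directory up to date.
start_edit() {
    svn update
}

# End of an editing session: review what changed, then commit it.
# $1 is the log message that Subversion records with the commit.
finish_edit() {
    svn status -q || return 1   # list the files you have changed
    svn diff || return 1        # show the changes line by line
    svn commit -m "$1"
}
```

A session would then consist of "start_edit", editing and previewing as described above, and finally something like finish_edit "Fix typo on sys index page".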

Quick Subversion cheat sheet

  svn update                update working directory to latest repository changes
  svn commit                commit your changes back to the repository
  svn diff                  which changes have you not yet committed back?
  svn status -q             which changes have you not yet committed back?
  svn add -N file           add a new file or directory to the repository (at next commit)
  svn mv old_file new_file  use instead of mv for files under version control
  svn cp src_file dst_file  use instead of cp for files under version control
  svn rm file               use instead of rm for files under version control
  svn help command          more cheat sheets

What files belong in the repository?

When you commit a change to some *-b.html file into the repository, then after updating the /anfs/www/html working directory of the web server, ucampas will be called there automatically. As a result, there is no need to keep any ucampas-generated *.html files in the repository. This way, ucampas-formatted pages can be updated even from computers where ucampas is not installed.

Files that belong in the repository include

  • *-b.html files
  • uconfig.txt files
  • small image files referenced by any of the above
  • small pdf or doc files referenced by any of the above (where version control seems appropriate, e.g. important departmental forms)

Caution should be exercised when adding huge files (e.g., binary software distributions, high-res image collections), in particular things that are unlikely ever to be edited collaboratively. We want to avoid repository bloat and hope to keep full working directories well below 100 MB in the long run. Placing large binary files directly into /anfs/www/html may sometimes be more sensible than going via the repository.
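
Before adding new material, it can help to see which files in a directory are unusually large. The following is a sketch only: the function name is hypothetical and the default threshold of 1024 KiB is an arbitrary illustrative value, not a Lab policy.

```shell
# List regular files above a size threshold, skipping Subversion's own
# .svn administrative directories.
# $1 = directory (default: current directory)
# $2 = threshold in KiB (default: 1024, an illustrative value)
list_large_files() {
    find "${1:-.}" -name .svn -prune -o -type f -size "+${2:-1024}k" -print
}
```

For example, "list_large_files ~/public_html/cl-preview/sys 512" lists everything above 512 KiB in the working directory from the earlier example.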

What about Windows?

Windows users can access the repository conveniently using the TortoiseSVN frontend, an easy to use extension to the Windows Explorer GUI shell. A detailed local tutorial is under preparation.

Note that for the svn+ssh://wwwsvn@svn-www.cl.cam.ac.uk/vh-cl/trunk/html URL to work, you have to set up in PuTTY a saved session with the name "svn-www.cl.cam.ac.uk" that will open a connection to the host of the same name using the key that you sent to pagemaster.

Ucampas is not yet available for Windows, so Windows users have to preview the undecorated *-b.html files locally and then check after the commit what ucampas made out of them on the main site.

Subversion is able to perform CRLF↔LF conversion if files have been marked appropriately, to take care of the end-of-line differences between Windows and Unix. In spite of this, it is still a good idea to avoid Notepad and WordPad under Windows and to use instead a specialized HTML editor that

  • is able to handle LF-terminated lines
  • uses the UTF-8 character set
  • does not prefix UTF-8 files with a byte-order-mark (BOM, U+FEFF)
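
Whether an editor has left a byte-order mark or CRLF line endings behind can be checked from a Linux shell. The two functions below are a minimal sketch with hypothetical names; they only report the condition and do not change the file.

```shell
# Succeeds if the file starts with the UTF-8 byte-order mark (EF BB BF).
has_bom() {
    [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}

# Succeeds if at least one line in the file ends in CR LF.
has_crlf() {
    grep -q "$(printf '\r')\$" "$1"
}
```

For example, 'has_bom index-b.html && echo "BOM found"' flags a file that should be resaved without the mark. Note that on Windows a working copy of a file marked for end-of-line conversion is expected to contain CRLF, so the second check is mainly useful on Linux.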

Anything else to consider?

  • Avoid checking symbolic links into the repository. There are several reasons for this:
    • Symbolic links have no equivalent under Windows XP ("shortcuts" are always absolute). [Windows Vista appears to have something similar to symbolic links, but Subversion does not yet support them.]
    • Symbolic links cause the same page to appear under two URLs, which irritates search-engine users.
    • End users cannot tell which of several URLs for the same page is the one not reached via a symlink, and therefore will not know which URL is meant to be the official one.
    • Ucampas infers the URL from the absolute pathname of the file and generates relative URLs in its navigation information accordingly. Relative URLs can break if the page is accessed through an alias-URL via a symbolic link.

    Instead of using a symlink, add to the .htaccess file at the old location a "Redirect permanent old_path new_URL" entry, such as

      Redirect permanent /UoCCL/az.html http://www.cl.cam.ac.uk/az/
    

    This way, anyone accessing an old URL will still get instantly to the new page, but will see the new location in the browser address field and bookmark it accordingly.

  • Tell Subversion to automatically assign end-of-line semantics or a MIME type to newly added files, based on their extension. This helps to avoid the CRLF problem under Windows.

    Under Linux, put the lines

      [miscellany]
      enable-auto-props = yes
      [auto-props]
      .htaccess = svn:eol-style=native
      *.html = svn:eol-style=native
      *.css = svn:eol-style=native
      *.txt = svn:eol-style=native
      *.pdf = svn:mime-type=application/pdf
      *.doc = svn:mime-type=application/msword
      *.gif = svn:mime-type=image/gif
      *.png = svn:mime-type=image/png
      *.jpg = svn:mime-type=image/jpeg
    
    into ~/.subversion/config. Under TortoiseSVN, go to "TortoiseSVN Settings" in the extended Explorer menu and use the "Subversion configuration file: [Edit]" button there to add or uncomment the same lines.
  • Use relative links to files in the same repository. This way, these links will work no matter where the repository is checked out.

    Use full URLs to anything outside the repository (e.g., "http://www.cl.cam.ac.uk/...").

    Never use links that start with "/", because these will not work anywhere other than via the HTTP server.

  • Use the UTF-8 encoding in plain-text files. The ucampas tool already assumes that all its input and output files are in UTF-8. Pagemasters should therefore operate their editor and xterm under the locale setting LANG=en_GB.UTF-8. This is now the default for all recent Linux distributions.

    While in principle, arbitrary Unicode characters can be used, and at least the Microsoft WGL4 repertoire is now very widely implemented, it may still be good practice to use characters outside the Latin-1 repertoire only sparingly, and take into consideration the users of text-mode browsers that have to map these to 7-bit ASCII. The repertoire of the Windows CP1252 character set (Latin-1 plus dashes, curly quotation marks, etc.) is today very widely supported with UTF-8.
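
Two of the points above, links that start with "/" and the UTF-8 requirement, lend themselves to a quick check before committing. The following sketch uses hypothetical function names; the link check only recognizes double-quoted href and src attributes, so it is a rough filter and not an HTML parser.

```shell
# Print (with line numbers) links in a file whose target starts with "/",
# which would only work via the HTTP server.
find_root_links() {
    grep -n -i -E '(href|src)="/' "$1"
}

# Succeeds if the file is valid UTF-8; iconv fails on malformed input.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}
```

For example, "find_root_links index-b.html" prints nothing and returns a non-zero status if the file contains no such links.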

Dealing with PDF files

It is, in principle at least, preferable to keep in the Subversion repository only human-editable plain-text source files (e.g., HTML, LaTeX), and to exclude automatically generated large binary files, such as PDFs, for the following reasons:

  • PDFs can easily become quite large and can therefore become a burden to users who want to check out a substantial part of the tree.

    Note: PDFs become large if they include fonts and bitmap images. PDFs without images that use only standard PostScript fonts such as Helvetica or Times, without embedding them, can remain quite small.

  • PDFs are usually not the originally edited source file, but are automatically derived from another file. In other words, they are redundant and not useful to produce an updated version of the document.
  • The diff between two versions of a PDF is rarely useful.

We have several alternative ways to deal with PDFs that logically belong to parts of the web site whose HTML files are in the repository:

  1. Keep the LaTeX sources in the repository and cause make install to copy the resulting PDFs into the appropriate place under the server's working directory /anfs/www/html/. We do that at the moment for a few major departmental documents, such as the Blue Book. In some cases, the LaTeX sources live under
      svn+ssh://wwwsvn@svn-www.cl.cam.ac.uk/vh-cl/trunk/latex
    
    and are, therefore, not visible to the web server (example: Blue Book, Pink Book, Teaching Handbook). In other cases, the LaTeX sources live in the HTML tree (example: Diploma project examples). In both cases, make install can only be called by members of the Unix group wwwpages (who have write access to the server's working directory) or for whom write access to the relevant destination directory has been arranged otherwise.
  2. Just add the PDF via Subversion. This can be acceptable for occasional small and non-growing collections of PDFs, as long as they do not become a substantial fraction of the size of a typical working directory (example: Travel reimbursement form).
  3. Create a separate subdirectory for PDFs in the server's working directory that is not committed to the repository (example: Degree Committee minutes).
  4. Put PDFs into a separate repository tree
      svn+ssh://wwwsvn@svn-www.cl.cam.ac.uk/vh-cl/trunk/pdf
    
    that the web server accesses via suitably placed symbolic links in the HTML tree. (This option is not currently used at the Computer Laboratory, but Wolfson College, for example, handles committee minutes this way.)
  5. Avoid producing PDFs where HTML would also do. This is especially advisable where we rarely expect the document to be printed out and where good on-screen usability is more important than good paper typography. A good reason for using PDF is if the paper incarnation is more important than the online version (example: PhD thesis, paper form); a bad reason is if the originator of the document simply is more familiar with producing PDF than with producing HTML.

So there are many possible alternative ways in which PDFs and similar large binary objects can be added without cluttering HTML working directories with big files. To avoid user confusion, it may be a good idea to reduce the number of different solutions actually used.