Editing the Computer Laboratory main web site via Subversion
This is a brief guide to editing pages of the Computer Laboratory main website
using the Subversion
version control system. It is aimed at technically experienced users;
for more detailed tutorials, visit the Subversion beginners pages.
The infrastructure described here is primarily aimed at
administrative staff who want to edit the main web site. It could also
be extended for use by research groups to manage their own shared web
space. While the mechanism described here is used on the main website
with the ucampas page formatting tool, and
has special support for that, it can also be used on its own, to
coordinate management of arbitrary collections of files, including
HTML files formatted with other tools.
How does it work?
We no longer give people direct write access to many of the
subdirectories under /anfs/www/html, where our departmental web server
finds the HTML pages that make up our main web site.
Instead, these web pages are kept in a "Subversion repository", a
special file system that provides database-like features (atomicity,
logging, event triggers, collision handling, etc.) and preserves old
versions of the files that it stores.
The only way to edit files is by using Subversion client software
to talk to the Subversion server that guards and manages this
repository. Whenever you commit a change to a web page back
into the repository, this will not only update the repository itself,
but it will also update the /anfs/www/html directory where the
departmental web server will then find a new file with your changes.
This approach may seem a little more complicated than directly
editing the HTML files involved, but it has several big advantages:
- Every change will be recorded and monitored. Every change
that you make will cause an email to be sent to pagemaster (or whoever
else has editorial responsibility for a part of the web site). If they
do not like your change, they will usually spot and fix or undo your
changes quickly. This gives the contributor peace of mind when making
the edits and it gives pagemaster peace of mind when granting wider
write access to pages.
- Every change can be undone easily. Subversion never
forgets an old version. This gives contributors peace of mind, because
they cannot destroy any information. If a change creates a mess, it
is easy to fix.
- It solves Windows/Linux incompatibility problems. Windows and
Linux handle access-control permissions on files in radically different
ways. Sending all changes via the Subversion server ensures that no file
is modified directly by both Linux and Windows users, which avoids a lot
of practical problems with permission bits and access control lists.
- It preserves a historic record. The repository will make it
easy to recreate the web site in the future as it looked a few years
ago, in case that ever becomes of interest.
How do I get access?
This is a quick summary of how to become a contributor to the web
site. More detailed tutorials for less experienced users are under
preparation.
- Create an ssh key pair, if you do not already have one. You will
need it to authenticate yourself to the Subversion repository server.
- Contact pagemaster@cl and provide the following information:
- Which parts of the web site would you like to be able to edit?
- For which parts of the web site would you like to be notified
about any updates (automatically emailed diffs)?
- Which ssh public key do you want to use to authenticate yourself
to the Subversion server?
- Do you require any help with the checkout of your own Subversion
working directory on your computer? If so, which operating system do
you use?
Note for pagemaster: Add the crsid and
public key of a new contributor to the file
/usr/groups/linux/extra-packages/svnserve/sshkey/www and then call
"cl-onserver --fixsvn" on some Fedora Core machine to update
/usr/groups/wwwsvn/home/.ssh/authorized_keys accordingly. Then review
/usr/groups/wwwsvn/repositories/vh-cl/hooks/post-commit to ensure that
automatic updates and email notifications are triggered as desired as
a result of Subversion commits by the new contributor. Finally, check
in the web server working directory under /anfs/www/html that any file
and directory that the contributor wants to update is already owned by
the pseudo-user "wwwupdate", which is a prerequisite for automatic
updates to happen there.
- When pagemaster has confirmed that everything has been set up for
you, you can check out your working directory. The root URL of the web
site is
svn+ssh://wwwsvn@svn-www.cl.cam.ac.uk/vh-cl/trunk/html
and you have to decide whether you want to check out the complete site
(this will require about 166 MB of disk space currently), or only
a selected subdirectory.
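For the last check in the note for pagemaster above, find's "! -user" test lists everything under a directory that is not owned by a given account. This is a minimal sketch: the function name not_owned_by is hypothetical, and the directory in the usage comment is only an example.

```shell
# not_owned_by DIR OWNER
# Hypothetical helper: print every file and directory under DIR (including
# DIR itself and any .svn administrative directories) whose owner is not
# OWNER. OWNER may be a login name or a numeric uid.
not_owned_by() {
    find "$1" ! -user "$2" -print
}

# Example (directory chosen for illustration only):
#   not_owned_by /anfs/www/html/local/sys wwwupdate
```

Empty output means that the ownership prerequisite for automatic updates is met for that subtree.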
Quick example for Linux users
If, for example, you are only interested in the
system-administration pages, it is sufficient to check out only
svn+ssh://wwwsvn@svn-www.cl.cam.ac.uk/vh-cl/trunk/html/local/sys
Under Linux you can do this with the commands
mkdir -p ~/public_html/cl-preview
svn co svn+ssh://wwwsvn@svn-www.cl.cam.ac.uk/vh-cl/trunk/html/local/sys \
~/public_html/cl-preview/sys
ln -s /anfs/www/html/local/sys ~/public_html/cl-preview/sys/uorigin
This will create a working directory inside your personal web
space, where you can easily preview any changes that you make before
committing them back. If you want to check out a large part of the
site, consider moving your working directory onto /anfs/bigdisc or
/local/scratch to save filer quota. The “uorigin” symbolic link
created in the above example helps ucampas
to find the navigation structure of the rest of the site in case you
did not check out the whole thing (this works only on Lab administered
machines).
Now edit some "*-b.html" file in your working directory. Then call
ucampas to reformat it and preview the resulting file. When you are
happy with your changes, call "svn commit" to apply them to the
repository. Before you make your next edit, remember to call
"svn update" to make sure that you are working on the latest version of
the files.
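The Subversion side of this cycle can be wrapped in two small shell functions. This is only a sketch: the names start_editing and publish_changes are hypothetical, the ucampas step is left out because how you invoke it depends on your setup, and "svn commit -m" is used so that the log message is given on the command line instead of in an editor.

```shell
# start_editing: bring the working copy up to date before you edit.
start_editing() {
    svn update
}

# publish_changes MESSAGE: show what has not been committed yet, then
# commit it with MESSAGE as the log message. Refuses an empty message.
publish_changes() {
    if [ -z "$1" ]; then
        echo "usage: publish_changes 'log message'" >&2
        return 1
    fi
    svn status -q       # which files have uncommitted changes?
    svn diff            # what exactly has changed in them?
    svn commit -m "$1"  # send the changes to the repository
}
```

For example: cd ~/public_html/cl-preview/sys, run start_editing, edit the *-b.html file, run ucampas and preview, then run publish_changes "Update printer list".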
Quick Subversion cheat sheet
| svn update | update working directory to latest repository changes |
| svn commit | commit your changes back to the repository |
| svn diff | show in detail the changes you have not yet committed |
| svn status -q | list the files with uncommitted changes (ignoring unversioned files) |
| svn add -N file | add a new file or directory (without its contents) to the repository at the next commit |
| svn mv old_file new_file | use instead of mv for files under version control |
| svn cp src_file dst_file | use instead of cp for files under version control |
| svn rm file | use instead of rm for files under version control |
| svn help command | built-in help for a command |
What files belong in the repository?
When you commit a change to a *-b.html file, the /anfs/www/html
working directory of the web server is updated and ucampas is then run
there automatically. As a result, there is no need to keep any
ucampas-generated *.html files in the repository. This also means that
ucampas-formatted pages can be updated even from computers where
ucampas is not installed.
Files that belong in the repository include:
- *-b.html files
- uconfig.txt files
- small image files referenced by any of the above
- small pdf or doc files referenced by any of the above
(where version control seems appropriate,
e.g. important departmental forms)
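As a rough aid when deciding what to "svn add", the categories above can be written as a shell case pattern. A minimal sketch under stated assumptions: repo_candidate is a hypothetical name; it assumes that ucampas writes the formatted page for foo-b.html to foo.html, so any other *.html name is treated as generated; and accepting .htaccess and *.css files is an assumption based on the auto-props list further down this page. File size still has to be judged separately.

```shell
# repo_candidate FILE
# Hypothetical helper: succeed (exit status 0) if FILE's name matches one
# of the kinds of file listed above, fail otherwise.
repo_candidate() {
    name=$(basename "$1")
    case $name in
        *-b.html | uconfig.txt) return 0 ;;  # ucampas sources and config
        .htaccess | *.css)      return 0 ;;  # assumption: see auto-props list
        *.gif | *.png | *.jpg)  return 0 ;;  # small images only
        *.pdf | *.doc)          return 0 ;;  # small, version-worthy documents
        *.html)                 return 1 ;;  # assumed ucampas output
        *)                      return 1 ;;  # anything else: decide by hand
    esac
}

# Example: list the files in the current directory that look addable
#   for f in *; do repo_candidate "$f" && echo "$f"; done
```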
Be cautious about adding huge files (e.g., binary software
distributions, high-res image collections), in particular things that
are unlikely ever to be edited collaboratively. We want to avoid
repository bloat and hope to keep full working directories well below
100 MB in the long run. Placing large binary files directly into
/anfs/www/html may sometimes be more sensible than going via the
repository.
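Before adding files, a quick way to spot candidates for bloat is to list everything above a size limit. This is a sketch: large_files is a hypothetical name, and the default limit of 1000000 bytes is an arbitrary illustrative threshold, not a Lab policy.

```shell
# large_files DIR [LIMIT_BYTES]
# Hypothetical helper: list files under DIR that are larger than
# LIMIT_BYTES (default 1000000), skipping Subversion's .svn directories.
large_files() {
    limit=${2:-1000000}
    # -size +Nc means "more than N bytes" (the c suffix counts bytes)
    find "$1" -name .svn -prune -o -type f -size +"$limit"c -print
}

# Example: check a working directory before "svn add", and its total size
#   large_files ~/public_html/cl-preview/sys 500000
#   du -sh ~/public_html/cl-preview/sys
```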
What about Windows?
Windows users can access the repository conveniently using the TortoiseSVN frontend, an
easy-to-use extension to the Windows Explorer GUI shell. A detailed
local tutorial is under preparation.
Note that for the
svn+ssh://wwwsvn@svn-www.cl.cam.ac.uk/vh-cl/trunk/html URL to work,
you have to set up in PuTTY a saved session named
"svn-www.cl.cam.ac.uk" that opens a connection to the host of the
same name using the private key that matches the public key you sent
to pagemaster.
Ucampas is not yet available for Windows, so Windows
users have to preview the undecorated *-b.html files locally and then
check after the commit what ucampas has made of them on the main site.
Subversion can perform CRLF↔LF conversion on files that have
been marked appropriately, to take care of the end-of-line differences
between Windows and Unix. Even so, it is still a good idea to
avoid Notepad and WordPad under Windows and to use a specialized HTML
editor instead that
- is able to handle LF-terminated lines
- uses the UTF-8 character set
- does not prefix UTF-8 files with a byte-order-mark (BOM, U+FEFF)
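Files saved by a Windows editor can be checked from a Unix shell for the last two problems (CRLF line endings and a byte-order mark). A minimal sketch using standard tools (head, od, tr, cmp); the function name check_editor_output is hypothetical.

```shell
# check_editor_output FILE
# Hypothetical helper: report on stderr whether FILE starts with a UTF-8
# byte-order mark or contains carriage returns (CRLF line endings).
# Exit status 0 means neither problem was found.
check_editor_output() {
    rc=0
    # The first three bytes as hex; a UTF-8 BOM is EF BB BF
    first=$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')
    if [ "$first" = "efbbbf" ]; then
        echo "$1: starts with a byte-order mark" >&2
        rc=1
    fi
    # Deleting all CRs changes the content only if the file contained any
    if ! tr -d '\r' < "$1" | cmp -s - "$1"; then
        echo "$1: contains carriage returns (CRLF line endings?)" >&2
        rc=1
    fi
    return $rc
}
```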
Anything else to consider?
- Avoid checking symbolic links into the repository. There
are several reasons for this:
- Symbolic links have no equivalent under Windows XP ("shortcuts" are
always absolute). [Windows Vista appears to have something similar to
symbolic links, but Subversion does not yet support them.]
- Symbolic links cause the same page to appear under two URLs, which
irritates search-engine users.
- End users cannot tell which of several URLs for the same page avoids
the symlink, and therefore do not know which URL is meant to be the
official one.
- Ucampas infers the URL from the absolute pathname of the file and
generates relative URLs in its navigation information accordingly.
Relative URLs can break if the page is accessed through an alias-URL
via a symbolic link.
Instead of using a symlink, add a "Redirect permanent old_path new_URL"
entry to the .htaccess file at the old location, such as
Redirect permanent /UoCCL/az.html http://www.cl.cam.ac.uk/az/
This way, anyone accessing an old URL will still get instantly to the
new page, but will see the new location in the browser address field
and bookmark it accordingly.
- Tell Subversion to automatically assign end-of-line
semantics or a MIME type to newly added files, based on their
extension. This helps to avoid the CRLF problem under Windows.
Under Linux, put the lines
[miscellany]
enable-auto-props = yes
[auto-props]
.htaccess = svn:eol-style=native
*.html = svn:eol-style=native
*.css = svn:eol-style=native
*.txt = svn:eol-style=native
*.pdf = svn:mime-type=application/pdf
*.doc = svn:mime-type=application/msword
*.gif = svn:mime-type=image/gif
*.png = svn:mime-type=image/png
*.jpg = svn:mime-type=image/jpeg
into ~/.subversion/config. Under TortoiseSVN, go to "TortoiseSVN
Settings" in the extended Explorer menu and use the "Subversion
configuration file: [Edit]" button there to add or uncomment the same
lines.
- Use relative links to files in the same repository.
This way, these links will work no matter where the repository is
checked out.
Use full URLs to anything outside the repository (e.g.,
"http://www.cl.cam.ac.uk/...").
Never use links that start with "/", because these will not
work anywhere other than via the HTTP server.
- Use the UTF-8 encoding in plain-text files. The ucampas
tool already assumes that all its input and output files are in UTF-8.
Pagemasters should therefore run their editor and xterm under the
locale setting LANG=en_GB.UTF-8. This is already the default in
all recent Linux distributions.
While in principle, arbitrary Unicode characters can be used, and
at least the Microsoft WGL4 repertoire is now very widely implemented,
it may still be good practice to use characters outside the Latin-1
repertoire only sparingly, and take into consideration the users of
text-mode browsers that have to map these to 7-bit ASCII. The
repertoire of the Windows CP1252 character set (Latin-1 plus dashes,
curly quotation marks, etc.) is very widely supported today when
encoded as UTF-8.
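Three of the rules above (no symbolic links, no links starting with "/", UTF-8 only) can be checked mechanically before "svn commit". A rough sketch under stated assumptions: precommit_checks is a hypothetical name; it assumes that the "uorigin" preview link from the Linux example is the only symlink you deliberately keep (unversioned) in a working directory; it uses iconv to test for valid UTF-8; and its link test is a plain regular expression over *-b.html files, not a real HTML parser.

```shell
# precommit_checks DIR
# Hypothetical helper: list problems on stderr; exit status 0 means none.
precommit_checks() {
    dir=$1
    rc=0

    # 1. Symbolic links, except the unversioned "uorigin" preview link
    if find "$dir" -name .svn -prune -o -type l ! -name uorigin -print \
        | sed 's|^|symbolic link: |' | grep . >&2; then
        rc=1
    fi

    # 2. href/src attribute values starting with "/"
    #    (/dev/null makes grep print the file name even for a single file)
    if find "$dir" -name .svn -prune -o -type f -name '*-b.html' \
            -exec grep -nE "(href|src)=[\"']/" /dev/null {} + \
        | sed 's|^|root-relative link: |' | grep . >&2; then
        rc=1
    fi

    # 3. Text files that iconv cannot read as UTF-8
    if find "$dir" -name .svn -prune -o -type f \
            \( -name '*.html' -o -name '*.txt' -o -name '*.css' \) \
            -exec sh -c 'for f; do
                iconv -f UTF-8 -t UTF-8 "$f" >/dev/null 2>&1 ||
                    echo "not valid UTF-8: $f"
            done' sh {} + \
        | grep . >&2; then
        rc=1
    fi

    return $rc
}
```

Typical use is "precommit_checks . && svn commit" from the top of a working directory.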
Dealing with PDF files
It is, in principle at least, preferable to keep in the Subversion
repository only human-editable plain-text source files (e.g., HTML, LaTeX),
and to exclude automatically generated large binary files, such as PDFs,
to avoid repository bloat and keep working directories small.
We have several alternative ways to deal with PDFs that logically belong to
parts of the web site whose HTML files are in the repository:
- Keep the LaTeX sources in the repository and
cause make install to copy the resulting PDFs into the
appropriate place under the server's working directory /anfs/www/html/.
We do that at the moment for a few major departmental documents, such as the
Blue Book.
In some cases, the LaTeX sources live under
svn+ssh://wwwsvn@svn-www.cl.cam.ac.uk/vh-cl/trunk/latex
and are, therefore, not visible to the web server (example: Blue Book,
Pink Book, Teaching Handbook).
In other cases, the LaTeX sources live in the HTML tree (example:
Diploma project examples).
In both cases, make install can only be called by members of
the Unix group wwwpages (who have write access to
the server's working directory) or by users for whom write access to the
relevant destination directory has been arranged otherwise.
- Just add the PDF via Subversion.
This can be acceptable for occasional small and non-growing collections
of PDFs, as long as they do not become a substantial fraction of the size
of a typical working directory (example: Travel reimbursement form).
- Create a separate subdirectory for PDFs in the server's
working directory that is not committed to the repository (example:
Degree Committee minutes).
- Put PDFs into a separate repository tree
svn+ssh://wwwsvn@svn-www.cl.cam.ac.uk/vh-cl/trunk/pdf
that the web server accesses via suitably placed symbolic links
in the HTML tree. (This option is not currently used at the Computer
Laboratory, but Wolfson College, for example, handles committee minutes
this way.)
- Avoid producing PDFs where HTML would also do.
This is especially advisable where we rarely expect the document
to be printed out and where good on-screen usability is more
important than good paper typography.
A good reason for using PDF is if the paper incarnation is more
important than the online version (example: PhD thesis, paper form);
a bad reason is if the originator of the document simply is more
familiar with producing PDF than with producing HTML.
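Whether simply adding PDFs via Subversion is still acceptable depends on how large a fraction of a working directory they already take up. A rough way to measure that is sketched below; pdf_share is a hypothetical name, and what counts as "substantial" is left to your judgement.

```shell
# pdf_share DIR
# Hypothetical helper: print the percentage (rounded down) of all bytes in
# the working directory DIR taken up by PDF files, ignoring .svn
# directories. Prints 0 for an empty directory.
pdf_share() {
    # Total size of all regular files, in bytes
    total=$(find "$1" -name .svn -prune -o -type f -exec cat {} + \
        | wc -c | tr -d ' ')
    # Size of the PDF files only (name match is case-insensitive)
    pdfs=$(find "$1" -name .svn -prune -o -type f -name '*.[pP][dD][fF]' \
        -exec cat {} + | wc -c | tr -d ' ')
    if [ "$total" -eq 0 ]; then
        echo 0
    else
        echo $((pdfs * 100 / total))
    fi
}

# Example:
#   pdf_share ~/public_html/cl-preview/sys
```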
So there are many possible alternative ways in which PDFs and similar
large binary objects can be added without cluttering HTML working directories
with big files.
To avoid user confusion, it may be a good idea to reduce the number
of different solutions actually used.