Ucampas (University of Cambridge web page augmentation system) is a page-formatting tool to convert simple HTML files into web pages that follow one of several provided house-style templates. It is used to maintain the Computer Laboratory main web site and can equally be used to format research-group pages, personal web pages, and websites of other organizations.
Ucampas should run on any Linux/Unix/macOS system with Perl 5.16.0 or newer. (The possibility of a Windows port is still under consideration, but don't hold your breath.)
Ucampas expects its input files to be encoded in UTF-8. It is therefore a good idea to edit them under a UTF-8 locale (e.g., LANG=en_GB.UTF-8), which is the default on modern Linux systems anyway. Without a UTF-8 locale, non-ASCII input characters such as “£” may cause an error and will not show up correctly on the web page.
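One way to check an input file before running ucampas is to try decoding it as UTF-8. The following Python sketch is not part of ucampas, and its function names are made up for illustration:

```python
from pathlib import Path


def first_invalid_utf8_offset(data: bytes):
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


def check_file(path):
    """Apply the check to a file on disk, e.g. check_file('test-b.html')."""
    return first_invalid_utf8_offset(Path(path).read_bytes())
```

A result of None means the whole file decoded cleanly. Any other value is the offset of the first offending byte, which typically points at a character saved in a legacy encoding such as ISO 8859-1.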
Ucampas HTML input files should be syntactically correct. Make sure you regularly use some HTML syntax validation tool on them, such as HTML Validator.
In the Computer Laboratory, the latest ucampas release is already installed. Just put /anfs/www/tools/bin into your $PATH. (Alternatively, add symbolic links to the /anfs/www/tools/bin/ucampas* files to a directory that is already listed in your PATH.)
You can also download and install ucampas onto your own computer:
git clone https://www.cl.cam.ac.uk/~mgk25/git/ucampas ucampas
git clone https://www.cl.cam.ac.uk/~mgk25/git/perl-PlexTree ucampas/perl-PlexTree
Check the included README.txt file for further instructions.
Ucampas reads a simple, undecorated file named something-b.html and generates from that a file named something.html. The former (*-b.html) file is intended for human editing, the latter will be served to the web browser.
The input file should be a regular HTML 4 or XHTML 1 file, such as
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
<title>Document title</title>
<h1>Main heading</h1>
<p>Some text ...
<h2>A section heading</h2>
Save this example file as test-b.html. Then type into the shell the command line
ucampas test
If you have not yet adjusted your PATH, type
/anfs/www/tools/bin/ucampas test
The input file test-b.html is now processed by the HTML parser of ucampas, converted into a tree data structure, and transformed to match the house style. Finally, the transformed document tree is written out in XHTML 1 format as the file test.html, which can then be viewed with a web browser.
You can specify several files on the same ucampas command line, which will be processed faster than if you call ucampas several times, once for each file. Option -r recursively processes an entire branch of your navigation tree.
Ucampas offers some pre-processing functions for its HTML input text that can help to make life easier.
The following <div> elements, all belonging to a special ucampas-... class, will automatically be filled or replaced by ucampas if found in the input file:
You can use several uconfig.txt parameters (see below) to control which part of the navigation tree is inserted as tabs: tabstop, tabtop, and tabtopinclude work just like navstop, navtop, and navtopinclude, but affect the tabs inserted here rather than the standard (e.g., left-hand) navigation tree.
[Many more exist but still have to be documented: generating RSS feeds, navigation lists, file lists, sitemaps, etc.]
In addition, there is
The characters < and & are part of the HTML syntax. If they appear in plain text that is to be included in an HTML document, they need to be replaced with &lt; and &amp;, respectively. This can be a nuisance if your web page contains source code or similar material rich in these characters. Ucampas offers a convenient alternative: it implements XML’s CDATA section. In any part of an HTML input file enclosed by <![CDATA[ and ]]>, the characters < and & lose their special meaning. Ucampas will then convert them for you into &lt; or &amp; in the output file, and will also remove the <![CDATA[ and ]]> delimiters, which (although part of XHTML) are not understood by most other HTML parsers. CDATA sections cannot include the string ]]> and cannot be nested.
Examples:
<pre><![CDATA[ if (a < b) c &= 1; ]]></pre>
<script type="text/javascript">
//<![CDATA[
if (a < b) c &= 1;
//]]>
</script>
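The effect of this pre-processing step can be approximated in a few lines of Python. This is only a sketch of the behaviour described above, not the code ucampas itself uses, and the function name expand_cdata is hypothetical:

```python
import re

# Non-greedy match, so a section ends at the first "]]>" (no nesting).
CDATA_SECTION = re.compile(r"<!\[CDATA\[(.*?)\]\]>", re.DOTALL)


def expand_cdata(text: str) -> str:
    """Replace each CDATA section by its content with & and < escaped."""
    def escape(match):
        body = match.group(1)
        # Escape & first, so the & in "&lt;" is not escaped a second time.
        return body.replace("&", "&amp;").replace("<", "&lt;")
    return CDATA_SECTION.sub(escape, text)


print(expand_cdata("<pre><![CDATA[if (a < b) c &= 1;]]></pre>"))
```

Text outside a CDATA section is passed through unchanged, so existing entities elsewhere in the file are not escaped twice.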
The behaviour of ucampas can be influenced by parameter settings made in a configuration file uconfig.txt located in the same directory as the input and output HTML files.
Ucampas not only looks for a uconfig.txt file in that directory, but also in its parent directory. If such a file is found there, ucampas continues further up the directory tree for as long as it encounters an uninterrupted sequence of uconfig.txt files. The highest directory found this way becomes the root directory of the navigation tree. For the Computer Laboratory main web site, this root directory is at /anfs/www/html.
Tip: While searching for its root directory, ucampas will follow the “..” link in each directory to find the next higher parent directory, unless there exists a single-line file (or symbolic link) “..u” containing a directory path that should be treated as the parent node instead. This feature is useful if a subtree of a web site is located (via a symlink) elsewhere. For example, if a research group has its web site at /auto/groups/comparch/www and has at the standard place /anfs/www/html/research/comparch just a symlink to the former, then creating a corresponding reverse link with “echo /anfs/www/html/research >/usr/groups/comparch/www/..u” will help ucampas to still find its root, even when processing files outside the main directory tree. (Making “..u” a symlink is no longer recommended: symlinks can cause problems for Windows or SMB users and they may unintentionally expose file space if web servers follow them.)
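The root search described above can be sketched as follows. This is an illustration in Python, not the actual implementation: the function name find_root is hypothetical, the “..u” file is assumed to hold an absolute path, and only the single-line file form of “..u” is handled.

```python
from pathlib import Path


def find_root(start_dir):
    """Return the highest directory reachable from start_dir through an
    uninterrupted chain of uconfig.txt files, or None if start_dir has none."""
    current = Path(start_dir).resolve()
    if not (current / "uconfig.txt").is_file():
        return None
    visited = {current}
    while True:
        override = current / "..u"
        if override.is_file():
            # Assumption: the single line holds an absolute directory path.
            parent = Path(override.read_text(encoding="utf-8").strip()).resolve()
        else:
            parent = current.parent
        # Stop at the file-system root, on a cycle, or where the chain breaks.
        if parent in visited or not (parent / "uconfig.txt").is_file():
            return current
        visited.add(parent)
        current = parent
```

The visited set is a safety net that is not mentioned in the description above; it only prevents an endless loop if “..u” files point at each other.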
Tip: After having read a uconfig.txt file, ucampas will also try to read uconfig2.txt from the same directory, and append any settings found in there to the configuration. This can be particularly useful for testing, to avoid having to modify a version controlled uconfig.txt file when some configuration parameters need to be overridden in a personal working directory (e.g., url or file_access).
The site root directory usually contains a uconfig.txt file that sets parameters that are valid for much of the site, e.g. the name of the department, the name and details of the page style to be used, or the contact details of the page maintainer:
organization="Computer Laboratory",
webmaster=(email="myemail@cl.cam.ac.uk"),
style=ucam2008,
breadcrumbs=1,
people, research, seminars, techreports,
privacy.html
Library directory: Ucampas will also look for a uconfig.txt file in its library directory. This is the directory where the ucampas executable was installed, but with any trailing “/bin” replaced with “/share/ucampas”. The uconfig.txt file there can be used for installation-wide default settings, which will be loaded before the root uconfig.txt file, and can therefore be overridden by the latter. You should only put parameters there, no subnodes.
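As a small illustration of this path rule (a Python sketch with a hypothetical function name, not taken from the ucampas source):

```python
def library_dir(executable_dir: str) -> str:
    """Derive the library directory from the directory of the executable."""
    directory = executable_dir.rstrip("/")
    if directory.endswith("/bin"):
        # A trailing "/bin" is replaced with "/share/ucampas".
        return directory[: -len("/bin")] + "/share/ucampas"
    # Without a trailing "/bin" there is nothing to replace.
    return directory


print(library_dir("/usr/local/bin"))
```

The directory /usr/local/bin is only an example installation location.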
In addition to parameter assignments, the uconfig.txt file also contains the list of file or directory names that form the subnodes of the current directory in the navigation structure. Because Unix directories are just sets of files, without any explicit relative order, and because it may not be desirable to have all files and subdirectories in a directory to automatically become visible subnodes of a web page, it is necessary to list them explicitly in the uconfig.txt file. Simply add these files separated by commas after the parameter assignments. (Ucampas will distinguish between parameter assignments and file names in this comma-separated list by looking for an “=” sign.)
The uconfig.txt files contain two types of entries: a set of key/value pair elements and a list of elements. Key/value pairs are separated by an equal sign (=), whereas list elements have no equal sign:
key1=value1, key2=value2, key3=value3, element1, element2, element3, element4
Both key/value pairs and list elements are separated by commas and optional white space (space, linefeed, etc.). A final comma will be ignored. Each key, value, or list element is a string of characters. As long as the string starts with a letter and contains only characters a-zA-Z0-9.:_- then no quotation marks are necessary, otherwise a string must be enclosed with '...' or "...".
In addition to a string, each key, value, or list element can also be augmented recursively with its own set of key/value pairs and list elements. These are appended to the string and enclosed in (...), as in:
key1=value1(key2=value2, element1), key3(element2)=value3, element3(element4)
The relative order in which entries appear in a uconfig.txt file matters only for list elements, but not for key/value pairs. Missing strings are distinguished from empty strings, that is key= and key="" do not mean the same thing. The syntax also distinguishes a special “meta string” that is preceded by an asterisk (e.g., *link or *"link").
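To make the syntax more concrete, here is a minimal recursive-descent parser sketch in Python. It is not the parser used by ucampas. The names Node and parse_uconfig are invented for this example, meta strings (*...) are not supported, and any (...) augmentation attached to a key is parsed but then dropped for simplicity.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    text: Optional[str] = None                   # None means "string missing"
    params: dict = field(default_factory=dict)   # key string -> value Node
    items: list = field(default_factory=list)    # list elements, in order


TOKEN = re.compile(r"""\s*(?:
      (?P<bare>[A-Za-z][A-Za-z0-9.:_-]*)
    | '(?P<sq>[^']*)'
    | "(?P<dq>[^"]*)"
    | (?P<punct>[=,()])
)""", re.VERBOSE)


def tokenize(src):
    src, pos, tokens = src.rstrip(), 0, []
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if not m:
            raise ValueError(f"unexpected character near offset {pos}")
        pos = m.end()
        if m.lastgroup == "punct":
            tokens.append((m.group("punct"), None))
        else:
            tokens.append(("str", m.group(m.lastgroup)))
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens, self.i = tokens, 0

    def peek(self):
        return self.tokens[self.i][0] if self.i < len(self.tokens) else None

    def take(self):
        self.i += 1
        return self.tokens[self.i - 1]

    def node(self):
        """Parse an optional string followed by an optional (...) group."""
        n = Node()
        if self.peek() == "str":
            n.text = self.take()[1]
        if self.peek() == "(":
            self.take()
            self.entries(n, closer=")")
        return n

    def entries(self, into, closer):
        """Parse comma-separated entries up to closer (None = end of input)."""
        while self.peek() != closer:
            start = self.i
            first = self.node()
            if self.i == start:
                raise ValueError("unexpected token or missing ')'")
            if self.peek() == "=":
                self.take()
                into.params[first.text] = self.node()
            else:
                into.items.append(first)
            if self.peek() == ",":
                self.take()          # a final comma is simply ignored
            elif self.peek() != closer:
                raise ValueError("expected ',' between entries")
        if closer is not None:
            self.take()


def parse_uconfig(src):
    root = Node()
    Parser(tokenize(src)).entries(root, closer=None)
    return root
```

For example, parse_uconfig('a=, b=""') yields a value node for a whose text is None and one for b whose text is the empty string, mirroring the distinction between missing and empty strings. Storing parameters in a dict and list elements in a list reflects the rule that order matters only for list elements.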
The list elements (outside parentheses) in a uconfig.txt file are the names of files or subdirectories that contain subpages of the current page. The current page here is the index.html file in the same subdirectory as the uconfig.txt file. In the above example, there are five subnodes, namely the four directories (people/, research/, seminars/, techreports/) and one file (privacy.html). Where a filename is given, it is always the file served to the web browser, not any *-b.html source file.
Once ucampas has found the uconfig.txt file of the root directory of the navigation tree, it recurses into any subdirectories listed there, in order to parse and store the entire tree of uconfig.txt files that are reachable through directories listed in the respective parent uconfig.txt.
Note: When processing a node, ucampas will display the pathname of that node relative to the root directory of the navigation tree. If that node is not reachable through any sequence of subnodes listed in uconfig.txt files, then the displayed path will show any missing entry as “[...]”. If this shows up, it means that the processed node cannot be reached by a “ucampas -r” run from the root and will not be reachable through the navigation tree. Such a node is a child whose parent does not know about its existence.
Subnodes can also be specified collectively with Unix-shell filename patterns, using the *glob construct. Its first argument is a filename pattern that will be expanded just like in a Unix shell. For example, a list element *glob("*-b.html") expands into any existing *-b.html file in the directory that has not yet been otherwise listed. The first slash and anything beyond it in any matched pathname will be discarded, which can be used to select directories by their content. For example, *glob("d*/index-b.html") will add to the list of subnodes all existing directories that start with the letter "d" and that contain an index-b.html file. Also, any parameters listed in a *glob element will be copied into each generated subnode, which can be used, for example, as in *glob("file[0-9]-b.html", invisible=1). To reverse the sorting order of subnodes added by *glob, use *globr instead.
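The expansion rule can be illustrated with Python's glob module. This is a sketch under assumptions, not the ucampas implementation: the function name expand_glob is hypothetical, alphabetical sorting is assumed, and any mapping of matched source names to served file names is left out.

```python
import glob
import os


def expand_glob(directory, pattern, already_listed=(), reverse=False):
    """Return subnode names matched by pattern inside directory."""
    full_pattern = os.path.join(glob.escape(directory), pattern)
    names = set()
    for match in glob.glob(full_pattern):
        relative = os.path.relpath(match, directory)
        # Discard the first slash and anything beyond it.
        names.add(relative.split("/", 1)[0])
    # Names that are already listed explicitly are not added again.
    names -= set(already_listed)
    # reverse=True corresponds to *globr (assuming alphabetical order).
    return sorted(names, reverse=reverse)
```

With the pattern "d*/index-b.html", only directories that actually contain an index-b.html file produce a match, and cutting each match at the first slash leaves just the directory names.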
In addition to normal subnodes, the navigation tree can also contain arbitrary hyperlinks, which are added as list elements of the form *link("url", title="string"). Such hyperlinks can be used to point to external sites or to break up the tree structure of the navigation information.
The key/value pairs in a uconfig.txt file are parameter settings that can be used to affect the formatting of the generated web pages or other aspects of ucampas behaviour. Some of the parameter values are just simple strings, others have a more complex structure and contain further key-value pairs and/or list elements.
A key/value pair (outside parentheses) in a uconfig.txt file directly affects the formatting of the index.html file in this directory. However, most parameters are inherited, that is, they also affect the processing of sub pages. Therefore, a key/value pair in a uconfig.txt file will also affect other files and subdirectories in the same location, unless overridden further down. Parameters can also be specified in a uconfig.txt file for individual subnodes, by appending them in parentheses to the listed subnode name, as in
techreports(webmaster=(email="techreports@cl.cam.ac.uk")),
privacy.html(navbar=0, breadcrumbs=0),
policy.pdf(navtitle="Web policy")
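Conceptually, looking up an inherited parameter means walking from a node towards the root until some node defines it. The Python sketch below models only this default behaviour. The class NavNode is invented for illustration, and neither non-inherited parameters nor the ways of modifying inheritance are modelled.

```python
class NavNode:
    """A node of the navigation tree with its locally set parameters."""

    def __init__(self, name, parent=None, **params):
        self.name = name
        self.parent = parent
        self.params = params

    def lookup(self, key, default=None):
        """Return the nearest setting of key on the path to the root."""
        node = self
        while node is not None:
            if key in node.params:
                return node.params[key]
            node = node.parent
        return default


root = NavNode("", organization="Computer Laboratory", breadcrumbs=1)
techreports = NavNode("techreports", root)
privacy = NavNode("privacy.html", root, navbar=0, breadcrumbs=0)

print(techreports.lookup("breadcrumbs"))   # inherited from the root
print(privacy.lookup("breadcrumbs"))       # overridden for this subnode
```

A setting made for an individual subnode therefore takes precedence over anything set higher up, while unset parameters fall back to the closest ancestor that defines them.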
There are two ways to modify the automatic inheritance of parameter settings to subnodes:
The following parameters are inherited (unless otherwise specified) and are supported independent of the selected style:
add_head = ( *meta(charset=utf8), *link(href="style/blue.css", rel="stylesheet", type="text/css") ),
Example: display='firefox "%u"'
<Files "*-b.html">
  Require valid-user
</Files>
In addition, the following parameters (also automatically inherited) are supported if “style=ucam2006”, “style=ucam2008”, or “style=ucam2012” has been selected; additional parameters are described in the documentation of each style:
breadcrumbprefix=( *a(href="http://www.cam.ac.uk/", "University of Cambridge"), *a(href="http://www.cam.ac.uk/cambuniv/", "Departments") )
footlinks=( *a(href="https://www.cl.cam.ac.uk/privacy.html", "Privacy policy"), )
webmaster=(name="Markus Kuhn", url="https://www.cl.cam.ac.uk/~mgk25/#contact"),
Warning: Providing an email address on the web can attract a never-ending flood of unsolicited marketing messages (“spam”). To reduce the risk of an email address being collected by automated address-harvesting web crawlers, either activate option obfuscate_email or provide only a URL to a web page with human-readable contact details.
The following parameters are not inherited to subnodes:
The style=stylename parameter selects which style template ucampas uses to format a page (and its sub pages). Ucampas comes with some predefined templates, but you can also add your own (although the Perl API for doing so is not yet documented). A template consists of a pair of files, which are typically called either “stylename.html” and “stylename.pl”, or alternatively “stylename/template.html” and “stylename/template.pl”. Both files must be in the same directory. Ucampas looks for these by default in a subdirectory “templates/” of its library directory. You can override where ucampas looks for the template files for each style=stylename by adding to uconfig.txt a parameter:
stylename={ template='path', }
Set the above to the relative or absolute path of the template files, but drop any trailing “.html”, “.pl”, “/template.html”, or “/template.pl”. You can use command-line option -T to restrict which directories ucampas is allowed to load templates from (such that contributors able to edit uconfig.txt files cannot inject unauthorized Perl commands by adding new templates).
Style templates usually refer to other files that contain additional assets required to implement a style: images, style sheets, JavaScript libraries. The location of these can be specified with
If you prefer to keep different asset classes in different directories, you can use instead the following parameters, which override any default set by style_url:
A complex web site might use different styles for different parts of the site. To allow you to keep the assets associated with each style in separate directories, ucampas also lets you specify the above parameters for just one style, by wrapping them in a parameter that has the name of the respective style. Example:
ucam2008={ stylesheets_url="https://www.cl.cam.ac.uk/style/", images_url="https://www.cl.cam.ac.uk/images/", },
When looking for the asset URL to use, for example for image files for the ucam2008 style, ucampas actually looks along the uconfig.txt path to the root for any of the following:
It stops at the first of these that exists. Likewise for stylesheets_url and javascript_url.
Ucampas is called with a list of absolute or relative path names. Any suffix “-b.html” or “.html” will be added or removed automatically, as necessary; the alternative extensions “.htm” and “.php” are handled similarly. If a provided path name is that of a directory, ucampas will attempt to process the “index-b.html” file found in it. If no pathname is provided, ucampas will process “index-b.html”.
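The name handling for the “.html” case can be sketched in Python as follows. This is an illustration only: the function name source_and_output is hypothetical, the “.htm” and “.php” variants are not covered, and the is_dir parameter exists just so the logic can be tried without touching the file system.

```python
import os


def source_and_output(arg="index", is_dir=os.path.isdir):
    """Map a command-line argument to (source file, generated file)."""
    if is_dir(arg):
        # A directory stands for the index page inside it.
        arg = os.path.join(arg, "index")
    # Check "-b.html" first, because such names also end in ".html".
    for suffix in ("-b.html", ".html"):
        if arg.endswith(suffix):
            arg = arg[: -len(suffix)]
            break
    return arg + "-b.html", arg + ".html"


print(source_and_output("test", is_dir=lambda path: False))
```

Under these assumptions, “test”, “test.html” and “test-b.html” all refer to the same pair of files, and calling the function without an argument corresponds to running ucampas without any pathname.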
The following command line options can be specified before this argument list:
Tip: To make sure that “ucampas -r” processes all pages of your web site, they all need to appear in a uconfig.txt file, even if you do not want them to appear in a navigation menu. In the latter case, use the invisible or navstop parameters to hide them from the menu. (Use the “ucampas-navtest” tool occasionally to search for orphaned web pages that are not mentioned in any uconfig.txt file, and would therefore never be refreshed by “ucampas -r”.)
Tip: Parameter stoprecursion blocks recursive descent into a subtree (useful, for example, if a subtree is owned by someone else).
Ucampas was written by Markus Kuhn, primarily during the second half of 2006, in preparation for the 2007 restructuring of the lab’s web site.
Contact Markus if you have any questions or suggestions, or join the mailing list cl-ucampas.