Ucampas reference manual
Ucampas (University of Cambridge web page augmentation system) is a page-formatting tool to convert simple HTML files into web pages that follow one of several provided house-style templates. It is used to maintain the Computer Laboratory main web site and can equally be used to format research-group pages, personal web pages, and websites of other organizations.
- Ucampas can auto-generate navigation links (side bar, bread crumbs, site map, etc.)
- Ucampas scans a directory tree of HTML and uconfig.txt configuration files to determine the structure of the navigation tree
- The resulting navigation tree closely follows the directory-tree structure by default, but overrides are possible
- Entries in the uconfig.txt files determine the relative position of subnodes
- Formatting parameters set in uconfig.txt files are inherited across entire subtrees
- Ucampas output files use relative links where possible, for URL mobility and to support non-HTTP previewing
- Ucampas detects certain HTML syntax errors (but is not a full HTML/SGML validator)
- Ucampas converts HTML into XHTML
- Input and output files are UTF-8 encoded
- Output files are compacted; redundant whitespace or comments are removed
- Ucampas can autogenerate content tables (like the one above)
Ucampas requires Perl 5.8.1 or newer. It should run on any Linux/Unix system, including Apple OS X. (The possibility of a Windows port is still under consideration, but don't hold your breath.)
Ucampas expects its input files to be encoded in UTF-8. It is therefore a good idea to use a UTF-8 locale when editing them (e.g., LANG=en_GB.UTF-8), which is the default on modern Linux systems anyway. Without a UTF-8 locale, non-ASCII input characters such as “£” may cause an error and will not show up correctly on the web page.
Ucampas HTML input files should be syntactically correct. Make sure you regularly use some HTML syntax validation tool on them, such as HTML Validator (or /anfs/www/tools/bin/htmlcheck).
In the Computer Laboratory, the latest ucampas release is already installed. Just put /anfs/www/tools/ into your $PATH. (Alternatively, add symbolic links to the /anfs/www/tools/ucampas* files to a directory that is already listed in your PATH.)
You can also download and install ucampas onto your own computer:
- As a tarball: ucampas.tar.gz
- Directly from the git repository:
git clone http://www.cl.cam.ac.uk/~mgk25/git/ucampas ucampas
git clone http://www.cl.cam.ac.uk/~mgk25/git/perl-PlexTree ucampas/perl-PlexTree
Check the included README.txt file for further instructions.
Ucampas reads a simple, undecorated file named something-b.html and generates from that a file named something.html. The former (*-b.html) file is intended for human editing, the latter will be served to the web browser.
The input file should be a regular HTML 4 or XHTML 1 file, such as
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
<title>Document title</title>
<h1>Main heading</h1>
<p>Some text ...
<h2>A section heading</h2>
Save this example file as test-b.html. Then type into the shell the command line
ucampas test-b.html
If you have not yet adjusted your PATH, type
/anfs/www/tools/ucampas test-b.html
The input file test-b.html is now processed by the HTML parser of ucampas, converted into a tree data structure, and transformed to match the house style. Finally, the transformed document tree is written out in XHTML 1 format as the file test.html, which can then be viewed with a web browser.
You can specify several files on the same ucampas command line, which will be processed faster than if you call ucampas several times, once for each file. Option -r recursively processes an entire branch of your navigation tree.
Ucampas offers some pre-processing functions for its HTML input text that can help to make life easier.
Special <div> elements
The following <div> elements, all belonging to a
ucampas-... class, will automatically be filled or
replaced by ucampas if found in the input file:
- <div class="ucampas-toc"></div>
- Fill with a table of contents. The entries of the table of contents are exactly the <h1>, <h2>, ..., <h6> elements that have an id attribute value set, which will be the name of the corresponding anchor. Some house-styles may define different layout variants for the table of contents, such as <div class="ucampas-toc right"></div> to make it float on the right-hand side.
- <div class="ucampas-tabs"></div>
- Fill with a tabbed representation of the navigation tree. The
generated HTML content is a sequence of <ul> elements, one for each
row of tabs, containing one <li> for each tab. (The CSS stylesheet
used will have to make provisions to display these nicely as tabs.)
You can use several uconfig.txt parameters (see below) to control which part of the navigation tree is to be inserted as tabs: tabstop, tabtop, tabtopinclude work just like navstop, navtop, navtopinclude, but affect the tabs inserted here, rather than the standard left-hand navigation tree.
- <div class="ucampas-include-text">'filename.txt'</div>
- Include the content of a plaintext file.
- <div class="ucampas-include-html">'filename.html'</div>
- Include the body content of an HTML file. Any <div
class="ucampas-..."> elements in the included documents will also be
processed, therefore such HTML inclusions can be nested recursively.
Note: At present (this may change), if HTML content is included from another directory, ucampas makes no attempt to correct any relative URLs found there. Included HTML documents with relative URLs should preferably reside in the same directory as the current file (or at least have an absolute path of the same length, if all relative URLs point to shared parent directories).
[Many more exist but have yet to be documented, e.g. for generating RSS feeds, navigation lists, file lists, and sitemaps.]
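The ucampas-toc rule described above (only headings that carry an id attribute become entries, and the id is the anchor name) can be illustrated with a short Python sketch. This is not how ucampas itself is implemented (ucampas is written in Perl); the names TocCollector and toc_entries are hypothetical and exist only for this example.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class TocCollector(HTMLParser):
    """Collects (level, anchor, text) for every heading that has an id."""

    def __init__(self):
        super().__init__()
        self.entries = []
        self._current = None  # [level, anchor, text] while inside a heading

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            anchor = dict(attrs).get("id")
            if anchor:  # headings without an id are not listed
                self._current = [int(tag[1]), anchor, ""]

    def handle_data(self, data):
        if self._current is not None:
            self._current[2] += data

    def handle_endtag(self, tag):
        if self._current is not None and tag in HEADINGS:
            level, anchor, text = self._current
            self.entries.append((level, anchor, text.strip()))
            self._current = None


def toc_entries(html_text):
    """Return the table-of-contents entries found in html_text."""
    collector = TocCollector()
    collector.feed(html_text)
    collector.close()
    return collector.entries
```

Each returned tuple holds the heading level, the anchor name and the heading text; a heading without an id is skipped.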
In addition, there is
- <meta name="ucampas-config" content="key=value, ...">
- The content attribute string will be interpreted as additional configuration parameters, using the same syntax used in uconfig.txt files.
Verbatim text with CDATA sections
The characters <, > and & are part of the HTML syntax. If they appear in plaintext to be included into a HTML document, they need to be replaced with &lt;, &gt; and &amp;, respectively. This can be a nuisance if your web page contains source code or similar material rich in these characters. Ucampas offers a convenient alternative solution: it implements XML’s CDATA section. In any part of a HTML input file enclosed between <![CDATA[ and ]]>, the characters <, > and & will lose their special meaning. Ucampas will then convert them for you into &lt;, &gt; and &amp; in the output file, and will also remove the <![CDATA[ and ]]> delimiters, which (although part of XHTML) are not understood by most other HTML parsers. CDATA sections cannot include the string ]]> and cannot be nested.
<pre><![CDATA[ if (a < b) c &= 1; ]]></pre>
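The effect of this conversion on the example above can be sketched in Python as follows. This is only an illustration of the described behaviour, not ucampas code; the function name expand_cdata is hypothetical.

```python
import re
from html import escape

# Non-greedy match: a CDATA section ends at the first "]]>",
# so sections cannot contain that string and cannot be nested.
CDATA_SECTION = re.compile(r"<!\[CDATA\[(.*?)\]\]>", re.DOTALL)


def expand_cdata(text):
    """Replace each CDATA section by its content with <, > and & escaped."""
    return CDATA_SECTION.sub(lambda m: escape(m.group(1), quote=False), text)


print(expand_cdata("<pre><![CDATA[ if (a < b) c &= 1; ]]></pre>"))
# prints: <pre> if (a &lt; b) c &amp;= 1; </pre>
```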
Configuration files (uconfig.txt)
The behaviour of ucampas can be influenced by parameter settings made in a configuration file uconfig.txt located in the same directory as the input and output HTML files.
Ucampas not only looks for a uconfig.txt file in that directory, but also in its parent directory. If such a file is found there, ucampas continues further up the directory tree for as long as it encounters an uninterrupted sequence of uconfig.txt files. The highest directory found this way becomes the root directory of the navigation tree. For the Computer Laboratory main web site, this root directory is at /anfs/www/html.
Tip: While searching for its root directory, ucampas will follow the “..” link in each directory to find the next higher parent directory, unless there exists a symbolic link “u..” pointing to a directory that should be treated as the parent node instead. This feature is useful if a subtree of a web site is located (via a symlink) elsewhere. For example, if a research group has its web site at /usr/groups/rainbow/doc/html and has at the standard place /anfs/www/html/research/rainbow just a symlink to the former, then creating a corresponding reverse symlink with “ln -s /anfs/www/html/research /usr/groups/rainbow/doc/html/u..” will help ucampas to still find its root, even when processing files outside the main directory tree. (A future release will change this mechanism slightly, as neither the “u..” filename nor symbolic links are available on Microsoft Windows NTFS.)
Tip: After having read a uconfig.txt file, ucampas will also try to read uconfig2.txt from the same directory, and append any settings found in there to the configuration. This can be particularly useful for testing, to avoid having to modify a version-controlled uconfig.txt file when some configuration parameters need to be overridden in a personal working directory (e.g., url or file_access).
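The upward search for the root directory described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not ucampas code: it ignores the “u..” symlink override and uconfig2.txt, and the function name find_root is hypothetical.

```python
import os


def find_root(start_dir):
    """Return the highest directory reachable from start_dir through an
    uninterrupted chain of directories that contain uconfig.txt,
    or None if start_dir itself has no uconfig.txt."""

    def has_config(directory):
        return os.path.isfile(os.path.join(directory, "uconfig.txt"))

    current = os.path.abspath(start_dir)
    if not has_config(current):
        return None
    while True:
        parent = os.path.dirname(current)
        # Stop at the filesystem root or at the first parent without uconfig.txt.
        if parent == current or not has_config(parent):
            return current
        current = parent
```

For a file in /anfs/www/html/research/rainbow, such a search would return /anfs/www/html as long as every directory on the way up contains a uconfig.txt file and the directory above /anfs/www/html does not.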
The site root directory usually contains a uconfig.txt file that sets parameters that are valid for much of the site, e.g. the name of the department, the name and details of the page style to be used, or the contact details of the page maintainer:
In addition to parameter assignments, the uconfig.txt file also contains the list of file or directory names that form the subnodes of the current directory in the navigation structure. Because Unix directories are just sets of files, without any explicit relative order, and because it may not be desirable to have all files and subdirectories in a directory to automatically become visible subnodes of a web page, it is necessary to list them explicitly in the uconfig.txt file. Simply add these files separated by commas after the parameter assignments. (Ucampas will distinguish between parameter assignments and file names in this comma-separated list by looking for an “=” sign.)
The uconfig.txt files contain two types of entries: a set of key/value pair elements and a list of elements. Key/value pairs are separated by an equal sign (=), whereas list elements have no equal sign:
key1=value1, key2=value2, key3=value3, element1, element2, element3, element4
Both key/value pairs and list elements are separated by commas and optional white space (space, linefeed, etc.). A final comma will be ignored. Each key, value, or list element is a string of characters. As long as the string starts with a letter and contains only characters a-zA-Z0-9.:_- then no quotation marks are necessary, otherwise a string must be enclosed with '...' or "...".
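The quoting rule above can be expressed as a small Python check, for example when generating uconfig.txt entries from a script. This is a sketch only; the names needs_quotes and quote are hypothetical, and because this manual does not describe how to write a string that contains both kinds of quotation mark, the sketch refuses such strings.

```python
import re

# A string may stay unquoted if it starts with a letter and
# contains only the characters a-zA-Z0-9.:_-
BARE_STRING = re.compile(r"[A-Za-z][A-Za-z0-9.:_-]*")


def needs_quotes(s):
    """True if s must be enclosed in '...' or "..." in uconfig.txt."""
    return BARE_STRING.fullmatch(s) is None


def quote(s):
    """Return s in a form suitable for a key, value or list element."""
    if not needs_quotes(s):
        return s
    if "'" not in s:
        return f"'{s}'"
    if '"' not in s:
        return f'"{s}"'
    raise ValueError("string contains both ' and \"; not covered by this sketch")
```

For example, quote("value1") leaves the string unchanged, whereas quote("Markus Kuhn") and quote("2006") add quotation marks, because of the space and the leading digit respectively.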
In addition to a string, each key, value, or list element can also be augmented recursively with its own set of key/value pairs and list elements. These are appended to the string and enclosed in (...), as in:
key1=value1(key2=value2, element1), key3(element2)=value3, element3(element4)
The relative order in which entries appear in a uconfig.txt file matters only for list elements, but not for key/value pairs. Missing strings are distinguished from empty strings, that is key= and key="" do not mean the same thing. The syntax also distinguishes a special “meta string” that is preceded by an asterisk (e.g., *link or *"link").
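To make the syntax described above more concrete, here is a minimal recursive-descent parser sketch in Python. It is written only from the description in this section and is not the parser that ucampas uses; the names Item, tokenize, parse_item, parse_entries and parse_uconfig are hypothetical, and details such as comments or escape sequences inside quoted strings are not handled.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

TOKEN = re.compile(
    r"\s*(?:"
    r"(?P<punct>[(),=*])"                    # structural characters
    r"|'(?P<single>[^']*)'"                  # '...' string
    r'|"(?P<double>[^"]*)"'                  # "..." string
    r"|(?P<bare>[A-Za-z][A-Za-z0-9.:_-]*)"   # unquoted string
    r")"
)


@dataclass
class Item:
    text: Optional[str] = None   # None = missing string, "" = empty string
    meta: bool = False           # True for *string ("meta string")
    pairs: list = field(default_factory=list)     # [(key Item, value Item)]
    elements: list = field(default_factory=list)  # ordered list elements


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"cannot tokenize input at offset {pos}")
        if m.group("punct"):
            tokens.append(m.group("punct"))
        else:
            groups = [m.group(n) for n in ("single", "double", "bare")]
            tokens.append(("str", next(g for g in groups if g is not None)))
        pos = m.end()
    return tokens


def parse_item(tokens, i):
    """Parse [*][string][(entries)] starting at token i."""
    item = Item()
    if i < len(tokens) and tokens[i] == "*":
        item.meta, i = True, i + 1
    if i < len(tokens) and isinstance(tokens[i], tuple):
        item.text, i = tokens[i][1], i + 1
    elif item.meta:
        raise ValueError("'*' must be followed by a string")
    if i < len(tokens) and tokens[i] == "(":
        item.pairs, item.elements, i = parse_entries(tokens, i + 1)
        if i >= len(tokens) or tokens[i] != ")":
            raise ValueError("missing ')'")
        i += 1
    return item, i


def parse_entries(tokens, i=0):
    """Parse comma-separated entries up to ')' or the end of input."""
    pairs, elements = [], []
    while i < len(tokens) and tokens[i] != ")":
        first, i = parse_item(tokens, i)
        if i < len(tokens) and tokens[i] == "=":
            value, i = parse_item(tokens, i + 1)
            pairs.append((first, value))
        else:
            elements.append(first)
        if i < len(tokens) and tokens[i] == ",":
            i += 1  # a final comma is simply skipped
        elif i < len(tokens) and tokens[i] != ")":
            raise ValueError(f"expected ',' or ')' at token {i}")
    return pairs, elements, i


def parse_uconfig(text):
    tokens = tokenize(text)
    pairs, elements, i = parse_entries(tokens)
    if i != len(tokens):
        raise ValueError("unbalanced ')'")
    return pairs, elements
```

Parsing the nested example line shown above with parse_uconfig yields two key/value pairs (key1 and key3) and one list element (element3), each carrying its own nested entries. The sketch keeps key/value pairs in a list so that keys which themselves carry parameters, such as key3(element2), can be represented.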
The list elements (outside parentheses) in a uconfig.txt file are the names of files or subdirectories that contain subpages of the current page. The current page is here the index.html file in the same subdirectory as the uconfig.txt file. In the above example, there are five subnodes, namely the four directories (people/, research/, seminars/, techreports/) and one file (privacy.html). Where a filename is given, it is always the file served to the web browser, not any *-b.html source file.
Once ucampas has found the uconfig.txt file of the root directory of the navigation tree, it recurses into any subdirectories listed there, in order to parse and store the entire tree of uconfig.txt files that are reachable through directories listed in the respective parent uconfig.txt.
Note: When processing a node, ucampas will display the pathname of that node relative to the root directory of the navigation tree. If that node is not reachable through any sequence of subnodes listed in uconfig.txt files, then the displayed path will show any missing entry as “[...]”. If this shows up, it means that the processed node cannot be reached by a “ucampas -r” run from the root and will not be reachable through the navigation tree. Such a node is a child whose parent does not know about its existence.
Subnodes can also be specified collectively using Unix-shell filename patterns (not regular expressions) via the *glob construct. Its first argument is a filename pattern that will be expanded just like in a Unix shell. For example, a list element *glob("*-b.html") expands into any existing *-b.html file in the directory that has not yet been otherwise listed. The first slash and anything beyond in any matched pathname will be discarded, which can be used to select directories by their content. For example, *glob("d*/index-b.html") will add to the list of subnodes all existing directories that start with the letter "d" and that contain an index-b.html file. Also, any parameters listed in a *glob element will be copied into each generated subnode, which can be used, for example, as in *glob("file[0-9]-b.html", invisible=1). To reverse the sorting order of subnodes added by *glob, use *globr instead.
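The expansion rules just described (shell-style matching, discarding everything from the first slash, skipping names that are already listed, optional reversed order) can be sketched in Python. This is an approximation based only on the paragraph above, not ucampas code; the function name expand_glob is hypothetical, and the sketch returns the matched names unchanged, without any mapping between *-b.html source names and output names.

```python
import glob
import os


def expand_glob(directory, pattern, already_listed=(), reverse=False):
    """Return subnode names for a *glob (or, with reverse=True, *globr) element."""
    seen = set(already_listed)
    names = []
    full_pattern = os.path.join(glob.escape(directory), pattern)
    for path in sorted(glob.glob(full_pattern), reverse=reverse):
        relative = os.path.relpath(path, directory)
        # Discard the first path separator and anything beyond it.
        name = relative.split(os.sep, 1)[0]
        if name not in seen:
            seen.add(name)
            names.append(name)
    return names
```

With directories d1, d2 and e1 that each contain index-b.html, and a directory d3 that does not, expand_glob(directory, "d*/index-b.html") would return d1 and d2.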
In addition to normal subnodes, the navigation tree can also contain arbitrary hyperlinks, which are added as list elements of the form *link("url", title="string"). Such hyperlinks can be used to point to external sites or to break up the tree structure of the navigation information.
The key/value pairs in a uconfig.txt file are parameter settings that can be used to affect the formatting of the generated web pages or other aspects of ucampas behaviour. Some of the parameter values are just simple strings, others have a more complex structure and contain further key-value pairs and/or list elements.
A key/value pair (outside parentheses) in a uconfig.txt file directly affects the formatting of the index.html file in this directory. However, most parameters are inherited, that is, they also affect the processing of subpages. Therefore, a key/value pair in a uconfig.txt file will also affect other files and subdirectories in the same location, unless overridden further down. Parameters can also be specified in a uconfig.txt file for individual subnodes, by appending them in parentheses to the listed subnode name, as in subnodename(invisible=1).
There are two ways to modify the automatic inheritance of parameter settings to subnodes:
- noninherit=(key=value, ...)
- The provided parameter settings will affect only the current node, not any subnodes.
- onlyinherit=(key=value, ...)
- The provided parameter settings will affect only subnodes of the current node, not the current node itself.
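The interplay of ordinary inheritance with noninherit and onlyinherit can be sketched in Python as a pair of dictionary merges. This is only a model of the description above, not ucampas code: the function name resolve_node is hypothetical, and the sketch ignores parameters that are documented as never being inherited.

```python
def resolve_node(inherited, own, noninherit=None, onlyinherit=None):
    """Return (effective, passed_on) parameter dictionaries for one node.

    inherited   -- parameters received from the parent node
    own         -- ordinary key=value settings of this node
    noninherit  -- settings that apply to this node only
    onlyinherit -- settings that apply to subnodes only
    """
    effective = {**inherited, **own, **(noninherit or {})}
    passed_on = {**inherited, **own, **(onlyinherit or {})}
    return effective, passed_on


effective, passed_on = resolve_node(
    inherited={"style": "ucam2008"},
    own={"navbar": "1"},
    noninherit={"svninfo": "1"},
    onlyinherit={"breadcrumbs": "1"},
)
# effective contains style, navbar and svninfo;
# passed_on contains style, navbar and breadcrumbs.
```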
The following parameters are inherited (unless otherwise specified) and are supported independent of the selected style:
- If set to 1, ucampas will compare whether an output file that is about to be written differs from an already existing file that it would overwrite. If the content of both is found to be exactly equal, ucampas will not overwrite the file and will note instead "(no change)" on the console. This option can be of use to minimize the number of files that have to be reprocessed by a backup system after a large run of “ucampas -r”.
- display
- Specifies the shell command line that option -d invokes in order to display (in a web browser) the web page that has just been processed. Ucampas will perform the following substitutions in this string:
- absolute path of the output HTML file
- path of the output HTML file, as specified on the command line
- URL of the output HTML file, as specified by the url parameter.
Example: display='firefox "%u"'
- file_access
- If set to 1, ucampas will try to make sure that URLs in the output file also work well when browsed directly via the file:// protocol, as opposed to a HTTP/HTTPS server. The line “file_access=1,” is typically added to a uconfig2.txt file in a local preview working copy of a web site. This parameter should be left unset when formatting pages visible to the public. Option -i activates the same mechanism. This option triggers two changes:
- Ucampas adds the suffix “index.html” to any relative URL in the output file that points to a directory. To do so, it processes the href attributes of all <a>, <link> or <area> elements. This is necessary, because normally a HTTP server translates directory URLs ending in “/” into requests for the corresponding index.html file, and without a HTTP server involved, Ucampas has to do the equivalent.
- URLs generated by Ucampas are only made as relative URLs if the resulting relative pathname points to a file that actually exists in the local filesystem, otherwise a full, absolute URL is used. This is helpful in particular when accessing a partial copy of a website via file://, but can be inconvenient for users of HTTPS, as the absolute URL might switch the protocol.
- navtitle
- This parameter can be used to override the title that ucampas normally automatically extracts from an HTML document and uses as a link text for the corresponding page in auto-generated navigation information. This can be used to provide a shorter version of the page title for use in navigation links, or if the file concerned is not in HTML. This parameter is not inherited.
- style
- Selects the style template to be used for generating the output page. Currently, the two values supported are “ucam2006” and “ucam2008”, which are both based on different generations of the University web house style.
- svninfo
- Some styles append at the bottom of a page a line that says who made the last change to this page and when. By default (svninfo=0), this information is extracted from the ownership and last-modified time of the source file. With svninfo=1, and if there is a .svn subdirectory in the same directory as the source file (suggesting that the source file may be a Subversion working file), ucampas will instead try to use the "svn info" command to obtain the name of the last modifier and the date of the last modification (commit) from the Subversion repository. [This is particularly useful for setups where the subdirectory served by the web server is a Subversion working directory that is automatically updated after every commit by a post-commit hook script (usually using the commit-update.pl script that comes with ucampas).]
- title
- This parameter can be used to set or override the title of an HTML document. It can be used, for example, to replace an inappropriate title in an automatically generated *-b.html file. It can also be used to specify the titles of all pages in uconfig.txt rather than *-b.html files, to reduce the number of files ucampas has to read to build navigation content. This parameter is not inherited. If you only want to override the title as it appears in navigation content, but not the title of the page itself, then use navtitle instead.
- umask
- An octal value that determines the “umask” setting applied by ucampas when it writes its output file. For example, “umask=0002” will ensure that the resulting file is readable to everyone and writable to both the file owner and members of the file’s group.
- url
- Tells ucampas under which URL the current page will be accessible via HTTP. When this parameter is inherited to subnodes, the provided URL path is automatically extended according to the relative path in the file system. Therefore, this parameter has to be specified only for a root node and for the root of any subtree that is located elsewhere in the file system (e.g., via a symbolic link). Knowing the URL of each page helps ucampas to make correct use of relative and absolute URLs in auto-generated navigation information.
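The first of the two changes triggered by file_access=1 (appending “index.html” to relative URLs that point to a directory) can be sketched in Python. This is an illustration only, not ucampas code: the function name add_index_html is hypothetical, and the sketch assumes that a directory URL can be recognized by its trailing slash.

```python
from urllib.parse import urlsplit, urlunsplit


def add_index_html(href):
    """Append index.html to a relative URL whose path ends in '/'."""
    parts = urlsplit(href)
    is_relative = (
        not parts.scheme
        and not parts.netloc
        and not parts.path.startswith("/")
    )
    if is_relative and parts.path.endswith("/"):
        parts = parts._replace(path=parts.path + "index.html")
    return urlunsplit(parts)


print(add_index_html("../research/"))          # ../research/index.html
print(add_index_html("seminars/#today"))       # seminars/index.html#today
print(add_index_html("http://www.cam.ac.uk/")) # unchanged
```

Absolute URLs are left untouched, since they will be fetched from an HTTP server that performs the index.html lookup itself.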
In addition, the following parameters (also automatically inherited) are supported if “style=ucam2006” or “style=ucam2008” has been selected; additional parameters are described in the documentation of each style:
- breadcrumbs
- If set to 1, adds above the main page content a list of “bread crumb” links to higher-level nodes in the navigation tree.
- breadcrumbprefix
- Adds additional entries at the start of the list of “bread crumb” links generated by breadcrumbs=1. This can be useful to refer to a higher-level node outside the current site. Example:
breadcrumbprefix=( (href="http://www.cam.ac.uk/", "University of Cambridge"), (href="http://www.cam.ac.uk/cambuniv/", "Departments") )
- copyright_holder
- Name of the copyright holder to be shown near the bottom of the page. This is by default "organization, University of Cambridge" (or simply "University of Cambridge" if organization is either not defined or is equal to the latter). Set copyright_holder="" to suppress the copyright message entirely.
- Year(s) given right after the copyright symbol shown near the bottom of the page. This is by default the year of the last-modified date of the source file.
- If set to 1, discard <style> elements from the <head> of the HTML input file. Try this if your input file was generated by an application that adds a style sheet of its own that breaks that added by ucampas.
- URL of the CGI script to be called if the user clicks on the word “modified” at the very bottom of the page. [This feature is not yet fully implemented.]
- URL with instructions for how to edit a web page. [This feature is not yet fully implemented.]
- Adds additional links at the bottom-right corner of the page. This
is meant for legal boilerplates and the like. Example:
- Sets the HTTP directory where the decorative GIF icons and images required by the page style are found, e.g. "http://www.cl.cam.ac.uk/images/".
- If set to 1, adds a line at the bottom of the page that indicates who last modified it, and when.
- navbar
- If set to 1, adds left of the page content a tree of links to other nodes along the path in the navigation tree.
- navstop
- If set to 1, the left-side navigation bar will not show any children of the current node. This is useful if links to the child nodes are (at that level) instead represented in some alternative fashion, for example using in-page tabs generated via <div class="ucampas-tabs">.
- navtop
- If set to 1, makes the current page the top-level page of the tree of links generated by navbar=1. This can be used to detach a subsite (e.g., research-group pages) conceptually a bit better from the main site. Preferably use together with breadcrumbs=1, to keep the main site easily reachable. If set to larger values, makes an ancestor of the current page the root of the displayed navigation tree: 2=parent, 3=grandparent, etc.
- navtopinclude
- If set to 1 (or any other value than "" or 0), prefixes a link to the root of the navigation tree (i.e., “home”) at the start of the navigation bar. (Some users find that this helps them to find their way back to the top of the current navigation tree more easily than via the breadcrumbs.) If the value is not "1", then this text is used as the text of the link, rather than the page title (e.g.: navtopinclude="Home").
- organization
- Sets the name of the department that is displayed in a larger font near the top-right corner of the page.
- If defined, sets the name of a section of the web site that will be displayed right below the name of the department in the same font. This can be used, for example, to name the research group or other departmental section in charge of these web pages.
- Sets the HTTP directory where the CSS style sheets required by the style of the page are found, e.g. "http://www.cl.cam.ac.uk/style/".
- webmaster=(name=text, url=url, email=address)
- Provides contact details of whoever is in charge of the contents of this page at the bottom of the page, using a (style-dependent) phrase such as “Information provided by ...”. Not all of the sub-parameters have to be provided; for instance, if the name or url parameters are missing, they are generated automatically from the email address. Typical usage:
webmaster=(name="Markus Kuhn", url="http://www.cl.cam.ac.uk/~mgk25/#contact"),
Warning: Do not provide an email address if you are worried about unsolicited messages. At present, no attempt is made to obfuscate the provided email address from the address-harvesting web spiders used by spammers. Instead, provide a URL to a web page with human-readable contact details.
The following parameters are not inherited to subnodes:
- invisible
- Write in a uconfig.txt file “subnodename(invisible=1)” if you want “ucampas -r” to recurse into that path without causing it to become visible in the navigation side-bar. However, with invisible=1 the node remains visible in the sitemap. With invisible=2, the node will not even show up in the sitemap.
- stoprecursion
- Write in a uconfig.txt file “dirname(stoprecursion=1)” if you do not want “ucampas -r” to recurse into that path (e.g., because that subdirectory is owned by somebody else).
- Set this parameter (to 1) if the current node is not a subdirectory that contains the files and directories of further subnodes, but if instead its subnodes are stored one level up (i.e., as siblings of the current node). This parameter is already implicitly set for any file ending in ".html". It can be used to keep URLs shorter than would be possible with a strict mapping between navigation-tree and subdirectory structure.
- nopage
- Set this parameter (to 1) if the current node does not have an associated HTML file. Navigation links to such a node will then automatically be directed to its first subnode instead (or to the second subnode with nopage=2, etc.)
Command-line invocation and options
Ucampas is called with a list of absolute or relative path names. Any suffix “-b.html” or “.html” will be added or removed automatically, as necessary; the alternative extensions “.htm” and “.php” are handled similarly. If a provided path name is that of a directory, ucampas will attempt to process the “index-b.html” file found in it. If no pathname is provided, ucampas will process “index-b.html”.
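The path-name handling described above can be sketched in Python as follows. The sketch covers only the “.html” case and is not ucampas code; the function name source_and_output and its is_dir parameter are hypothetical.

```python
import os


def source_and_output(arg="index-b.html", is_dir=os.path.isdir):
    """Return (source file, output file) for one command-line path name."""
    if is_dir(arg):
        # A directory argument stands for the index-b.html file inside it.
        arg = os.path.join(arg, "index-b.html")
    # Strip "-b.html" first, so that "test-b.html" and "test.html"
    # both reduce to the same stem "test".
    for suffix in ("-b.html", ".html"):
        if arg.endswith(suffix):
            arg = arg[: -len(suffix)]
            break
    return arg + "-b.html", arg + ".html"
```

Under these assumptions, the arguments test, test.html and test-b.html all resolve to the source file test-b.html and the output file test.html.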
The following command line options can be specified before this argument list:
- -c 'parameters'
- Add further configuration parameters, which are formatted and have the same effect as if they were placed into the top-level uconfig.txt file.
- -d
- After a page has been generated, invoke a web browser to display it. Specify the command line to invoke the web browser using parameter display.
- --debug word,word,...
- Output some internal data structures for debugging purposes, e.g.
- nav lists the navigation tree compiled from all the read uconfig.txt files,
- src shows the SGML parse tree from the input file.
- -i
- Append “index.html” to all relative directory URLs in the output file. This option is equivalent to: -c 'file_access=1'
- Execute any <?perl ... ?> processing instructions that are found embedded in the input HTML file. By default, these are only removed. [Future documentation will describe the PlexTree API via which such Perl code can manipulate both the HTML document tree and the navigation tree.]
- -r
- Recursively process all the subnode files listed in the uconfig.txt file associated with each given input file as well. Does not recurse into *link hyperlinks.
Tip: To make sure that “ucampas -r” processes all pages of your web site, they all need to appear in a uconfig.txt file, even if you do not want them to appear in a navigation menu. In the latter case, use the invisible parameter to hide them from the menu. (The “ucampas-navtest” tool searches for lost web pages that are not mentioned in any uconfig.txt file.)
Tip: Parameter stoprecursion blocks recursive descent into a subtree (useful, for example, if a subtree is owned by someone else).
Author, history, getting help
Ucampas was written by Markus Kuhn, primarily during the second half of 2006, in preparation for the 2007 restructuring of the lab’s web site.