Administrative databases and web services
A discussion paper for
the Computer
Laboratory Admin Support Panel
Markus Kuhn
Status quo
The Computer Laboratory has for many years operated a Microsoft SQL
Server that hosts a database called "Administration". It currently
runs Microsoft SQL Server 2008 on FARRINGDON
(farringdon.ad.cl.cam.ac.uk). The "Administration" database contains
currently over 140 tables, including information about
- members of the department (room occupancy, employment status, etc.)
- equipment (wiring, telephones, computers, DHCP, etc.)
- door access control
- research students (status, topic, dates, supervisor, etc.)
- applicants (mphil, phd)
- departmental roles and their occupants (under development)
- ... and much more ...
The contents and configuration of databases on this server can be
browsed and configured by any interested staff using the Microsoft
SQL Server Management Studio software, which is installed, for
example, on the Windows terminal server ts01.ad.cl.cam.ac.uk. Graham
Titmus administers the SQL Server and can grant interested staff
access permissions.
Existing users:
- Graham Titmus implemented a number of applications and web
interfaces on
https://dbserv.ad.cl.cam.ac.uk/Administration/ to some of these tables, using ASP and Visual Basic technology.
(The source code resides on the filer in the Subversion repository
file:///usr/groups/wwwsvn/repositories/msdbweb.) These are widely used
within sys-admin and receiption.
- Student-admin routinely edit some of these tables using Microsoft
Access.
- There are a number of Perl scripts running under Linux that use
the FreeTDS package to execute SQL
instructions on the SQL Server. That is for example how we generate the
departmental phone
directory (in /anfs/www/VH-cl/scripts/people.pl).
- ... and more ...
Shortcomings of the existing SQL server arrangements:
- The existing "Administration" database tables are of varying
quality. Some define authoritatively the department’s knowledge or
status in a certain area, have been carefully normalized and their
configuration automatically enforces a number of consistency rules.
Others are more used as informal tabular notebooks, are entirely
manually maintained by a single user in a single table, with no
normalization
or insufficient consistency checking. The latter are difficult to use
by software applications.
- Most of the existing documentation is “in Graham’s head”. There
are some notes in
the sys-admin
wiki; in particular the tables related to human resources, wiring,
DHCP and LDAP have some information, but again most of that is more a
collection of brief notes, rather than detailed and definitive
descriptions of all the rules, constraints, policies, applications and
administrative processes relating to that data.
- In particular, there is no central registry of who uses what data
and what applications would have to be modified if some table or
column were to be changed.
- Microsoft's SQL Server uses a
proprietary, only
very recently documented communications protocol
(TDS)
that is not
yet well supported on non-Microsoft environments; the official
interface is the Windows ODBC API. The
available FreeTDS package is a
prototype communications library for this proprietary protocol with
many known shortcomings and pitfalls, especially regarding binary
data, character sets, date and time formats, and authentication. It
works in some of our existing applications so far, but still falls
short of being a desireable long-term solution.
- While the SQL Server technology appears to be considered by
sys-admin quite robust and suitable (with the exception of Linux
connectivity), the existing Visual Basic + ASP based web applications
are considered to be not quite as flexible and comfortable as would be
desirable, giving users only very limited search and browsing access
and often rather cumbersome editing functions. It also does not seem
to work well with the
University's Raven
authentication system, requires a separate Microsoft web server
(rather than the
open-source Apache server that
we use everywhere else), and only a single member of staff is
currently familiar with it.
Requirements and available technologies
Data storage and exchange
Machine-readable data shared between human users and automatically
running software applications can and is being maintained in the
department in a number of ways:
- tables on the main departmental Microsoft SQL Server
- tables on other SQL servers (MySQL, PostgreSQL)
- tables maintained by an SQL library
(sqlite) on the filer
- plain-text files stored in a version-control repository on the
filer (subversion)
- plain-text file stored directly on the filer
- Excel spread sheets stored directly on the filer
Using a dedicated storage server process (whether SQL or
Subversion) has particular advantages for some applications:
- transaction semantics, high concurrency
- transactions can trigger actions
- reliable undo history
As we may not have the resources to develop custom-made editors for
each data set, an important requirement is also that generic tools are
available for directly browsing and editing data. With relational
databases systems, these tools should in particular allow users to
edit table joins, such that the normalization of data does not hinder
direct access by human users.
Some existing users are very comfortable with Microsoft Access as a
front-end, others strongly prefer plain-text files edited with e.g.
emacs.
Web front-ends
Over the last few years, a number of very sophisticated and
comfortable web
application frameworks have emerged for the development of web
applications and services, in particular those that interact closely
with relational databases. These include usually:
- Object-relational
mapper (ORM) – abstracts from SQL tables, represents data as native
programming-language objects, and automatically converts access to
these back into SQL transactions.
- Model-view-controller
(MVC) architecture – a standardized way of structuring a web
application that clearly separates the actual handing of the data and
the "business logic" that applies to it (model) from the generation of
the HTML interface (view) and the dispatch of incoming URL requests to
the corresponding model and view code (controller). The approach
avoids duplication of code (several views can share models, several
applications can share models and a controller) and a standardized
code structure can encourage collaborative development and code reuse.
- Template
engine – allows to augment HTML web pages with programming
language constructs, automatically prefill HTML forms, etc.
- Session manager – preserve state across a session of HTTP
requests.
- Tools that autogenerate much of the initially required code for
database web front-ends, to which the developer can then add
additional functionality (e.g., further consistency checks, on-screen
documentation).
Ruby on Rails was perhaps the
first widely used MVC-based framework that integrated many of these
techniques into a single package. This approach has since been ported
to many other programming languages, such
as Pylons and
Django under Python and
Catalyst under Perl.
They all promise to minimize the work involved in designing new
database web front-ends and simplify collaboration on such projects by
making the structure of such applications very predictable.
Pylons looks particularly
promising, firstly because the underlying programming language
(Python) is very widely used already in the department, and secondly
because it follows a rather modular approach and allows a choice of
components from separately maintained Python libraries. Python's
standardized
Web Server Gateway
Interface (WSGI) makes this modular approach particularly
practical. Pylons is normally used with
the SQLAlchemy database
toolkit, which offers a very flexible ORM.
Others considered: Perl would seem to have the
advantage of having been a widely used sys-admin tool for two decades.
But the only state-of-the-art Perl framework around, Catalyst, uses
very advanced OO features of Perl, turning it essentially into a
different language from what sys-admin are accustomed to. This
substantially reduces the advantage of not having to learn a new
language. Django for Python is an all-in-one solution that leaves less
flexibility in case we don't like one particular component, e.g. a
templating engine that does not work nicely with ucampas. There are
lots of others, but given that it is usually more work to first
familiarize oneself with a framework than to then develop a medium
complexity application, we are unlikely to be able to look at many and
have to rely on the recommendations received from experienced users in
the department, who named mostly the above candidates.
Suggested action items
- Familiarization with candidate MVC/ORM web application frameworks,
probably starting with Pylons. In
particular pay attention to interaction between template engine
and ucampas
for easy application of house styles.
- Development of 1–2 demonstration applications, probably:
- Departmental roles
and role-occupancy periods
- Circulating and evaluating PhD applicantions
(browse, search, fill out evaluation form, discussion,
track candidates with email notification of updates, in the
first instance without any outside-department interface)
(One small and one medium complex application, both immediately
useful and largely independent of existing applications.)
- Finding a solution for improved Linux connectivity to the
Microsoft SQL Server (in particular
to Python
and SQLAlchemy), which is a
precondition on its use in the many new teaching-related web
applications that we will have to implement in the foreseeable future.
Candidate database connectivity library stacks:
See also:
- Explore mod_wsgi,
a recommended way of running production Python web applications with Apache
web servers.
(For prototyping, we can use some Python webserver or existing
Apache CGI facilities with a WSGI wrapper.)
- Start thinking about better documentation of our databases and
associated processes.
- Get together small technical working groups that start looking
after individual applications, workflows, table sets, and their
documentation, probably starting with various student-admin tables.
- Explore Linux alternatives to MS Access as generic table editors
(e.g., rekall,
open
office)