By Ellis Weinberger, Cambridge University Library; Richard Clayton, Computer Laboratory, Cambridge University; and Ross Anderson, Computer Laboratory, Cambridge University

The Goal

A digital repository exists to preserve the ability to use the intellectual content of the digital objects within it. For this to happen, the repository will need to preserve the object itself, but it must also preserve the metadata describing the object. The metadata will provide the provenance of the object and indicate any access controls that need to be applied.

Preservation does not stop there. For example, in order to preserve the ability to use a TIFF file, a TIFF viewer must also be preserved. To run the TIFF viewer, an operating system must be preserved. To run the operating system, the original computer hardware will need to be preserved, or, more likely, emulation software will need to be developed that allows the old hardware to be emulated on new machines. Finally, in case we need to write new viewer software, we must preserve the formal specification of the TIFF file format, together with enough of the subsidiary documentation that grows up around formal specifications to explain how they actually work in practice.

Good vegetable ink, on good acid-free paper, stored in a cool, dry, dark room is very stable and will last a thousand years; vellum will last even longer. A digital object, stored on any kind of electronic or magnetic medium, is likely to be unusable within ten years. Either the medium will have deteriorated, or the hardware to read the medium will have disappeared, or the software to interpret the information on the medium will have become unavailable. We must expect to transfer digital objects regularly on to new platforms in order to preserve access to the intellectual content.

Digital objects have become a significant part of most collections. Acquiring them and preparing them for use may be expensive, but the cost of preserving them could be even higher.
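The chain of dependencies in the TIFF example (object, viewer, operating system, hardware or emulator, format specification, metadata) can be treated as a graph, and the set of things a repository must keep is then everything reachable from the object. The following is a minimal sketch of that idea; the item names, the `DEPENDS_ON` map, and the `preservation_set` function are illustrative assumptions, not part of any repository system described here.

```python
from collections import deque

# Hypothetical dependency map: each item lists what is needed to use it.
# The entries mirror the TIFF example in the text; all names are illustrative.
DEPENDS_ON = {
    "image.tiff": ["tiff-viewer", "tiff-specification", "metadata:image.tiff"],
    "tiff-viewer": ["operating-system"],
    "operating-system": ["hardware-emulator"],
    "hardware-emulator": [],
    "tiff-specification": ["specification-notes"],
    "specification-notes": [],
    "metadata:image.tiff": [],
}


def preservation_set(item, depends_on):
    """Return every item that must be preserved to keep `item` usable."""
    needed = {item}
    queue = deque([item])
    while queue:
        current = queue.popleft()
        # Follow each direct dependency once; already-seen items are skipped.
        for dep in depends_on.get(current, []):
            if dep not in needed:
                needed.add(dep)
                queue.append(dep)
    return needed


if __name__ == "__main__":
    for name in sorted(preservation_set("image.tiff", DEPENDS_ON)):
        print(name)
```

Even for a single image, the set that has to be preserved is several times larger than the object itself, which is one reason the cost of preservation can exceed the cost of acquisition.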
In order to protect the investment of the institution in digital objects, policies must be set, decisions must be made, and actions will need to be taken at the very beginning of the process.

The Risks

Many of the risks to digital objects are the familiar risks to any repository. Natural events like fire, flood, earthquake, insects or micro-organisms can destroy storage media as easily as they can damage books and buildings. Digital media have their own fallibilities: magnetic domains on tapes will 'print through' and the data will be irrecoverable unless the tape is regularly rewound; and CDs can be damaged by bending, the substrate may delaminate, or the laser-reflecting layer may oxidise.

The regular transfer of digital objects to new media is accompanied by risks of human error. It may not even be possible at all, as publishers use ever more effective protection mechanisms to prevent unauthorised copying from damaging their sales.

In certain circumstances, for example cases of libel, blasphemy, or official secrets, the courts may order the modification or erasure of preserved digital objects in repositories. It may be prudent to consider ways to ensure the integrity of the repository under such circumstances. Pre-emptive steps to avoid such problems might include negotiations with the authorities to obtain legislative safeguards for repositories covered by public law, such as legal deposit libraries. In the case of private repositories, it may still be possible to place 'mirrors' of archives in various legal jurisdictions, in an attempt to side-step any local problems.

Finally, the repository itself must be protected. Funding could stop if the repository is no longer seen as serving a useful purpose, and large amounts of preserved material could be lost if the future maintenance budget is inadequate.
The Policy

A security policy is a succinct statement of a system's protection strategy, from which can be derived specifications and methods of judging the quality of proposed solutions. We propose:
Preserving the Digital Objects

The digital object is, by definition, digital, but almost all the other material that needs to be preserved is also digital. Hence a scheme which works for the object will, in general, work for everything else associated with the object that needs to be preserved. Everything else will include software, specifications, documentation, help files and, of course, the metadata for the object and the metadata for all of this supporting material as well.

The ideal scheme is to preserve multiple copies of all relevant digital objects at several different sites using varied software and hardware. The number of copies, the number of sites, and the varieties of software and hardware should be based on an examination of:
Regular testing should take place to ensure that all necessary digital objects exist and function correctly. These tests should include manual and machine procedures, and should cover all copies in all locations.

Preserving Access Restrictions

Many objects in a digital repository will be subject to access controls. This is not new. Archivists have been checking security clearances and applying limits to photocopying for years. What is new is that digital objects are themselves enforcing controls, using hardware devices or access to remote authorisation sites to check that access is allowed. This means that the repository will need to preserve the ability, even in the far future, to convince the digital object that the device or the remote site still exists. This may be impractical as hardware ages, or sites disappear.

It may be possible to obtain a copy of the object with the technical measures disabled, protected instead by a contract between publisher and repository. If publishers prove unwilling to provide these special copies, then repositories may have to appeal to legislators to mandate the provision of digital objects to repositories in a form which the repositories can preserve.

If the intellectual content of the object is libellous, pornographic, or secret, then the repository would need to preserve the metadata defining the access restrictions applying to the material. Since courts may be minded to have an objectionable object destroyed, pre-emptive steps to avoid such problems might include negotiations with appropriate authorities, perhaps to obtain legislative safeguards. It may also be prudent to place 'mirrors' of repositories in various legal jurisdictions, to remove temptation from litigants.

Metadata Integrity

Access and usage control depends on accurate maintenance of metadata about staff roles, access restrictions, and access privileges. A repository is likely to be part of a larger institution, for example a large library.
End users would not have direct access to the preserved digital objects, but would deal with the staff of the host institution, who in turn would deal with the staff of the repository. Staff of the repository would have varying access privileges, depending on their role in the repository. The access privileges would have to take into account legislation on intellectual property rights, freedom of information, official secrets, human rights and data protection.

Access and usage control often involves preserving not only the raw information but also a record of who accessed or changed the information, their role, and their reason for doing so. It is then necessary to record information about the people involved, what roles they had, what access privileges attached to each role and when they were allocated the role. The complexity of the metadata continues recursively, with records being required to indicate who allocates roles and who allocates privileges to roles.

Insisting that changes can be made only by two or more people working together will enhance the security of the metadata. The old metadata should always be preserved, with alterations appended. Lest this be thought excessive, it should be remembered that insiders perform the majority of security compromises, and that the ability to change metadata can completely compromise the preservation of a digital object. Where the preserved records are digital, such changes can become impossible to detect unless the metadata is capable of demonstrating what changes have occurred.

Conclusions

In this paper we set out a security policy to protect the ability to use the intellectual content of digital objects preserved in a digital repository. Along with the objects themselves, it is necessary to preserve the metadata describing them and the infrastructure that permits access to the contents.
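One way to make metadata "capable of demonstrating what changes have occurred", as the section on metadata integrity requires, is an append-only log in which each entry carries a hash of its predecessor, combined with a check that two distinct people approved each change. The sketch below is an assumption about how such a record might be kept, not a mechanism proposed in this paper; the function names, the record layout, and the choice of SHA-256 are all illustrative.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash, record):
    """Hash a record together with the hash of the entry before it."""
    payload = json.dumps({"prev": prev_hash, "record": record}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_change(log, change, approvers):
    """Append a metadata change; earlier entries are never overwritten."""
    distinct = sorted(set(approvers))
    if len(distinct) < 2:
        raise ValueError("at least two distinct approvers are required")
    prev_hash = log[-1]["hash"] if log else GENESIS
    record = {"change": change, "approvers": distinct}
    log.append({"record": record, "prev": prev_hash,
                "hash": _digest(prev_hash, record)})


def verify(log):
    """Return True if every entry still matches its hash and its predecessor."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True


if __name__ == "__main__":
    log = []
    append_change(log, {"object": "ms-0001", "access": "restricted"},
                  ["archivist-a", "archivist-b"])
    append_change(log, {"object": "ms-0001", "access": "open"},
                  ["archivist-b", "archivist-c"])
    print(verify(log))
```

A chain like this reveals edits to, or deletions from the middle of, the retained history, but it does not by itself reveal that the most recent entries have been cut off. Detecting that would need the latest hash to be held somewhere independent, for instance at one of the mirror sites.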
There are significant challenges to be overcome where technical methods have been used to limit access to a digital object, and new legislation may be needed in this area.

Digital repositories will need continuing technical, economic, and political support. A repository that is merely a random collection of objects will be more likely to lose financial support than one that has a definite structure and purpose to its collection. University and legal deposit libraries have been able to preserve print and manuscript collections through the centuries and so are suitable candidates to preserve the intellectual content of digital objects for future generations and future civilisations.