The Way Forward

Even under the charitable explanation --- that the government's actions are the result of blunder rather than malice --- we face the unpleasant fact that the databases that are to support research and business information have been made identifiable by using, as the primary key, the combination of date of birth and postcode.

Quite apart from the privacy issue, this will cause both safety and reliability problems. Firstly, although most of the population can be uniquely identified in this way, a minority cannot --- twins living at the same address, for example, and students in halls of residence (for whom the birthday problem in probability theory ensures that if 23 or more people of the same age are living at the same address, then it is more likely than not that at least two of them share the same date of birth). Thus if, in the absence of a paper record, an accident and emergency team digs out a HES record and acts on the information it contains, then there is a small but significant probability that they will be using the wrong person's data.
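
The halls-of-residence figure can be checked with a short calculation. The sketch below assumes birthdays are uniformly distributed over 365 days and that all residents were born in the same year, so that a shared birthday means a shared date of birth; the function name is illustrative only.

```python
def shared_birthday_probability(n: int, days: int = 365) -> float:
    """Probability that at least two of n people share a birthday.

    Assumes uniformly distributed birthdays over `days` possible days.
    """
    if n > days:
        return 1.0  # pigeonhole: a collision is certain
    p_all_distinct = 1.0
    for i in range(n):
        # The (i+1)-th person must avoid the i birthdays already taken.
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct


if __name__ == "__main__":
    for n in (10, 22, 23, 50):
        print(n, round(shared_birthday_probability(n), 3))
```

Under these assumptions the probability first exceeds one half at 23 residents, and it rises quickly as the group grows.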

Another problem is that the linkage of records will be broken when patients move. This will distort hospital readmission statistics, as it can be assumed that changes of address will be correlated with illness (assuming illness to be correlated with unemployment, divorce and homelessness).
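
A toy illustration of the broken linkage, using four invented hospital episodes (the patient labels, dates and postcodes are made up for the example and are not drawn from any real data set). Counting readmissions by grouping on (date of birth, postcode) misses the patient who moved between admissions.

```python
from collections import Counter

# Invented episodes: (true patient, date of birth, postcode at admission).
episodes = [
    ("A", "1950-03-14", "ZZ1 1AA"),
    ("A", "1950-03-14", "ZZ2 2BB"),  # patient A moved between admissions
    ("B", "1962-11-02", "ZZ3 3CC"),
    ("B", "1962-11-02", "ZZ3 3CC"),
]


def readmissions(keys) -> int:
    """Count every episode after the first for each distinct key."""
    counts = Counter(keys)
    return sum(c - 1 for c in counts.values())


# Ground truth, grouping on the real patient.
true_count = readmissions(patient for patient, _, _ in episodes)

# What a database keyed on (date of birth, postcode) would report.
keyed_count = readmissions((dob, postcode) for _, dob, postcode in episodes)

if __name__ == "__main__":
    print("true readmissions:", true_count)
    print("readmissions seen via (dob, postcode):", keyed_count)
```

In this example the keyed count is one lower than the true count, because patient A's second admission appears under a new key.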

We would therefore recommend that, as a matter of urgency, the National Health Service --- together with all its information systems contractors --- cease and desist from using (date of birth, postcode) as a primary database key.

Instead, the techniques developed in Denmark and Germany should be used. Each healthcare provider submitting data centrally should use a pseudonym, whose linkage to the patient is unknown to outsiders. For example, one might pass the name and date of birth through a hash function such as SHA1 [69], together with a key unique to the provider, and take as many bits of the result as necessary to fill the fields in question. If the use of techniques that smack of cryptography is to be forbidden, then one can simply generate the pseudonyms at random (and take care to protect the file that links them to patient identities).
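
A minimal sketch of both options follows. It is an illustration under stated assumptions rather than the Danish or German scheme itself. The keyed variant uses HMAC with SHA-1 as one standard way of combining a hash function with a secret key, where the text above only specifies hashing the data "together with a key". The function names, the name normalisation, and the 8-byte pseudonym length are all hypothetical choices.

```python
import hashlib
import hmac
import secrets
import unicodedata


def keyed_pseudonym(name: str, date_of_birth: str,
                    provider_key: bytes, n_bytes: int = 8) -> str:
    """Derive a pseudonym from name and date of birth under a provider key.

    Assumption: HMAC-SHA1 stands in for "hash together with a key".
    The digest is truncated to n_bytes to fit the target field.
    """
    # Normalise so trivial spelling variants map to the same pseudonym.
    canonical_name = unicodedata.normalize("NFKC", name).strip().upper()
    message = (canonical_name + "|" + date_of_birth).encode("utf-8")
    digest = hmac.new(provider_key, message, hashlib.sha1).digest()
    return digest[:n_bytes].hex()


def random_pseudonym(link_table: dict, patient_id: str,
                     n_bytes: int = 8) -> str:
    """Assign a random pseudonym, remembering it in link_table.

    link_table maps patient identity to pseudonym and must itself be
    protected; collisions are not checked in this sketch.
    """
    if patient_id not in link_table:
        link_table[patient_id] = secrets.token_hex(n_bytes)
    return link_table[patient_id]


if __name__ == "__main__":
    key = secrets.token_bytes(20)  # hypothetical per-provider secret
    print(keyed_pseudonym("Jane Doe", "1970-01-01", key))
    table = {}
    print(random_pseudonym(table, "patient-0001"))
```

With the keyed variant the same patient always maps to the same pseudonym at a given provider, while an outsider without the key cannot recompute it from name and date of birth. With the random variant nothing can be recomputed, so the linking table becomes the asset to protect.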

Either way, the use of systematic pseudonyms would lessen the risk of the wrong record being used, and also reduce the loss of information linkage --- many address changes are local, and these patients remain with the same provider even when their postcode changes. It would also bring these systems into line with the established RCGP/GMSC guidance:

no patient should be identifiable, other than to the general practitioner, from any data sent to an external organisation without the informed consent of the patient [40]

Such simple measures will not completely solve the problem, as people with access to the databases might infer a patient's identity from knowledge of part of their clinical history --- as we pointed out in the policy. However, they would eliminate the most serious problem and build a foundation on which further inference controls could be constructed (see, e.g., [29]).

Nor will pseudonyms tackle the problem that, once large central databases exist, there will be pressure on researchers to use them for reasons of economy. Official control of these databases might then have a negative effect on paradigm-breaking research. How readily would the establishment grant access to future scientists making unconventional claims, such as a link between Helicobacter pylori and ulcers, or between Chlamydia and coronary heart disease?

Ross Anderson
Tue Jun 25 08:31:53 BST 1996