
Why the DeCODE Proposals are Inadequate

This leads to the reasons why I consider the security proposals made by DeCODE to be unsatisfactory, and the level of technical expertise shown by them so far to be inadequate.

DeCODE have failed to understand that users must not have access to a Turing powerful query language; at the 12th October briefing, it emerged that their technical expert did not even understand the phrase `Turing powerful'. I am convinced that this is not simply a linguistic misunderstanding: even after I had explained, on the morning of 12th October, that user queries must be strictly limited and that this is difficult to achieve, DeCODE continued to maintain at a further meeting that afternoon that writing a filter to police user queries would be simple.

A security expert should have been aware that this is not the case. For example, much of the expenditure on banking computer security goes on extensive quality control procedures, in which all programs are examined and tested by several independent people to reduce the risk that a programmer could credit a large sum of money to his own account. Another example comes from military computer security, where systems prevent information flows from a higher security level to a lower one independently of the application programs, so that an application programmer cannot write code that leaks information. Yet another example is given by the popular `Java' programming language, which is designed to let users download programs from the Internet and run them in their web browsers with relatively little risk that these programs could steal personal information, destroy data or otherwise misbehave. In short, the problem of which software one must trust, and to what extent, is the central issue in computer security.

The other security proposals by DeCODE, and in particular the claims made about encryption, also indicate a lack of expertise:

it was claimed that one-way functions can be used to process social security numbers and thus turn them into pseudonyms. However, the file of Iceland's 280,000 or so social security numbers is publicly available, and an attacker could simply pass them through the one-way function and build a look-up table to link numbers with pseudonyms. When this was pointed out, DeCODE claimed that the one-way function would involve a different key at each hospital or health centre, and that a trusted party such as the data protection commission would then translate these institution-specific pseudonyms into nationally uniform pseudonyms for the database. But in that case, the appropriate mechanism would not be a one-way function, but a block cipher (the use of a one-way function would compel the trusted party to use the key to build a look-up table for decryption as described above);
it was also claimed that the disease codes would be encrypted under a public key, so that they would be coded in the database. But then anyone could use the public key to encrypt the known ICD disease codes, giving a look-up table with which to decrypt the database. When this was pointed out, DeCODE claimed that the public key encryption would include a random number to prevent this. But then how would the codes in the database be accessed by authorised users? We are told that the trusted party would have the private key and decrypt them. But in that case, again, the appropriate mechanism would not be a public key encryption function, but a symmetric block cipher (with under 100 healthcare providers in Iceland, the use of public key mechanisms is hard to justify);
most of DeCODE's presentation slides on cryptography were not shown to me at the 12th October briefing, on the grounds that `you know this stuff anyway'. The exception was a slide in which it is proposed to guard against the risk of a breakthrough in cryptanalysis by using three block ciphers (DES, IDEA and RC5) one after the other. This idea is suggested by outsiders from time to time, but has not appealed to professional cryptologists for many years (only if ciphers commute can one prove that their composition is no weaker than any of the components, and block ciphers should not commute);
it is claimed that a separation of duty policy can be enforced in the database, in order to prevent system administrators having access to the full patient records, by encrypting different families of disease codes under different symmetric keys, and by encrypting the genealogical and genotypic databases with different keys. I am very sceptical of this claim; having experience of designing databases which use encryption for copy protection, I am aware of many difficulties that need to be overcome and of which DeCODE appear unaware. In any case, the principal issue with the database is not encryption but how one controls the programs that are run on it and the people who have access to the program output.
For example, I cannot accept the claim that encrypting some of the records with different keys will prevent system administrators having access to the database. If the decryption is performed in software, the system administrators would have access to the keys; if it were performed in tamper-resistant hardware, they would still have access to the plaintext whenever it was decrypted; and if all the processing were performed in a tamper-resistant computer, then the system administration of this computer would now become the issue. Automating system administration might eventually be a solution, but it is a long way off in practice.
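
The look-up table attack on pseudonyms described in the first item above can be illustrated with a minimal sketch. It rests on several assumptions that are not part of DeCODE's proposal: SHA-256 stands in for the unspecified one-way function, HMAC-SHA-256 stands in for the keyed variant, and a made-up list of 280,000 ten-digit strings stands in for the public file of social security numbers. All function names and the key are hypothetical.

```python
import hashlib
import hmac


def pseudonym(ssn: str) -> str:
    """Unkeyed one-way function (SHA-256 is an assumed stand-in)."""
    return hashlib.sha256(ssn.encode("ascii")).hexdigest()


def keyed_pseudonym(key: bytes, ssn: str) -> str:
    """Keyed one-way function (HMAC-SHA-256 is an assumed stand-in)."""
    return hmac.new(key, ssn.encode("ascii"), hashlib.sha256).hexdigest()


def build_lookup_table(public_ssns, one_way):
    """Pass every known number through the function and index by the result."""
    return {one_way(ssn): ssn for ssn in public_ssns}


# Made-up stand-in for the public register: 280,000 ten-digit strings.
public_register = [f"{n:010d}" for n in range(280_000)]

# Case 1: no key. Anyone holding the public register can invert the function.
table = build_lookup_table(public_register, pseudonym)
target = pseudonym("0000123456")        # a pseudonym seen in the database
recovered = table[target]               # the number it belongs to

# Case 2: per-institution key (hypothetical value). An outsider's table is
# useless, but the trusted party has no decryption operation either: to
# translate pseudonyms it must build the same kind of table using the key.
hospital_key = b"hypothetical per-hospital key"
keyed_table = build_lookup_table(
    public_register, lambda ssn: keyed_pseudonym(hospital_key, ssn)
)
keyed_target = keyed_pseudonym(hospital_key, "0000123456")
keyed_recovered = keyed_table[keyed_target]
```

The second case is why a block cipher would be the more natural mechanism: the holder of the key could then decrypt a pseudonym directly rather than enumerate the whole population. The same enumeration applies to disease codes encrypted deterministically under a public key, since the list of ICD codes is equally well known.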

For these reasons, I cannot accept DeCODE's claim to have adequate expertise in computer security, or their claim that they do have adequate security plans but that these have simply not been disclosed to me [7]. Their lack of competence in computer security is quite evident from their proposal.


Ross Anderson
1998-10-20