This leads to the reasons why I consider the security proposals made
by DeCODE to be unsatisfactory, and the level of technical expertise
shown by them so far to be inadequate.
DeCODE have failed to understand that users must not have access to a
Turing powerful query language; at the 12th October briefing, it emerged
that their technical expert did not even understand the phrase `Turing
powerful'. I am convinced that this is not simply a linguistic
misunderstanding: during the morning of the 12th October I explained the
requirement for user queries to be strictly limited, and the difficulty
of doing so, yet at a further meeting that afternoon DeCODE continued to
maintain that writing a filter to police user queries would be simple.
A security expert should have been aware that this is not the case.
For example, much of the expenditure in banking computer security
relates to extensive quality control procedures whereby all programs
are examined and tested by multiple independent people, to reduce the
risk that a programmer could credit a large sum of money to his own
account. Another example comes from military computer security, where
systems prevent information flows from a higher security level to a
lower one independently of the application programs, in order to
prevent an application programmer from writing code that could leak
information. Yet another example is given by the popular `Java'
programming language, which is designed to let users download
programs from the Internet and run them in their web browsers with
relatively little risk that these programs could steal personal
information, destroy data or otherwise misbehave. In short, the
problem of which software one must trust, and to what extent, is the
central issue in computer security.
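One reason a query filter is harder to write than it sounds can be illustrated with a differencing attack. The sketch below is hypothetical throughout: the records are synthetic, and `filtered_count` and `MIN_QUERY_SET` are illustrative names for a naive filter that refuses any count query matching fewer than three records; nothing here is taken from DeCODE's design.

```python
# Hypothetical threshold: queries matching fewer records are refused.
MIN_QUERY_SET = 3

# Synthetic records; the first one is the individual being targeted.
RECORDS = [
    {"age": 34, "postcode": "101", "sex": "F", "diagnosis": "E11"},
    {"age": 51, "postcode": "101", "sex": "M", "diagnosis": "E11"},
    {"age": 47, "postcode": "200", "sex": "F", "diagnosis": "E11"},
    {"age": 62, "postcode": "600", "sex": "M", "diagnosis": "E11"},
    {"age": 29, "postcode": "101", "sex": "F", "diagnosis": "J45"},
    {"age": 38, "postcode": "200", "sex": "M", "diagnosis": "I10"},
]


def filtered_count(predicate):
    """Naive filter: answer a count query only if enough records match."""
    n = sum(1 for r in RECORDS if predicate(r))
    return n if n >= MIN_QUERY_SET else None


def is_target(r):
    # Publicly known attributes that happen to identify one person.
    return r["age"] == 34 and r["postcode"] == "101" and r["sex"] == "F"


def has_code(r):
    return r["diagnosis"] == "E11"


# Asking about the target directly is refused by the filter.
direct = filtered_count(lambda r: is_target(r) and has_code(r))

# Two individually acceptable queries whose difference isolates the target.
everyone = filtered_count(has_code)
everyone_but_target = filtered_count(lambda r: has_code(r) and not is_target(r))
leaked = everyone - everyone_but_target  # 1 means the target has the code
```

Each query passes the filter on its own, so a filter that inspects queries one at a time cannot stop the leak; it would have to reason about combinations of queries, which is where the difficulty lies.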
The other security proposals by DeCODE, and in particular the claims
made about encryption, also indicate a lack of expertise:
- it was claimed that one-way functions can be used to process social
security numbers and thus turn them into pseudonyms. However, the file
of Iceland's 280,000 or so social security numbers is publicly
available, and an attacker could simply pass them through the one-way
function and build a look-up table to link numbers with pseudonyms.
When this was pointed out, DeCODE claimed that the one-way function
would involve a different key at each hospital or health centre, and
that a trusted party such as the data protection commission would then
translate these institution-specific pseudonyms into nationally
uniform pseudonyms for the database. But in that case, the appropriate
mechanism would not be a one-way function, but a block cipher (the use
of a one-way function would compel the trusted party to use the key to
build a look-up table for decryption as described above);
- it was also claimed that the disease codes would be encrypted by
a public key, so that they would be coded in the database. But then
anyone could use the public key to encrypt the known ICD disease codes,
giving a look-up table to decrypt the database. When this was pointed
out, DeCODE claimed that the public key encryption would include a
random number to prevent this. But then how would the codes in the
database be accessed by authorised users? We are told that the trusted
party would have the private key and decrypt them. But in that case,
again, the appropriate mechanism would not be a public key encryption
function, but a symmetric block cipher (with under 100 healthcare
providers in Iceland, the use of public key mechanisms is hard to
justify);
- most of DeCODE's presentation slides on cryptography were not
shown to me at the 12th October briefing, on the grounds that `you
know this stuff anyway'. The exception was a slide in which it was
proposed to guard against the risk of a breakthrough in cryptanalysis
by using three block ciphers (DES, IDEA and RC5) one after the
other. This idea is suggested by outsiders from time to time, but has
not appealed to professional cryptologists for many years (only if
ciphers commute can one prove that their composition is no weaker than
any of the components, and block ciphers should not commute);
- it is claimed that a separation of duty policy can be enforced
in the database, in order to prevent system administrators having
access to the full patient records, by encrypting different families
of disease codes under different symmetric keys, and by encrypting the
genealogic and genotypic databases with different keys. I am very
sceptical of this claim; having experience of designing databases
which use encryption for copy protection, I am aware of many
difficulties that need to be overcome and of which DeCODE appear
unaware. In any case, the principal issue with the database is not
encryption but how one controls the programs that are run on it and
the people who have access to the program output.
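The first two items above rest on the same observation: when the set of possible inputs is small and public, an unkeyed one-way function (or a deterministic public key encryption) can be inverted by enumeration. The following Python is a minimal sketch of that observation and of the invertible alternative, under stated assumptions: SHA-256 stands in for DeCODE's unspecified one-way function, the identifiers are synthetic ten-digit strings rather than real social security numbers, and the `encrypt`/`decrypt` pair is a toy Feistel construction built from HMAC purely to show an invertible keyed mapping. It is not a vetted block cipher, and all key names are hypothetical.

```python
import hashlib
import hmac

# --- Look-up table attack on an unkeyed one-way function ---

def pseudonym(identifier: str) -> str:
    # Assumption: SHA-256 stands in for the unspecified one-way function.
    return hashlib.sha256(identifier.encode("ascii")).hexdigest()

# Synthetic stand-in for the public file of about 280,000 numbers.
public_file = [f"{n:010d}" for n in range(280_000)]

# The attacker hashes every public number once and indexes by pseudonym.
lookup = {pseudonym(i): i for i in public_file}
recovered = lookup[pseudonym("0000123456")]

# --- Toy invertible keyed mapping (illustration only, not secure) ---

def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def _round(key: bytes, i: int, half: bytes) -> bytes:
    # Keyed round function: HMAC-SHA-256 truncated to the half-block length.
    return hmac.new(key, bytes([i]) + half, hashlib.sha256).digest()[:len(half)]

def encrypt(key: bytes, block: bytes, rounds: int = 4) -> bytes:
    assert len(block) % 2 == 0, "block must split into two equal halves"
    h = len(block) // 2
    left, right = block[:h], block[h:]
    for i in range(rounds):
        left, right = right, _xor(left, _round(key, i, right))
    return left + right

def decrypt(key: bytes, block: bytes, rounds: int = 4) -> bytes:
    h = len(block) // 2
    left, right = block[:h], block[h:]
    for i in reversed(range(rounds)):
        left, right = _xor(right, _round(key, i, left)), left
    return left + right

def translate(hospital_key: bytes, national_key: bytes, value: bytes) -> bytes:
    # The trusted party inverts the institution's mapping directly,
    # then re-encrypts under the national key; no look-up table is needed.
    return encrypt(national_key, decrypt(hospital_key, value))

ident = b"0000123456"
key_a, key_b, key_nat = b"hospital-A key", b"hospital-B key", b"national key"
from_a = translate(key_a, key_nat, encrypt(key_a, ident))
from_b = translate(key_b, key_nat, encrypt(key_b, ident))
```

With a keyed one-way function in place of `encrypt`, `translate` would have no `decrypt` to call, so the trusted party would have to rebuild the look-up table for each institution's key.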
For example, I cannot accept the claim that encrypting some of the
records with different keys will prevent system administrators having
access to the database. If the decryption is performed in software,
the system administrators would have access to the keys; if it were
performed in tamper resistant hardware, they would still have access
to the plaintext whenever it was decrypted; and if all the processing
were performed in a tamper-resistant computer, then the system
administration of this computer would now become the issue.
Automating system administration might be a solution eventually but is
a long way off in practice.
For these reasons, I cannot accept DeCODE's claim to have adequate
expertise in computer security, or their claim that they do have
adequate security plans but that these have simply not been disclosed
to me [7]. Their lack of competence in computer security is quite
evident in their proposal.
Ross Anderson
1998-10-20