[R] Advice on obscuring unique IDs in R

Anthony Staines anthony.staines at dcu.ie
Wed Jan 5 22:19:49 CET 2011


Dear colleagues,

This may be a question with a really obvious answer, but I
can't find it. I have access to a large file with real
medical record identifiers (mixed strings of characters and
numbers) in it. These represent medical events for many
thousands of people. It's important to be able to link
events for the same people.

It's much more important that the real record numbers are
strongly obscured. I'm interested in some kind of strong
one-way hash function to which I can feed the real numbers
and get back unique codes for each record identifier fed
in. I can do this on the health service system, and I have
to do this before making further use of the data!

There is the 'digest' function, in the digest package, but
it seems to return a single hash for the whole vector of IDs
rather than one hash per ID. In my case the call below gives
a vector with 60,000 identical entries.

H.Out$P_ID = digest(H.In$MRNr,serialize=FALSE, algo='md5')
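
A minimal sketch of one way around this, assuming the digest
package is installed: call digest() once per identifier (here
via vapply) rather than once on the whole vector. The sample
vector 'mrn', the helper 'hash_one' and the 'salt' string are
hypothetical stand-ins. Prefixing a secret salt and using
SHA-256 instead of MD5 are assumptions added here, not part of
the call above; they are meant to make it harder to recover an
identifier by hashing guessed values.

```r
library(digest)

# Hypothetical stand-in for H.In$MRNr; the real IDs come from the file.
mrn <- c("A1234", "B5678", "A1234", "C9012")

# Hypothetical secret value; it would need to be kept off the shared data.
salt <- "replace-with-a-long-secret-string"

# digest() returns one hash per call, so hash each ID separately.
hash_one <- function(id) {
  digest(paste0(salt, id), algo = "sha256", serialize = FALSE)
}

# vapply() guarantees a character vector with one hash per input ID.
p_id <- vapply(mrn, hash_one, character(1), USE.NAMES = FALSE)

# The same ID always maps to the same code, so events still link.
print(p_id[1] == p_id[3])
```

With real data the last step would be something like
H.Out$P_ID <- vapply(H.In$MRNr, hash_one, character(1),
USE.NAMES = FALSE), provided MRNr is a character vector rather
than a factor.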

I could do this in Perl, but I'd have to do quite a bit of
work to get it installed.

Any quick suggestions?
Anthony Staines
-- 
Anthony Staines, Professor of Health Systems Research,
School of Nursing, Dublin City University, Dublin 9, Ireland.
Tel:- +353 1 700 7807. Mobile:- +353 86 606 9713


