[R] algorithm to create unique identifiers

Henrik Bengtsson hb at stat.berkeley.edu
Fri Sep 5 05:57:39 CEST 2008


On Thu, Sep 4, 2008 at 8:44 PM, Ralph S. <ruffel1 at hotmail.com> wrote:
>
> Hi all,
>
> I am trying to create a unique identifier for each row, combining numbers from three columns.
>
> Do you know if there is a general formula to do this (or some manual where I can read about this)?
>
> I figure I can use the numeric entries of the columns as "coordinates" and multiply them with different coefficients (different magnitudes) to get the unique ID - but it would be nice to read about such algorithms in general.

What are your numbers?  Are they in a fixed range?  Integers or reals?
If fixed range integers, it is easy.  Think regular positional number
representation, e.g. binary, octal, decimal and hexadecimal.
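
To make the positional idea concrete, here is a minimal sketch.  It
assumes three non-negative integer columns whose upper bounds are known
in advance; the data frame, the column names and the 'sizes' vector are
made up for illustration.  Each column is treated as one "digit" in a
mixed-radix number.

```r
# Hypothetical data: three integer columns with known, fixed ranges.
df <- data.frame(a = c(0L, 3L, 3L),
                 b = c(12L, 0L, 12L),
                 c = c(7L, 7L, 0L))

# Assumed number of possible values per column:
# a in 0..9, b in 0..99, c in 0..9.
sizes <- c(a = 10, b = 100, c = 10)

# Place value of each column: the last column is the "ones" digit,
# and each earlier column is scaled by the sizes of all columns after it.
weights <- rev(cumprod(c(1, rev(sizes)[-length(sizes)])))

# One ID per row: the weighted sum of its "digits".
id <- as.vector(as.matrix(df) %*% weights)
```

As long as every value stays inside its assumed range, two different
rows cannot get the same ID, and the columns can be recovered from the
ID by integer division and remainder.  The IDs are stored as doubles,
so prod(sizes) should stay below 2^53 for them to remain exact.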

For a more generic solution that works with any data types, see e.g.
MD5 [http://en.wikipedia.org/wiki/MD5].  It is not guaranteed to
generate unique codes, but it is extremely rare that two different
inputs give the same MD5 code.  MD5 (and others) are implemented in
the 'digest' package, e.g.

> library(digest)
> digest(list(a=1, b=list(1:10, c=letters)))
[1] "73e0ae066a97bfff7f79d41c65b55fde"
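
Applied to the original question, one way to get an ID per row is to
hash each row separately.  This is only a sketch: the data frame and
the helper name 'row_id' are made up for illustration, and it assumes
the 'digest' package is installed.

```r
library(digest)

# Hypothetical data with mixed column types.
df <- data.frame(x = c(1.5, 2, 1.5),
                 y = c("a", "b", "c"),
                 z = c(TRUE, FALSE, TRUE),
                 stringsAsFactors = FALSE)

# Hash the values of one row at a time.  The column names are dropped
# so that only the values (and their types) determine the code.
row_id <- function(d) {
  vapply(seq_len(nrow(d)),
         function(i) digest(unname(as.list(d[i, ])), algo = "md5"),
         character(1))
}

ids <- row_id(df)
```

Each ID is a 32-character hexadecimal string.  Rows with identical
values get identical codes, so duplicated rows would still have to be
handled separately if every row needs its own ID.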

My $.02

/Henrik


>
> Any links/input would be great -
>
> Ralph
>


