[R] unique and precision of long integers

Mon May 14 17:26:05 CEST 2001

On Mon, 14 May 2001, Michael Herron wrote:

>
> Hello.
>
> I have a dataset with about 500,000 observations, most of which are
> not unique.  The first 10 observations look like
>
> 901000000000100000010100101011002
> 901101101110100000010100101011002
> 901000000000100000010100000001002
> 901000000000100000010101001011002
> 901000000000100000010101010011002
> 901000000000100000010100110101002
> 901000000000100000010100101011002
> 900000000000100000010010101011002
> 901000000000100000010100101101002
> 901000000000100000010100101011002
>
> Each digit reflects a separate field, but above all spaces are
> removed.
>
> I read in the data with scan(), and then use unique() to get the

How did you read them with scan?  You seem to have doubles, despite your
subject line.  Reading them as integers overflows (every value comes back
as 2147483647, the largest 32-bit integer):

> foo <- scan("foo.dat", integer(0))
Read 10 items
> foo
 [1] 2147483647 2147483647 2147483647 2147483647 2147483647 2147483647
 [7] 2147483647 2147483647 2147483647 2147483647

> unique observations.  But, when I print these elements to a file I
> lose precision.  For instance, let x be a vector of the first 10

Nope, you lost it when reading them in as doubles: a double carries only
about 15-16 significant decimal digits, and these values have 33.
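
To make the loss concrete, here is a minimal sketch using two records from the sample data. The variable names (a, b, x, y) are only illustrative, and the exact digits printed by sprintf() may vary by platform; the point is that the double cannot reproduce the original string.

```r
# Two distinct 33-digit records taken from the sample data
a <- "901000000000100000010100101011002"
b <- "901000000000100000010100101101002"

# Convert to doubles, which is what a default scan() would produce
x <- as.numeric(a)
y <- as.numeric(b)

.Machine$double.digits   # 53 bits of mantissa, roughly 15-16 decimal digits

# Printing the double back in full does not give the original digits:
# near 9e32 adjacent doubles are 2^57 apart, and a is not a multiple of 2^57.
sprintf("%.0f", x)
sprintf("%.0f", x) == a  # FALSE

# The strings differ, but they are only 90000 apart, far less than 2^57,
# so the two doubles are expected to compare equal.
a == b
x == y
```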

Why not just do this with character objects?

foo <- scan("foo.dat", "")
unique(foo)
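
A slightly fuller sketch of the character approach, assuming the ten sample records from the original post. They are inlined through scan()'s text argument so the example runs without foo.dat; the names records, u, counts and fields are made up for illustration.

```r
# The ten sample records, inlined (with a real file, pass its name to scan())
records <- "
901000000000100000010100101011002
901101101110100000010100101011002
901000000000100000010100000001002
901000000000100000010101001011002
901000000000100000010101010011002
901000000000100000010100110101002
901000000000100000010100101011002
900000000000100000010010101011002
901000000000100000010100101101002
901000000000100000010100101011002
"

# what = "" reads each record as a character string, so nothing is rounded
foo <- scan(text = records, what = "", quiet = TRUE)

u <- unique(foo)      # exact string comparison
length(u)             # 8 distinct records among the 10

counts <- table(foo)  # how often each record occurs

# Each digit is a separate field: split the unique records into an
# integer matrix with one row per record and one column per field
fields <- do.call(rbind, lapply(strsplit(u, ""), as.integer))
dim(fields)
```

Writing u back out with writeLines() or write() keeps every digit, since the values never pass through a numeric type.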

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
