[R] Error in which()

David Winsemius dwinsemius at comcast.net
Thu Jul 8 22:10:05 CEST 2010


On Jul 8, 2010, at 3:23 PM, Muhammad Rahiz wrote:

> Hi all,
>
> I'm trying to filter data into respective numbers. For example, if  
> the data ranges from 0 to <0.1, group the data. And so on for the  
> rest of the data.
> There are inconsistencies in the output. For example, b1[[3]] lumps  
> all the 0.2s and 0.3s together while 0.6s are not in the output.

Any time you are working with floating point numbers you should be
using all.equal() rather than ==. You could easily be getting bitten by
a >= test that returns FALSE when you expected it to return TRUE.
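
As a minimal sketch of how that can happen (using the same breaks as the
loop in the question; the variable names exact_eq, approx_eq and in_bin are
illustrative only), the fourth break returned by seq() is not the same
double as the literal 0.3:

```r
# Breaks as in the question; seq() builds each one as from + k * by
x0 <- seq(0, 1, by = 0.1)

exact_eq  <- x0[4] == 0.3                    # exact comparison of doubles
approx_eq <- isTRUE(all.equal(x0[4], 0.3))   # comparison within a tolerance
in_bin    <- 0.3 >= x0[4]                    # the kind of test used for binning

print(c(exact_eq = exact_eq, approx_eq = approx_eq, in_bin = in_bin))
# exact_eq is FALSE, approx_eq is TRUE, in_bin is FALSE
```

Wrapping all.equal() in isTRUE() matters because all.equal() returns a
character description, not FALSE, when the values differ.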
>
> Running the function - table(f1) - shows that each of the components/ 
> numbers has x number of elements in them. But this is not showing in  
> the results of the script.
>
> Can anyone assist?
>
>
> Thanks,
>
> Muhammad
>
>
>
>
> f1 <- read.table("data.txt")
> f1 <- f1[which(is.na(f1)==FALSE),1]

f1 is a data.frame, and "[which( ==FALSE), " is the same as "[ !is.na() ,
", so you could use

f1 <- f1[ !is.na(f1[,1]), 1]
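
A quick check of that equivalence, assuming a small made-up data frame in
place of data.txt (the column name V1 is what read.table() would assign by
default; the values are hypothetical):

```r
# Hypothetical stand-in for read.table("data.txt")
f1 <- data.frame(V1 = c(0.3, NA, 0.6, 0.7, NA))

via_which   <- f1[which(is.na(f1[, 1]) == FALSE), 1]  # form used in the question
via_logical <- f1[!is.na(f1[, 1]), 1]                 # shorter logical index

print(identical(via_which, via_logical))
```

Both forms return the non-missing values of the first column as a plain
numeric vector.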
>
> x0 <- seq(0,1,0.1)
> x1 <- x0 +0.1
>
> b1 <- c()
> for (a in 1:length(x)){
> b1[[a]] <- f1[which(f1 >= x0[a] & f1 < x1[a])]
> }

That was really not a minimal example, now was it? Used a very small  
fraction of your data.

For me this throws an error since x is not defined. Modifying it so x  
becomes x0 and adding the column number "1" to f1's indexing gets me  
something like what you are describing. It's undoubtedly a case of FAQ
7.31 (why R does not think some numbers are equal).
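
A sketch of the modified loop, run on a few made-up values rather than the
real data (the data frame below is hypothetical), reproduces both symptoms
from the question: the 0.2s and 0.3s share a bin and the 0.6 is lost.

```r
# Hypothetical stand-in for the first column of data.txt
f1 <- data.frame(V1 = c(0.2, 0.3, 0.3, 0.6))

x0 <- seq(0, 1, 0.1)   # lower bounds
x1 <- x0 + 0.1         # upper bounds

b1 <- vector("list", length(x0))
for (a in seq_along(x0)) {          # x0, not the undefined x
  keep <- f1[, 1] >= x0[a] & f1[, 1] < x1[a]
  b1[[a]] <- f1[keep, 1]
}

print(b1[[3]])                  # 0.2 0.3 0.3
print(sum(sapply(b1, length)))  # 3: the 0.6 falls into no bin at all
```

The 0.6 disappears because the upper bound 0.5 + 0.1 is stored as the same
double as 0.6, while the lower bound 6 * 0.1 is stored as a slightly larger
one, so neither bin's test accepts it.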

 > b2 <-findInterval(f1[,1], seq(0, 1, by=0.1) )
 > str(b2)
  int [1:120] 11 10 9 10 10 7 10 9 9 7 ...
 > table(b2)
b2
  2  3  5  6  7  9 10 11
  1 15 17 56 21  5  4  1
 > table(f1[,1])

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9   1
   1   4  11  17  18  38  21   5   4   1

Notice that the 0.5s and 0.6s get lumped into the same box. Methods
for discrete variables are more appropriate here. However, if you know
that your numbers are all rounded to the nearest tenth, then add (or
subtract) 0.05 to your boundary criteria so you won't run into
numerical representation problems. (See below. I'm not sure that cut()
will solve your troubles here.)
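
A minimal sketch of the shifted-boundary idea, assuming the values really
are rounded to one decimal place (the data vector below is made up for
illustration):

```r
# Hypothetical values, all rounded to the nearest tenth
f1 <- c(0.1, 0.2, 0.3, 0.3, 0.5, 0.6, 0.6, 0.7, 1)

# Boundaries halfway between the possible values, so no value sits on one
breaks <- seq(-0.05, 1.05, by = 0.1)
bins   <- findInterval(f1, breaks)

# Discrete alternative: treat each value as a whole number of tenths
tenths <- as.integer(round(f1 * 10))

print(table(bins))
print(table(tenths))
```

With these boundaries a value of k tenths lands in interval k + 1, so the
0.5s and 0.6s end up in separate boxes.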

 > table(cut(f1[,1], seq(0,1,by=0.1) , include.lowest=TRUE,
right=FALSE ))

   [0,0.1) [0.1,0.2) [0.2,0.3) [0.3,0.4) [0.4,0.5) [0.5,0.6) [0.6,0.7) [0.7,0.8)
         0         1        15         0        17        56        21         0
 [0.8,0.9)   [0.9,1]
         5         5

Notice the empty [0.3,0.4) and [0.7,0.8) categories. This may be why the
S/R designers chose right=TRUE as the default.
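
For comparison, a sketch of the same cut() call with both settings of
right, again on made-up values rounded to tenths. With these particular
breaks the right-closed intervals happen to put every value in its expected
category; that is a property of how these breaks round, not a general
guarantee.

```r
# Hypothetical values, all rounded to the nearest tenth
f1     <- c(0.1, 0.2, 0.3, 0.3, 0.6, 0.7, 1)
breaks <- seq(0, 1, by = 0.1)

# Integer codes of the category each value falls into
left_closed  <- as.integer(cut(f1, breaks, include.lowest = TRUE, right = FALSE))
right_closed <- as.integer(cut(f1, breaks, include.lowest = TRUE, right = TRUE))

print(left_closed)   # 2 3 3 3 6 7 10: 0.2 and 0.3 share a category
print(right_closed)  # 1 2 3 3 6 7 10: one category per distinct value
```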

>
-- 

David Winsemius, MD
West Hartford, CT


