[R] Is this an artifact of using "which"?

Richard.Cotton at hsl.gov.uk Richard.Cotton at hsl.gov.uk
Mon Apr 14 13:50:54 CEST 2008


> I used "which" to obtain a subset of values from my data.frame. 
> However, I find that there is a "trace" of the values I have removed. 
> Any suggestions would be greatly appreciated.
> 
> Below is my data:
> 
> d <- data.frame( val   = 1:10,
>                  group = sample(LETTERS[1:5], 10, replace=TRUE) )
> 
>  >d
>     val group
> 1    1     B
> 2    2     E
> 3    3     B
> 4    4     C
> 5    5     A
> 6    6     B
> 7    7     A
> 8    8     E
> 9    9     E
> 10  10     A
> 
> ## selecting everything that is not group "A"
>   d<-d[which(d$group !="A"),]
> 
>  > d
>    val group
> 1   1     B
> 2   2     E
> 3   3     B
> 4   4     C
> 6   6     B
> 8   8     E
> 9   9     E
> 
>  > levels(d$group)
> [1] "A" "B" "C" "E"

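The (imho) unintuitive behaviour comes from the factor subsetting method, 
[.factor, not from which: subsetting a factor keeps all of its original 
levels, including those that no longer occur in the data.  There are a 
couple of workarounds: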

1. Call factor to recreate the levels, which gets rid of the unused "A", 
and assign the result back to the column:
d$group <- factor(d$group)

2. Redefine [.factor; see dropUnusedLevels in the Hmisc package.
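
To make the first workaround concrete, here is a minimal sketch using the 
values from the question, typed in by hand so the result is reproducible. 
The names d_sub, g1, g2 and d3 are only illustrative, and the droplevels() 
call assumes R 2.12.0 or later, where that function is available.

```r
# Data from the question, with the sampled groups fixed by hand
d <- data.frame(val   = 1:10,
                group = factor(c("B", "E", "B", "C", "A",
                                 "B", "A", "E", "E", "A")))

# Drop group "A"; with no NAs present, logical indexing selects
# the same rows as which() does
d_sub <- d[d$group != "A", ]
levels(d_sub$group)             # "A" "B" "C" "E": the unused level remains

# Workaround 1: rebuild the factor from the values still present
g1 <- factor(d_sub$group)
levels(g1)                      # "B" "C" "E"

# Same effect via the drop argument of [.factor
g2 <- d_sub$group[drop = TRUE]
levels(g2)                      # "B" "C" "E"

# Assumption: R >= 2.12.0, where droplevels() handles every factor
# column of a data frame in one call
d3 <- droplevels(d_sub)
levels(d3$group)                # "B" "C" "E"
```

In each case the rows are untouched; only the levels attribute of the 
factor changes.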

Regards,
Richie.

Mathematical Sciences Unit
HSL

