[R] tapply bug? - levels of a factor in a data frame after tapply are intermixed

Marc Schwartz marc_schwartz at comcast.net
Fri Feb 13 18:24:40 CET 2009


on 02/13/2009 11:09 AM Dimitri Liakhovitski wrote:
> Hello! I have encountered a really weird problem. Maybe you've
> encountered it before?
> I have a large data frame "importances". It has one factor ($A) with 3
> levels: 3, 9, and 15. $B is a regular numeric variable.
> Below I am picking a really small sub-frame (just 3 rows) based on
> "indices". "indices" were chosen so that all 3 levels of A are
> present:
> 
> indices=c(14329,14209,14353)
> test=data.frame(yy=importances[["B"]][indices],xx=importances[["A"]][indices])
> Here is what the new data frame "test" looks like:
> 
>             yy        xx
> 1 -0.009984006  9
> 2 -2.339904131  3
> 3 -0.008427385 15
> 
> Here is the structure of "test":
>> str(test)
> 'data.frame':   3 obs. of  2 variables:
>  $ yy: num  -0.00998 -2.3399 -0.00843
>  $ xx: Factor w/ 3 levels "3","9","15": 2 1 3
> 
> Notice - the order of factor levels for xx is not 1 2 3 as it should
> be but 2 1 3. How come?
> 
> Or also look at this:
>> test$xx
> [1] 9  3  15
> Levels: 3 9 15
> 
> Same thing.
> Do you know what might be the reason?
> 
> Thank you very much!

The output of str() shows the factor levels of test$xx first, followed
by the internal integer codes for its three actual values (9, 3, and 15).
Each code is the position of that value within the levels, not an
ordering of the rows:

> str(test$xx)
 Factor w/ 3 levels "3","9","15": 2 1 3

> levels(test$xx)
[1] "3"  "9"  "15"

> as.integer(test$xx)
[1] 2 1 3

9 is the second level, hence the 2.
3 is the first level, hence the 1.
15 is the third level, hence the 3.

No problems, just clarification needed on what you are seeing.
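
To make the levels/codes distinction concrete, here is a minimal sketch
that rebuilds the three-row data frame by hand. The values are copied
from the post; the explicit factor() call is my assumption about how xx
was created, since the original "importances" data frame is not shown.

```r
# Rebuild the three-row example by hand (values copied from the post).
# Assumption: xx is a factor whose levels are 3, 9, 15, in that order.
test <- data.frame(
  yy = c(-0.009984006, -2.339904131, -0.008427385),
  xx = factor(c(9, 3, 15), levels = c(3, 9, 15))
)

levels(test$xx)      # the distinct labels, in level order
as.integer(test$xx)  # position of each row's value within levels()

# To get the original numbers back, convert via the labels, not the codes.
as.numeric(as.character(test$xx))
as.numeric(levels(test$xx))[test$xx]
```

The last two lines are the usual ways to turn such a factor back into
numbers. Calling as.numeric() directly on the factor would return the
codes instead.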

Note that you do not reference anything above regarding tapply() as per
your subject line, though I suspect that I have an idea as to why you did...
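
Since tapply() is in the subject line, here is a small sketch of one way
it can look as if levels were "intermixed". This is only a guess at the
source of the confusion, not something shown in the post: tapply()
returns its result in the order of the factor levels, not in the order
in which the values appear in the data. The data frame is rebuilt by
hand as an assumption, using the values quoted above.

```r
# Same hand-built data as in the quoted example (an assumption, since
# the original "importances" data frame is not available).
test <- data.frame(
  yy = c(-0.009984006, -2.339904131, -0.008427385),
  xx = factor(c(9, 3, 15), levels = c(3, 9, 15))
)

# One group per level; the result follows the order of levels(test$xx).
res <- tapply(test$yy, test$xx, mean)
names(res)

# Index by label to line the results up with the rows of the data.
by_row <- res[as.character(test$xx)]
names(by_row)
```

Indexing the result by as.character(test$xx), rather than by position,
keeps each group mean matched to the right row.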

HTH,

Marc Schwartz



