[R] tapply bug? - levels of a factor in a data frame after tapply are intermixed

Dimitri Liakhovitski ld7631 at gmail.com
Fri Feb 13 18:54:02 CET 2009


Sorry - one clarification:
When I run:
> test$xx
what I am currently seeing is:
 [1] 9  3  15
 Levels: 3 9 15
But what I am expecting to see is:
 [1] 9  3  15
 Levels: 9 3 15
Or maybe: Levels: 2 1 3
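
For reference, the two printouts above differ only in how the levels are ordered, not in the data. Below is a minimal sketch using a small stand-in vector rather than the real "importances" data; the names xx and xx_by_appearance are illustrative assumptions, not objects from the original session. By default factor() sorts the levels, which is why 9, 3, 15 get the codes 2 1 3. Passing levels = unique(...) is one way to order the levels by first appearance instead.

```r
# Stand-in for test$xx: the values come from the thread, the construction is assumed.
xx <- factor(c(9, 3, 15))

levels(xx)      # sorted levels: "3" "9" "15"
as.integer(xx)  # codes index into the levels: 2 1 3

# Rebuild the factor with levels in order of first appearance.
vals <- as.character(xx)
xx_by_appearance <- factor(vals, levels = unique(vals))

levels(xx_by_appearance)      # "9" "3" "15"
as.integer(xx_by_appearance)  # 1 2 3
```

The values themselves are unchanged in both versions; only the mapping from value to integer code differs.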


On Fri, Feb 13, 2009 at 12:38 PM, Dimitri Liakhovitski <ld7631 at gmail.com> wrote:
> On Fri, Feb 13, 2009 at 12:24 PM, Marc Schwartz
> <marc_schwartz at comcast.net> wrote:
>> on 02/13/2009 11:09 AM Dimitri Liakhovitski wrote:
>>> Hello! I have encountered a really weird problem. Maybe you've
>>> encountered it before?
>>> I have a large data frame "importances". It has one factor ($A) with 3
>>> levels: 3, 9, and 15. $B is a regular numeric variable.
>>> Below I am picking a really small sub-frame (just 3 rows) based on
>>> "indices". "indices" were chosen so that all 3 levels of A are
>>> present:
>>>
>>> indices=c(14329,14209,14353)
>>> test=data.frame(yy=importances[["B"]][indices],xx=importances[["A"]][indices])
>>> Here is what the new data frame "test" looks like:
>>>
>>>             yy        xx
>>> 1 -0.009984006  9
>>> 2 -2.339904131  3
>>> 3 -0.008427385 15
>>>
>>> Here is the structure of "test":
>>>> str(test)
>>> 'data.frame':   3 obs. of  2 variables:
>>>  $ yy: num  -0.00998 -2.3399 -0.00843
>>>  $ xx: Factor w/ 3 levels "3","9","15": 2 1 3
>>>
>>> Notice - the order of factor levels for xx is not 1 2 3 as it should
>>> be but 2 1 3. How come?
>>>
>>> Or also look at this:
>>>> test$xx
>>> [1] 9  3  15
>>> Levels: 3 9 15
>>>
>>> Same thing.
>>> Do you know what might be the reason?
>>>
>>> Thank you very much!
>>
>> The output of str() is showing you the factor levels of test$xx,
>> followed by the internal integer codes for the three actual values of
>> test$xx, 9, 3, and 15:
>>
>>> str(test$xx)
>>  Factor w/ 3 levels "3","9","15": 2 1 3
>>
>>> levels(test$xx)
>> [1] "3"  "9"  "15"
>>
>>> as.integer(test$xx)
>> [1] 2 1 3
>>
>> 9 is the second level, hence the 2
>> 3 is the first level, hence the 1
>> 15 is the third level, hence the 3.
>>
>> No problems, just clarification needed on what you are seeing.
>>
>> Note that you do not reference anything above regarding tapply() as per
>> your subject line, though I suspect that I have an idea as to why you did...
>>
>> HTH,
>>
>> Marc Schwartz
>>
>>
>
> Marc (and everyone), I expected it to show:
> $ xx: Factor w/ 3 levels "3","9","15":  1 2 3
> rather than what I am seeing:
> $ xx: Factor w/ 3 levels "3","9","15":  2 1 3
> Because 3 is level 1, 9 is level 2 and 15 is level 3.
> I have several other factors in my original data frame. And I've done
> that tapply for all of them (for the same dependent variable) - and in
> all of them the first level was 1, the second 2, etc.
> Why am I concerned about the problem? Because I am plotting the means
> of the numeric variable against the levels of the factor and it's
> important to me that the factor levels are correct (in the right
> order)...
> Dimitri
>
>
> --
> Dimitri Liakhovitski
> MarketTools, Inc.
> Dimitri.Liakhovitski at markettools.com
>
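
On the plotting concern quoted above: tapply() returns one value per factor level, arranged in the order given by levels(), regardless of the order in which the rows occur. The sketch below uses invented toy data (the data frame toy and its values are assumptions for illustration only, not the real "importances" data) to show that the group means come back as 3, 9, 15 even though the rows start with 9.

```r
# Toy data standing in for the real data frame; the yy values are made up.
toy <- data.frame(
  yy = c(1, 5, 3, 7, 2, 6),
  xx = factor(c(9, 3, 15, 9, 3, 15))
)

# One mean per level of xx, ordered by levels(toy$xx), not by row order.
means <- tapply(toy$yy, toy$xx, mean)

names(means)      # level labels, in level order
as.vector(means)  # the corresponding group means
```

If the means are then plotted against the levels, something like as.numeric(names(means)) could supply the x values, assuming the level labels are numeric strings as they are here.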



-- 
Dimitri Liakhovitski
MarketTools, Inc.
Dimitri.Liakhovitski at markettools.com



