[R] tapply with cbinded x

Wacek Kusnierczyk Waclaw.Marcin.Kusnierczyk at idi.ntnu.no
Tue Jun 16 13:46:20 CEST 2009


Petr PIKAL wrote:
> Hi
>
> r-help-bounces at r-project.org wrote on 16.06.2009 12:45:04:
>
>   
>> Stefan Uhmann wrote:
>>     
>>> Dear List,
>>>
>>> why does this not work?
>>>
>>> df <- data.frame(var1 = c(3,2,1), var2 = c(6,5,4), var3 = c(9,8,7),
>>>     fac = c('A', 'A', 'B'))
>>> tapply(cbind(df$var1, df$var2, df$var3), df$fac, mean)
>>>       
>> because
>>
>>     length(cbind(df$var1, df$var2, df$var3))
>>     # 9
>>     
>
> This is the problem with cbinding anything. You will get a matrix, which is
> basically a vector with dimensions
>
>   
>> dim(cbind(df$var1, df$var2, df$var3))
>>     
> [1] 3 3
>
>   

sort of, but not exactly.  for example,

    d = data.frame(1:2)
    is.matrix(cbind(d, d))
    # FALSE

can't say if it follows the docs, because the docs avoid stating
precisely what the result from cbind on data frames is.  (see ?cbind,
sections Value and Data frame methods.)

in the particular example above, the inputs to cbind are *not* data
frames but atomic vectors, hence the output is a matrix and not a data
frame.  in general, though, the result of cbind need not be a matrix.
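
to make the distinction concrete, here is a minimal sketch (the variable
names m, d and dd are mine, chosen only for illustration) that contrasts
cbind on atomic vectors with cbind on data frames:

```r
# atomic vectors: cbind gives a matrix, i.e. a vector with a dim attribute
m <- cbind(c(3, 2, 1), c(6, 5, 4), c(9, 8, 7))
is.matrix(m)        # TRUE
length(m)           # 9, the number of cells
dim(m)              # 3 3

# data frames: cbind dispatches to the data frame method instead
d <- data.frame(x = 1:2)
dd <- cbind(d, d)
is.matrix(dd)       # FALSE
is.data.frame(dd)   # TRUE
length(dd)          # 2, the number of columns
```

so length() counts cells for the matrix but columns for the data frame,
which is what matters when tapply compares lengths.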


>   
>>     length(df$fac)
>>     # 3
>>
>> and that's enough for it not to work, as far as i understand what
>> ?tapply says.
>>
>> here's another question:  why *does* this work (or "work"):
>>
>>     d = data.frame(a=1:3, b=1:3, c=1:3)
>>     f = factor(1:3)
>>
>>     tapply(d, f, c)
>>     # no issues
>>     
>
> tapply does not check the arguments; it just checks whether d and f have
> the same length, which in your case is true.
>   

possibly, but if the docs say that the input is an atomic object, i
expect the call to fail if the input isn't an atomic object.  either the
documentation or the implementation is wrong.
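
a small sketch of why the length comparison is satisfied here.  the
4-row data frame is a hypothetical variation on the example above, used
only to tell rows and columns apart; it deliberately does not call
tapply on a data frame, since what that does may depend on the r version:

```r
# hypothetical variation: 4 rows, 3 columns
d <- data.frame(a = 1:4, b = 1:4, c = 1:4)
f <- factor(1:3)

is.atomic(d)             # FALSE: a data frame is a list of columns
is.list(d)               # TRUE
length(d)                # 3, the number of columns
nrow(d)                  # 4
length(d) == length(f)   # TRUE, although there are 4 rows and 3 labels
```

in other words, a bare length comparison sees the number of columns, not
the number of rows, so it can pass for a non-atomic input.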


>   
>> length(d)==3
>>     
> [1] TRUE
>   
>
> and the aggregation function itself is 
>
> lapply(split(d, f), c)
>
> Which is in your case also valid although not very meaningful.
>
> Maybe the help page should state that with a data.frame as first argument
> you can get unexpected results.

this sort of comment would be at home in a large part of r docs.  but,
again, the original problem was not due to tapply receiving a data frame
as input.
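
for reference, a sketch of the original failure and of one possible way
to get per-group means for several columns.  the per-column sapply call
is just one option i am assuming would suit the original poster, not
something prescribed by ?tapply; the names failed and res are mine:

```r
df <- data.frame(var1 = c(3, 2, 1), var2 = c(6, 5, 4), var3 = c(9, 8, 7),
                 fac = c('A', 'A', 'B'))

# the original call: 9 values against 3 grouping labels, so tapply refuses
failed <- inherits(try(tapply(cbind(df$var1, df$var2, df$var3), df$fac, mean),
                       silent = TRUE), "try-error")
failed   # TRUE

# one possible alternative: call tapply once per column, so that each
# input is an atomic vector of the same length as the grouping variable
res <- sapply(df[c("var1", "var2", "var3")],
              function(col) tapply(col, df$fac, mean))
res
#   var1 var2 var3
# A  2.5  5.5  8.5
# B  1.0  4.0  7.0
```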


> However, I would expect the help page to show you the recommended way of
> doing things, not to warn you about every possible combination of
> actions.
>   

i would expect it to clearly document the implemented behaviour, to the
extent possible.  if tapply is implemented to play with data frames, why
inform the user that the input is (= should be?) an atomic object?  or,
if tapplying over data frames may lead to unexpected results and would
remain undocumented, why have tapply accept data frames in the first place?

vQ



