[R] Using nrow with summaryBy

David Winsemius dwinsemius at comcast.net
Wed Mar 17 17:21:20 CET 2010


On Mar 17, 2010, at 12:10 PM, Ivan Calandra wrote:

> Hi David,
>
> I have probably 2 stupid questions regarding what you said but it  
> might be important to understand:
>
> - why nrow() "would not make sense for a subsetted vector"?
> On the help page of nrow(), it's written that we can apply it on a  
> vector, array or dataframe (basically everything...?). So what's the  
> difference between a "normal" vector (for which it would make sense  
> and work) and a subsetted vector?

 > nrow(c(0,1,3,4))
NULL
 > nrow(1:12)
NULL

(It did not throw an error but the help page does say the value could  
be NULL.)

 > length(1:12)
[1] 12
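
To spell out the difference, here is a small sketch (the object names x, sub and m are made up for illustration). nrow(x) is just dim(x)[1L], and a plain vector, subsetted or not, has no dim attribute, so the result is NULL. NROW() is the variant that treats a vector as a one-column matrix.

```r
x <- c(0, 1, 3, 4)

dim(x)        # NULL: a plain vector carries no dim attribute
nrow(x)       # NULL, because nrow(x) is dim(x)[1L]
length(x)     # 4
NROW(x)       # 4: NROW() treats a vector as a one-column matrix

# Subsetting a vector gives another plain vector, so nothing changes
sub <- x[x > 0]
nrow(sub)     # NULL
length(sub)   # 3

# Only objects with a dim attribute (matrices, arrays, data frames) have rows
m <- matrix(1:12, nrow = 3)
nrow(m)       # 3
length(m)     # 12: the total number of elements
```

So there is no difference between a "normal" vector and a subsetted one here: nrow() returns NULL for both.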

>
> - why would assuming "that length() applied to dataframes would tell me
> how many rows it had" be a mistake? I mean in this case,
> length() is calculated for each numerical variable (which are
> vectors, aren't they?).

length() applied to any list returns the number of elements at the first
level. Data frames are lists of vectors, so length() applied to a
data frame gives you the number of columns, not the length of an
individual vector in the data frame.
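
A minimal sketch of that point, using a small made-up data frame (df and its columns are hypothetical, not part of dietox):

```r
# A toy data frame with 5 rows and 3 columns
df <- data.frame(
  id     = 1:5,
  weight = c(2.1, 3.4, 1.8, 4.0, 2.9),
  feed   = c(10, 12, 9, 15, 11)
)

is.list(df)         # TRUE: a data frame is a list of equal-length columns
length(df)          # 3, the number of columns
ncol(df)            # 3
nrow(df)            # 5, the number of rows
length(df$weight)   # 5, the length of one column vector
sapply(df, length)  # 5 for each of id, weight and feed
```

It is only when length() is applied to a single column (or a piece of one) that it counts observations.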

>
> I think these questions concern the way R handles the data and that's
> why I think it might be important for me to understand these issues.

It's important, for sure.
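
To tie this back to the summaryBy question, here is a rough sketch in base R. It uses tapply() rather than summaryBy(), and the data frame d is made up. The assumption (which I have not checked in the doBy source) is that summaryBy likewise hands each function one column's values for one group as a plain vector.

```r
# Toy data: two groups with 3 and 2 observations
d <- data.frame(
  group  = c("a", "a", "a", "b", "b"),
  weight = c(2.1, 3.4, 1.8, 4.0, 2.9)
)

# The function passed to tapply() receives a plain vector for each group,
# so length() counts the observations in that group
counts <- tapply(d$weight, d$group, length)
counts   # a = 3, b = 2

# nrow() on those same pieces is NULL, since they have no dim attribute
no_rows <- tapply(d$weight, d$group, function(v) is.null(nrow(v)))
no_rows  # TRUE for both groups

# NROW() does work on a vector and gives the same counts as length()
counts2 <- tapply(d$weight, d$group, NROW)
```

If that assumption holds, length (or NROW) is the function to pass when you want the number of observations per group.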

>
> Thanks for your input.
> Regards,
> Ivan
>
> Le 3/17/2010 16:39, David Winsemius a écrit :
>>
>> On Mar 17, 2010, at 11:23 AM, Tony Laidig wrote:
>>
>>> Hello Everyone-
>>> I'm calculating summary statistics on a dataset (~4000 records,
>>> observations are not uniformly distributed) using summaryBy and trying
>>> to add a column with the number of observations to the output as well.
>>> What occurs to me is to use nrow(), but this doesn't appear to be working
>>>
>>> I'm able to replicate the same results with an example from the
>>> summaryBy docs:
>>>
>>> library(doBy)
>>> data(dietox)
>>> dietox12<- subset(dietox,Time==12)
>>> #this one works
>>> summaryBy(Weight+Feed~Evit+Cu,data=dietox12,FUN=c(mean,var,length))
>>> #adding nrow doesn't give the number of rows
>>> summaryBy(Weight+Feed~Evit+Cu,data=dietox12,FUN=c(mean,var,length,nrow))
>>>
>>
>> I'm a bit puzzled. One of my many newbie mistakes was to assume  
>> that length() applied to dataframes would tell me how many rows it  
>> had. It appears that the authors of summaryBy have figured out how  
>> to get length() to tell you the number of observations, presumably  
>> on a subsetted vector where length would make sense.  So ...  it's  
>> not clear why you also want nrow (which would not make sense for a  
>> subsetted vector).
>>
>>
>>>
>>> There must be a way to do this, but I can't figure it out. I suspect
>>> there is another function that would be compatible with summaryBy.
>>>
>>> Thanks in advance.
>>> -Tony
>>
>>
>> David Winsemius, MD
>> West Hartford, CT
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> -- 
> Ivan CALANDRA
> PhD Student
> University of Hamburg
> Biozentrum Grindel und Zoologisches Museum
> Abt. Säugetiere
> Martin-Luther-King-Platz 3
> D-20146 Hamburg, GERMANY
> +49(0)40 42838 6231
> ivan.calandra at uni-hamburg.de
>
> **********
> http://www.for771.uni-bonn.de
> http://webapp5.rrz.uni-hamburg.de/mammals/eng/mitarbeiter.php
>

David Winsemius, MD
West Hartford, CT


