[R] Summary Statistics for data.frame

Duncan Murdoch murdoch at stats.uwo.ca
Sat Jul 8 23:19:12 CEST 2006


On 7/8/2006 4:55 PM, justin rapp wrote:
> When I attempt to use the mysummary function, I obtain the following error:
> 
> Error in var(x) : missing observations in cov/cor

var() gives that error if it sees NA values.  You can get it to remove 
them by using

var(x, na.rm = TRUE)

instead of var(x).  Whether that makes sense depends on the context of 
your problem.
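
As a rough sketch of the difference, assuming a made-up vector x with one missing value (the numbers are purely illustrative), and one possible way to carry the same argument into the mysummary function quoted below:

```r
# Hypothetical vector with one missing value (illustrative data only)
x <- c(70, 72, NA, 75)

# Without na.rm the missing value is not dropped. Depending on the R
# version this either stops with the "missing observations in cov/cor"
# error or returns NA, so the call is wrapped in try() here.
v_default <- try(var(x), silent = TRUE)

# na.rm = TRUE drops the NA before the variance is computed
v_clean <- var(x, na.rm = TRUE)

# The same idea applied to mysummary: na.rm is passed to both
# mean() and var(), so each column ignores its own missing values
mysummary <- function(df) apply(df, 2,
    function(x) c(mean = mean(x, na.rm = TRUE),
                  variance = var(x, na.rm = TRUE)))
```

Note that each column then drops its own NAs independently, so the means and variances for different columns may be based on different numbers of rows.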

Duncan Murdoch

> 
> When I use:
> by(data.logistic,data.logistic$Ydrafted,summary)
> 
> I receive no errors. I cut and pasted your mysummary function directly
> into my R console.  Should I have made any adjustments to the code?
> 
> jdr
> 
> On 7/8/06, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
>> On 7/8/2006 3:44 PM, justin rapp wrote:
>>> I apologize for my constant questions but I am new to R and trying to
>>> gain an appreciation for its capabilities.  The following task is easy
>>> in Excel and I was hoping somebody could give me a quick explanation
>>> for how it can be achieved in R so I can avoid having to switch
>>> between the two applications.
>>>
>>> How do I find the summary statistics for one vector of the data frame
>>> by the levels of another vector?
>>>
>>> For example, I have the following headings for my data.frame.
>>> Conference
>>> Year Drafted
>>> Height
>>> Weight
>>> Ratio
>>>
>>> I would like to compute the mean Height, Weight, and Ratio, as well
>>> as their variances, for each of the years under Year
>>> Drafted (1980-2000).  What is the most efficient way of doing this?
>> I think the quickest is
>>
>> by(mydf, mydf$Year, summary)
>>
>> but this won't give you the variance.  You'll need your own little
>> function to calculate mean and variance, e.g.
>>
>> mysummary <- function(df) apply(df, 2,
>>                 function(x) c(mean=mean(x), variance=var(x)))
>>
>> by(mydf, mydf$Year, mysummary)
>>
>> If you don't like the format of the output, you can play around with the
>> mysummary function.  It will be applied to each subset of the
>> data.frame, and the results will be put together into a list with one
>> entry per level of mydf$Year.
>>
>>
>> Duncan
>>
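
To illustrate the list-per-level result described in the quoted reply, here is a minimal sketch on made-up data. The data frame, its values, and the helper name numsummary are assumptions for illustration only; the helper restricts the summary to named numeric columns, since apply() converts the data frame to a matrix and a non-numeric column such as Conference would turn every entry into text.

```r
# Hypothetical data; the column names stand in for the poster's headings
mydf <- data.frame(
    Year   = c(1980, 1980, 1981, 1981),
    Height = c(70, 74, 72, 76),
    Weight = c(200, 220, 210, 230)
)

# Same shape as mysummary, but limited to chosen numeric columns
# (numsummary is an illustrative name, not part of the original post)
numsummary <- function(df) apply(df[, c("Height", "Weight")], 2,
    function(x) c(mean = mean(x), variance = var(x)))

# by() splits mydf on Year and applies numsummary to each subset
res <- by(mydf, mydf$Year, numsummary)

# One entry per level of Year; each entry is a small matrix with rows
# "mean" and "variance" and one column per summarised variable
res[["1980"]]
```

Individual values can then be picked out by level and name, for example res[["1981"]]["variance", "Weight"].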


