[R] R for simple stats

Frank E Harrell Jr fharrell at virginia.edu
Fri Jun 28 20:57:45 CEST 2002


You might also take a look at some functions in the Hmisc package, e.g.:

library(Hmisc)
set.seed(1)
x <- runif(1000)
g <- factor(sample(letters[1:4],1000,TRUE))
describe(x)

x 
      n missing  unique    Mean     .05     .10     .25     .50     .75     .90 
   1000       0    1000  0.5043 0.06128 0.11650 0.26521 0.50441 0.74055 0.90252 
    .95 
0.95984 

lowest : 0.003536 0.004208 0.004228 0.006153 0.006443
highest: 0.998321 0.998607 0.998766 0.999014 0.999439 

options(digits=3)
s <- function(y) c(Mean=mean(y),Median=median(y),SD=sd(y))
summary(x ~ g, fun=s)

x    N=1000

+-------+-+----+-----+------+-----+
|       | |N   |Mean |Median|SD   |
+-------+-+----+-----+------+-----+
|g      |a| 254|0.495|0.469 |0.283|
|       |b| 243|0.523|0.533 |0.294|
|       |c| 249|0.495|0.481 |0.278|
|       |d| 254|0.505|0.514 |0.289|
+-------+-+----+-----+------+-----+
|Overall| |1000|0.504|0.504 |0.286|
+-------+-+----+-----+------+-----+

summarize(x, g, s)         # to cross-classify, replace g with llist(g1,g2)

  g     x Median    SD     # the x column holds the Mean
1 a 0.495  0.469 0.283
2 b 0.523  0.533 0.294
3 c 0.495  0.481 0.278
4 d 0.505  0.514 0.289
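
For anyone without Hmisc installed, the grouped table above can be roughly
cross-checked in base R. This is only a sketch: g2 is a hypothetical second
factor added to illustrate cross-classification, and the numbers may not
match the output shown above exactly, since R's sampling defaults have
changed since this was posted.

```r
# Base-R sketch; no Hmisc needed. x, g and s mirror the example above.
set.seed(1)
x <- runif(1000)
g <- factor(sample(letters[1:4], 1000, TRUE))

s <- function(y) c(Mean = mean(y), Median = median(y), SD = sd(y))

# split() groups x by level of g; sapply() applies s to each group.
# t() gives one row per level and one column per statistic.
by_group <- t(sapply(split(x, g), s))
print(round(by_group, 3))

# Cross-classification by two factors (g2 is hypothetical, for illustration)
g2 <- factor(sample(c("lo", "hi"), 1000, TRUE))
cross_mean <- tapply(x, list(g = g, g2 = g2), mean)
print(round(cross_mean, 3))
```

split() plus sapply() keeps all three statistics in one matrix, while
tapply() is the shortest route when only a single statistic per cell is
wanted.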

Frank Harrell

On Fri, 28 Jun 2002 11:21:32 -0700
Brett Magill <bmagill at earthlink.net> wrote:

> The code attached creates a function for descriptive statistics called
> dstats.  Enter the name of the column you want to summarize and dstats will
> produce a nice summary.  If you have a data frame of numeric variables and
> want to summarize by column, you can use something like:
> 
> apply(data.frame.name,2,dstats)
> 
> Wrap t() around the above to get the output in a format that I find more
> usable.
> 
> Brett
> 
> 
> 
> dstats<-function(x,na.rm=TRUE,digits=3) {
> 
>  dstats<-NULL
> 
>    dstats[1]<-mean(x,na.rm=na.rm)
>    dstats[2]<-sd(x,na.rm=na.rm)
>    dstats[3]<-var(x,na.rm=na.rm)
>    dstats[4]<-min(x,na.rm=na.rm)
>    dstats[5]<-max(x,na.rm=na.rm)
>    dstats[6]<-length(unique(x))
>    dstats[7]<-sum(!is.na(x))
>    dstats[8]<-sum(is.na(x))
>  
>    dstats<-round(dstats,digits=digits)
>    names(dstats)<-c("mean","sd","variance","min","max","unique","n","miss")
> 
>  return(dstats)
> }
> -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> Send "info", "help", or "[un]subscribe"
> (in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
> _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
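
The apply() and t() usage described in the quoted message can be tried on a
small made-up data frame. The sketch below restates the quoted dstats()
compactly so that it runs on its own; the data frame df and its columns are
illustrative assumptions, not data from the thread.

```r
# Compact restatement of the quoted dstats(), so the sketch is self-contained
dstats <- function(x, na.rm = TRUE, digits = 3) {
  out <- c(mean     = mean(x, na.rm = na.rm),
           sd       = sd(x, na.rm = na.rm),
           variance = var(x, na.rm = na.rm),
           min      = min(x, na.rm = na.rm),
           max      = max(x, na.rm = na.rm),
           unique   = length(unique(x)),   # NA counts as a distinct value here
           n        = sum(!is.na(x)),
           miss     = sum(is.na(x)))
  round(out, digits = digits)
}

# Hypothetical data frame of numeric columns, one with a missing value
df <- data.frame(a = c(1, 2, 3, 4), b = c(10, NA, 30, 30))

# apply() returns the statistics in rows; t() puts one variable per row
res <- t(apply(df, 2, dstats))
print(res)
```

Note that apply() converts the data frame to a matrix first, so this only
makes sense when every column is numeric.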


-- 
Frank E Harrell Jr              Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences
U. Virginia School of Medicine  http://hesweb1.med.virginia.edu/biostat