[R] Summary

Tue Sep 29 19:32:07 CEST 2009

An alternative for cases where there are NA's would be:

sapply(sapply(df, summary), '[', 1:7)
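
If the data may or may not contain NA's, a variant that always returns the same seven rows could look like the sketch below. It computes each statistic explicitly instead of subsetting the output of summary(). The name num_summary is made up for this example, and the function assumes every column of the data.frame is numeric:

```r
# Hypothetical helper (not part of base R): one column of summary
# statistics per variable, plus a row counting the NA's.
# Assumes every column of df is numeric.
num_summary <- function(df) {
  sapply(df, function(x) {
    c(Min.      = min(x, na.rm = TRUE),
      "1st Qu." = unname(quantile(x, 0.25, na.rm = TRUE)),
      Median    = median(x, na.rm = TRUE),
      Mean      = mean(x, na.rm = TRUE),
      "3rd Qu." = unname(quantile(x, 0.75, na.rm = TRUE)),
      Max.      = max(x, na.rm = TRUE),
      "NA's"    = sum(is.na(x)))
  })
}

# Same example data as in Bill's message quoted below
# (sqrt(-1) and log(-1) give NaN with a warning).
df <- data.frame(X1 = sqrt(-1:5), X2 = -1:5, X3 = log(-1:5))
num_summary(df)
```

Because every column yields a vector of the same length with the same names, sapply() can always simplify the result to a matrix with the statistics as rows and the variables as columns.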

On Tue, Sep 29, 2009 at 2:28 PM, William Dunlap <wdunlap at tibco.com> wrote:
>> -----Original Message-----
>> From: r-help-bounces at r-project.org
>> [mailto:r-help-bounces at r-project.org] On Behalf Of Henrique
>> Dallazuanna
>> Sent: Tuesday, September 29, 2009 9:57 AM
>> To: Ashta
>> Cc: R help
>> Subject: Re: [R] Summary
>>
>> Try this:
>>
>> sapply(xc, summary)
>
> This fails if there are NA's in xc.  I think you need a custom summary-like
> function for this.  E.g.,
>
> > df <- data.frame(X1 = sqrt(-1:5), X2 = -1:5, X3 = log(-1:5))
> > sapply(df, quantile, na.rm = TRUE)
>           X1   X2        X3
> 0%   0.000000 -1.0      -Inf
> 25%  1.103553  0.5 0.1732868
> 50%  1.573132  2.0 0.8958797
> 75%  1.933013  3.5 1.3143738
> 100% 2.236068  5.0 1.6094379
> > sapply(df, function(x) c(quantile(x, na.rm = TRUE), "NA's" = sum(is.na(x))))
>           X1   X2        X3
> 0%   0.000000 -1.0      -Inf
> 25%  1.103553  0.5 0.1732868
> 50%  1.573132  2.0 0.8958797
> 75%  1.933013  3.5 1.3143738
> 100% 2.236068  5.0 1.6094379
> NA's 1.000000  0.0 1.0000000
>
> The standard summary function for data.frames doesn't do this
> because each type of column may have a different sort of summary
> (quartiles + mean + possible NA count for numeric columns,
> tables for factor columns, etc.). The above only works for
> all-numeric data.frames.
>
> Bill Dunlap
> TIBCO Software Inc - Spotfire Division
> wdunlap tibco.com
>
>>
>> On Tue, Sep 29, 2009 at 12:42 PM, Ashta <sewashm at gmail.com> wrote:
>> > My data is called xc and has more than 15 variables.
>> >
>> > When I used summary(xc), it gave me a detailed description of each
>> > variable.
>> >
>> >
>> >
>> > summary(xc)
>> >
>> >
>> >
>> >        Y1               x1               x2               x3 ..
>> >  Min.   :0.0000   Min.   : 1.000   Min.   : 1.000   Min.   : 1.000
>> >  1st Qu.:0.0000   1st Qu.: 1.000   1st Qu.: 1.000   1st Qu.: 2.000
>> >  Median :1.0000   Median : 1.000   Median : 1.000   Median : 3.000
>> >  Mean   :0.6505   Mean   : 2.816   Mean   : 3.542   Mean   : 3.433
>> >  3rd Qu.:1.0000   3rd Qu.: 4.000   3rd Qu.: 6.000   3rd Qu.: 5.000
>> >  Max.   :1.0000   Max.   :10.000   Max.   :10.000   Max.   :10.000
>> >
>> >
>> >
>> > But I want the output in the following way.
>> >
>> >
>> >
>> >              Y1       x1       x2       x3 ..
>> >  Min.    :0.0000    1.000    1.000    1.000
>> >  1st Qu. :0.0000    1.000    1.000    2.000
>> >  Median  :1.0000    1.000    1.000    3.000
>> >  Mean    :0.6505    2.816    3.542    3.433
>> >  3rd Qu. :1.0000    4.000    6.000    5.000
>> >  Max.    :1.0000   10.000   10.000   10.000
>> >
>> >
>> > Is it possible to do it in R?
>> >
>> >
>> > Thanks in advance
>> >
>> >
>> > ______________________________________________
>> > R-help at r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>> >
>>
>>
>>
>> --
>> Henrique Dallazuanna
>> Curitiba-Paraná-Brasil
>> 25° 25' 40" S 49° 16' 22" O
>>
>>
>
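
On Bill's point that this only works for all-numeric data.frames: one possible way around it, as a rough sketch, is to keep only the numeric columns before applying the function. The data.frame called mixed below is made-up example data, not the original poster's xc:

```r
# Made-up example data with one non-numeric (factor) column.
mixed <- data.frame(g = factor(c("a", "b", "a", "b")),
                    y = c(1, NA, 3, 4),
                    z = c(10, 20, 30, 40))

# Logical index of the numeric columns; the factor column g is FALSE.
is_num <- vapply(mixed, is.numeric, logical(1))

# Quartiles plus an NA count for the numeric columns only,
# using the same function as in Bill's example.
res <- sapply(mixed[is_num],
              function(x) c(quantile(x, na.rm = TRUE), "NA's" = sum(is.na(x))))
res
```

The factor column is simply left out here; it would need its own kind of summary (for example table()), which is the reason summary() on a data.frame does not return a plain numeric matrix.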

-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O