[R] Summary using by() returns character arrays in a list

PIKAL Petr petr.pikal at precheza.cz
Wed Oct 10 15:43:12 CEST 2012


Hi

> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of Alex van der Spek
> Sent: Wednesday, October 10, 2012 2:48 PM
> To: r-help at r-project.org
> Subject: [R] Summary using by() returns character arrays in a list
> 
> I use by() to generate summary statistics like so:
> 
> Lbys <- by(dat[Nidx], dat$LipTest, summary)
> 
> where Nidx is an index vector with names picking out the columns in the
> data frame dat.
> 
> This returns a list of character arrays (see below for str() output)
> where the columns are named correctly but the rownames are empty
> strings and the values are strings prefixed with the summary
> statistic's name (e.g. "Min.", "Median ").

Without knowledge of your data it is difficult to understand what is wrong.

If I use the iris data set as input, everything works as expected:
> data(iris)
> summary(iris)
  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
 Median :5.800   Median :3.000   Median :4.350   Median :1.300  
 Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
 Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
       Species  
 setosa    :50  
 versicolor:50  
 virginica :50  
                
                
                
> by(iris, iris$Species, summary)
iris$Species: setosa
  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
 Min.   :4.300   Min.   :2.300   Min.   :1.000   Min.   :0.100  
 1st Qu.:4.800   1st Qu.:3.200   1st Qu.:1.400   1st Qu.:0.200  
 Median :5.000   Median :3.400   Median :1.500   Median :0.200  
 Mean   :5.006   Mean   :3.428   Mean   :1.462   Mean   :0.246  
 3rd Qu.:5.200   3rd Qu.:3.675   3rd Qu.:1.575   3rd Qu.:0.300  
 Max.   :5.800   Max.   :4.400   Max.   :1.900   Max.   :0.600  
       Species  
 setosa    :50  
 versicolor: 0  
 virginica : 0  
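
The printed output looks fine, but it may help to inspect the object behind it. The short sketch below (using only the first two iris columns; the variable name s is just for illustration) shows that summary() on a data frame gives a character table of pre-formatted strings, which is consistent with the str() output quoted further down:

```r
# Inspect the object that summary() returns for a data frame
s <- summary(iris[1:2])

class(s)      # "table"
typeof(s)     # "character": every cell is a pre-formatted string
rownames(s)   # empty strings, as in the quoted str() output
s[1, 1]       # one cell, holding the statistic's name and value as text
```

So by(..., summary) simply collects one such character table per group.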

             
> 
> I am reading the code of summary.data.frame() but can't figure out how
> I can change the behaviour of that function to return a list of numeric
> matrices with the summary statistics' names ("Min.", "Max.", etc.) as
> rownames and the calculated summary statistics as numeric values.
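
One possible way to get such numeric matrices, sketched on iris because the original data are not available: apply summary() to each column inside each group, so the values are never formatted into strings. The names num_cols, Lnum and m are illustrative only. The sketch assumes all selected columns are numeric and free of NAs; if only some columns contain NAs, their summaries have an extra "NA's" entry and sapply() returns a list rather than a matrix.

```r
# Numeric columns only (assumption: no factors, no missing values)
num_cols <- iris[1:4]

# summary() of a numeric vector is a named numeric vector;
# sapply() binds one such vector per column into a numeric matrix
Lnum <- by(num_cols, iris$Species, function(d) sapply(d, summary))

m <- Lnum[["setosa"]]
is.numeric(m)               # TRUE
rownames(m)                 # "Min." "1st Qu." "Median" "Mean" "3rd Qu." "Max."
m["Min.", "Sepal.Length"]   # 4.3
```

With the original data this would presumably read by(dat[Nidx], dat$LipTest, function(d) sapply(d, summary)), but that is untested here.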

What exactly do you dislike about this output, and how would you like it structured?
Maybe you want aggregate(), but without sample data it is hard to say.

aggregate(iris[1:2], list(iris$Species), summary)
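
For reference, the result of such a call is a data frame in which each summarised column is itself a numeric matrix, with one row per group and one column per statistic. A minimal sketch follows; naming the grouping column Species and the result agg is my own choice for illustration.

```r
# One row per species; each summarised column is a numeric matrix
agg <- aggregate(iris[1:2], list(Species = iris$Species), summary)

sl <- agg$Sepal.Length
is.matrix(sl)      # TRUE
is.numeric(sl)     # TRUE
colnames(sl)       # "Min." "1st Qu." "Median" "Mean" "3rd Qu." "Max."
sl[1, "Min."]      # minimum Sepal.Length for setosa: 4.3
```

The values stay numeric here, so they can be used in further calculations without parsing strings.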

Regards
Petr

> 
> Any help much appreciated!
> Regards,
> Alex van der Spek
> 
> 
> > str(Lbys)
> List of 2
>  $    : 'table' chr [1:6, 1:19] "Min.   :-0.190  " "1st Qu.: 9.297  "
> "Median :10.373  " "Mean   :10.100  " ...
>   ..- attr(*, "dimnames")=List of 2
>   .. ..$ : chr [1:6] "" "" "" "" ...
>   .. ..$ : chr [1:19] "Cell_3_SOS....GVF." "Cell_3_SOSq..ms.ms."
> "Cell_3_Airflow..cfm." "Cell_3_Float..in.." ...
>  $ T38: 'table' chr [1:6, 1:19] "Min.   :8.648  " "1st Qu.:8.920  "
> "Median :9.018  " "Mean   :9.027  " ...
>   ..- attr(*, "dimnames")=List of 2
>   .. ..$ : chr [1:6] "" "" "" "" ...
>   .. ..$ : chr [1:19] "Cell_3_SOS....GVF." "Cell_3_SOSq..ms.ms."
> "Cell_3_Airflow..cfm." "Cell_3_Float..in.." ...
>  - attr(*, "dim")= int 2
>  - attr(*, "dimnames")=List of 1
>   ..$ dat$LipTest: chr [1:2] "" "T38"
>  - attr(*, "call")= language by.data.frame(data = dat[Nidx], INDICES =
> dat$LipTest, FUN = summary)
>  - attr(*, "class")= chr "by"
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



