[R] Summary statistics for matrix columns

arun smartpink111 at yahoo.com
Sat Nov 24 18:11:46 CET 2012


Hi, 

You are right.  In statistics, the range is usually one value (i.e., the
difference between the largest and smallest).  In R, however, the function
range(x) returns both values: the minimum and the maximum.
The description in ?range is:
"Description: 

     ‘range’ returns a vector containing the minimum and maximum of all 
     the given arguments. 
" 
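That is also why your apply() call shows two range rows: c() keeps both elements of that vector and numbers the name. A small sketch, with a made-up vector v used only for illustration:

```r
# v is a made-up vector, used only for illustration
v <- c(5, 2, 9)

r <- range(v)               # length-2 vector: minimum first, then maximum
both <- c(Range = r)        # c() keeps both elements and numbers the name
one  <- c(Range = diff(r))  # diff() collapses them into max - min

names(both)   # "Range1" "Range2"
names(one)    # "Range"
unname(one)   # 7
```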
I looked for a similar function in library(matrixStats) and found colRanges() and rowRanges().
 set.seed(125) 
 x <- matrix(sample(1:80),nrow=8) 
 colnames(x)<- paste("Col",1:ncol(x),sep="")   
apply(x,2,function(x) range(x)) 
#     Col1 Col2 Col3 Col4 Col5 Col6 Col7 Col8 Col9 Col10 
#[1,]   10    1   17    3   18   11   13   15    2     6 
#[2,]   74   77   76   70   65   63   79   80   71    72 
library(matrixStats) 
colRanges(x) 
#     [,1] [,2]
#[1,]   10   74
#[2,]    1   77
#[3,]   17   76
 ----------------- 
You could do this to get the range: 
apply(x,2,function(x) diff(range(x)))
#Col1  Col2  Col3  Col4  Col5  Col6  Col7  Col8  Col9 Col10
#  64    76    59    67    47    52    66    65    69    66
#or
diff(t(colRanges(x)))
#     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] 
#[1,]   64   76   59   67   47   52   66   65   69    66 
#or 
rowDiffs(colRanges(x)) 
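
Applied to the summary call in your question, a minimal sketch of the fix might look like this. The seed is arbitrary and added only for reproducibility; everything else follows your code, with Range = range(x) swapped for Range = diff(range(x)):

```r
set.seed(1)  # arbitrary seed (an assumption), only so the sketch is reproducible
M <- matrix(sample(1:8000), nrow = 100)
colnames(M) <- paste("Col", 1:ncol(M), sep = "")

res <- apply(M, 2, function(x) c(Min = min(x),
                                 "1st Qu" = quantile(x, 0.25, names = FALSE),
                                 Range = diff(range(x)),  # one value: max - min
                                 Median = quantile(x, 0.5, names = FALSE),
                                 Mean = mean(x), Std = sd(x),
                                 "3rd Qu" = quantile(x, 0.75, names = FALSE),
                                 IQR = IQR(x), Max = max(x)))

dim(res)       # 9 rows (one per statistic) by 80 columns
rownames(res)  # a single "Range" row instead of "Range1" and "Range2"
```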
A.K. 



----- Original Message -----
From: frespider <frespider at hotmail.com>
To: r-help at r-project.org
Cc: 
Sent: Saturday, November 24, 2012 7:58 AM
Subject: Re: [R] Summary statistics for matrix columns



Hi A.K.,

I have one more question, if you could answer it please.

M <- matrix(sample(1:8000),nrow=100)
colnames(M)<- paste("Col",1:ncol(M),sep="")
apply(M,2,function(x) c(Min=min(x),"1st Qu" =quantile(x, 0.25,names=FALSE),
                        Range = range(x),
                        Median = quantile(x, 0.5, names=FALSE),
                        Mean= mean(x),Std=sd(x),
                        "3rd Qu" = quantile(x,0.75,names=FALSE),
                        IQR=IQR(x),Max = max(x)))

Why do I get two range values? Doesn't range mean the difference between the max and min?


Thanks 
Date: Fri, 23 Nov 2012 16:08:12 -0800
From: ml-node+s789695n4650613h54 at n4.nabble.com
To: frespider at hotmail.com
Subject: Re: Summary statistics for matrix columns



    Hi,

No problem.


There are a couple of other libraries which deal with summary statistics:

library(pastecs)

?stat.desc() # 


library(matrixStats) 

#Using the functions from package: matrixStats

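# Column-wise summary statistics, one row per statistic.
# colQuantiles(x) returns the 0%, 25%, 50%, 75% and 100% quantiles by default,
# so column 2 is the 1st quartile and column 4 is the 3rd quartile.
fun1<-function(x){
res<-rbind(colMins(x),colQuantiles(x)[,2],colMedians(x),colMeans(x),
           colSds(x),colQuantiles(x)[,4],colIQRs(x),colMaxs(x))
row.names(res)<-c("Min.","1st Qu.","Median","Mean","sd","3rd Qu.","IQR","Max.")
res}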


set.seed(125)

x <- matrix(sample(1:80),nrow=8)

colnames(x)<- paste("Col",1:ncol(x),sep="")  

fun1(x)

#            Col1     Col2     Col3     Col4     Col5     Col6     Col7     Col8
#Min.    10.00000  1.00000 17.00000  3.00000 18.00000 11.00000 13.00000 15.00000
#1st Qu. 24.75000 29.50000 26.00000  7.75000 40.00000 17.25000 27.50000 34.75000
#Median  34.00000 46.00000 42.50000 35.50000 49.50000 23.50000 51.50000 51.50000
#Mean    42.50000 42.75000 41.75000 35.75000 44.87500 26.87500 44.75000 50.12500
#sd      25.05993 27.77846 19.57221 28.40397 16.39196 16.60841 21.97239 25.51995
#3rd Qu. 67.75000 58.50000 50.00000 63.25000 54.25000 30.25000 56.25000 70.50000
#IQR     43.00000 29.00000 24.00000 55.50000 14.25000 13.00000 28.75000 35.75000
#Max.    74.00000 77.00000 76.00000 70.00000 65.00000 63.00000 79.00000 80.00000
#           Col9    Col10
#Min.     2.00000  6.00000
#1st Qu. 24.50000 12.50000
#Median  33.50000 48.00000
#Mean    34.87500 40.75000
#sd      24.39811 28.21727
#3rd Qu. 45.25000 63.00000
#IQR     20.75000 50.50000
#Max.    71.00000 72.00000


I thought this would be faster than the previous methods, but it was the slowest.


set.seed(125)
x1 <- matrix(sample(1:800000),nrow=1000)
# name the columns of x1 (the matrix being timed), not x
colnames(x1)<- paste("Col",1:ncol(x1),sep="")

system.time(fun1(x1))

#   user  system elapsed 

# 0.968   0.000   0.956 
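
One thing that may contribute (a guess, not a measured result): fun1 calls colQuantiles(x) twice, and each call computes all five default quantiles. Below is a minimal sketch of a variant that computes the quartiles once, assuming matrixStats is installed. The name fun2 is hypothetical, not something from the package, and the variant has not been timed here:

```r
library(matrixStats)  # assumed to be installed

# fun2 is a hypothetical variant of fun1 that calls colQuantiles() only once
fun2 <- function(x) {
  # one row per column of x, one column per requested probability
  q <- colQuantiles(x, probs = c(0.25, 0.5, 0.75), drop = FALSE)
  res <- rbind(colMins(x), q[, 1], q[, 2], colMeans(x), colSds(x),
               q[, 3], q[, 3] - q[, 1],  # IQR as 3rd quartile minus 1st quartile
               colMaxs(x))
  rownames(res) <- c("Min.", "1st Qu.", "Median", "Mean", "sd",
                     "3rd Qu.", "IQR", "Max.")
  colnames(res) <- colnames(x)
  res
}

set.seed(125)
x <- matrix(sample(1:80), nrow = 8)
colnames(x) <- paste("Col", 1:ncol(x), sep = "")
fun2(x)
```

Wrapping both versions in system.time() on the large matrix would show whether the second colQuantiles() call matters in practice.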

A.K.









________________________________

From: Fares Said <[hidden email]>

To: arun <[hidden email]> 

Cc: Pete Brecknock <[hidden email]>; R help <[hidden email]> 

Sent: Friday, November 23, 2012 10:23 AM

Subject: Re: [R] Summary statistics for matrix columns


Thank you all 


Sent from my iPhone


On 2012-11-23, at 10:19, "arun" <[hidden email]> wrote:


> HI,

> You are right.

> It is slower when compared to Pete's solution:

> set.seed(125)

> x <- matrix(sample(1:800000),nrow=1000)

> colnames(x)<- paste("Col",1:ncol(x),sep="")

> 

> system.time({

> res<-sapply(data.frame(x),function(x) c(summary(x),sd=sd(x),IQR=IQR(x)))

>  res1<-as.matrix(res) 

> res2<-res1[c(1:4,7,5,8,6),] })

> # user  system elapsed 

> #  0.596   0.000   0.597 

> 

> system.time({

> res<-apply(x,2,function(x) c(Min=min(x),

>                         "1st Qu" =quantile(x, 0.25,names=FALSE),

>                         Median = quantile(x, 0.5, names=FALSE),

>                         Mean= mean(x),

>                         Sd=sd(x),

>                         "3rd Qu" = quantile(x,0.75,names=FALSE),

>                         IQR=IQR(x),

>                         Max = max(x))) })

> # user  system elapsed 

>  # 0.384   0.000   0.384 

> 

> 

> A.K.

> 

> 

> 

> ----- Original Message -----

> From: Pete Brecknock <[hidden email]>

> To: [hidden email]

> Cc: 

> Sent: Friday, November 23, 2012 8:42 AM

> Subject: Re: [R] Summary statistics for matrix columns

> 

> frespider wrote

>> Hi,

>> 

>> it is possible. but don't you think it will slow the code if you convert

>> to data.frame?

>> 

>> Thanks 

>> 

>> Date: Thu, 22 Nov 2012 18:31:35 -0800
>> From: ml-node+s789695n4650500h51 at .nabble
>> To: frespider@
>> Subject: RE: Summary statistics for matrix columns

>> 

>> 

>> 

>>     HI,

>> 

>> Is it possible to use as.matrix()?

>> 

>> res<-sapply(data.frame(x),function(x) c(summary(x),sd=sd(x),IQR=IQR(x)))

>> 

>>   res1<-as.matrix(res)

>> 

>>   is.matrix(res1)

>> 

>> #[1] TRUE

>> 

>> res1[c(1:4,7,5,8,6),]

>> 

>> #            Col1     Col2     Col3     Col4     Col5     Col6     Col7     Col8
>> #Min.    10.00000  1.00000 17.00000  3.00000 18.00000 11.00000 13.00000 15.00000
>> #1st Qu. 24.75000 29.50000 26.00000  7.75000 40.00000 17.25000 27.50000 34.75000
>> #Median  34.00000 46.00000 42.50000 35.50000 49.50000 23.50000 51.50000 51.50000
>> #Mean    42.50000 42.75000 41.75000 35.75000 44.88000 26.88000 44.75000 50.12000
>> #sd      25.05993 27.77846 19.57221 28.40397 16.39196 16.60841 21.97239 25.51995
>> #3rd Qu. 67.75000 58.50000 50.00000 63.25000 54.25000 30.25000 56.25000 70.50000
>> #IQR     43.00000 29.00000 24.00000 55.50000 14.25000 13.00000 28.75000 35.75000
>> #Max.    74.00000 77.00000 76.00000 70.00000 65.00000 63.00000 79.00000 80.00000
>> #           Col9    Col10
>> #Min.     2.00000  6.00000
>> #1st Qu. 24.50000 12.50000
>> #Median  33.50000 48.00000
>> #Mean    34.88000 40.75000
>> #sd      24.39811 28.21727
>> #3rd Qu. 45.25000 63.00000
>> #IQR     20.75000 50.50000
>> #Max.    71.00000 72.00000

>> 

>> 

>> A.K.

> 

> Then maybe ....

> 

> x <- matrix(sample(1:8000),nrow=100) 

> colnames(x)<- paste("Col",1:ncol(x),sep="") 

> 

> apply(x,2,function(x) c(Min=min(x), 

>                         "1st Qu" =quantile(x, 0.25,names=FALSE), 

>                         Median = quantile(x, 0.5, names=FALSE),

>                         Mean= mean(x),

>                         Sd=sd(x), 

>                         "3rd Qu" = quantile(x,0.75,names=FALSE),

>                         IQR=IQR(x),

>                         Max = max(x)))

> 

> HTH

> 

> Pete

> 

> 

> 

> --

> View this message in context: http://r.789695.n4.nabble.com/Summary-statistics-for-matrix-columns-tp4650489p4650547.html
> Sent from the R help mailing list archive at Nabble.com.

> 

> ______________________________________________

> [hidden email] mailing list

> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

>


--
View this message in context: http://r.789695.n4.nabble.com/Summary-statistics-for-matrix-columns-tp4650489p4650643.html
Sent from the R help mailing list archive at Nabble.com.