[R] sd() of column, but for a subset of rows

Shaw, Stephanie sshaw at epri.com
Thu Sep 25 22:44:29 CEST 2008


 
I would like to compute the standard deviation of a column, but only over
the subset of rows in that column that share a given index value. The
following loop worked fine when I wanted the mean, but it is not working
for the standard deviation:
 
for (i in 1:length(x[1,])){
 a<-tapply(x[,i],x[,2],sd, na.rm=TRUE)
 xnew<-cbind(xnew,a)}
 
I have tried re-defining sd as follows (as suggested on this board) and
using that in the code above in place of 'sd', but it doesn't work
because some of my columns contain dates or character strings. Short of
redefining my matrix to remove those columns, is there a simple way to
fix this and have those columns report NAs?
 
mysd<-function(x,na.rm=FALSE)
{
 if (is.matrix(x))
   apply(x,2,mysd,na.rm=na.rm)
 else if (is.vector(x)){
   if(na.rm) x<-x[!is.na(x)]
   if(length(x)==0) return(NA)
   sqrt(var(x,na.rm=na.rm))
 }
 else if (is.data.frame(x))
   sapply(x,mysd,na.rm=na.rm)
 else {
   x<-as.vector(x)
   mysd(x,na.rm=na.rm)
 }
}
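
For illustration, here is a minimal sketch of one way the non-numeric
columns could be made to report NAs. It assumes x is a data frame whose
second column holds the grouping index, as in the loop above. The helper
name group_sd and the example data are hypothetical and only stand in
for the real data.

```r
# Hypothetical example data; column names and values are illustrative only.
x <- data.frame(
  site  = c("A", "B", "A", "B", "A", "B"),   # character column
  index = c(1, 1, 2, 2, 2, 1),               # grouping index (column 2)
  conc  = c(1.0, 2.0, 3.0, NA, 5.0, 4.0),    # numeric column with an NA
  stringsAsFactors = FALSE
)

# Per-group standard deviation for numeric columns;
# NA for every group when the column is not numeric
# (character, factor, and Date columns all fail is.numeric()).
group_sd <- function(col, groups) {
  if (is.numeric(col)) {
    tapply(col, groups, sd, na.rm = TRUE)
  } else {
    tapply(col, groups, function(v) NA_real_)
  }
}

# One column of results per column of x, one row per index value.
xnew <- sapply(x, group_sd, groups = x[, 2])
print(xnew)
```

Checking is.numeric() before calling sd() avoids having to drop the date
and character columns first, and sapply() builds the result in one step
instead of growing xnew with cbind() inside a loop.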
 

Thank you,
Stephanie

-------------------------------------------------------------   

Stephanie L. Shaw, Ph.D.
Project Manager, Air Quality, Environment Sector
Electric Power Research Institute (EPRI)
3420 Hillview Ave, Palo Alto, CA 94304 USA
Phone: 650.855.2353 * Mobile: 650.391.8203 * Fax: 650.855.1069
Email: sshaw at epri.com
  


