[R] Multiple Paired T test from large Data Set with multiple pairs

arun smartpink111 at yahoo.com
Wed May 1 23:43:44 CEST 2013


Hi,
Assuming that your dataset is similar to the one below:
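# The p-values shown below are from the R version in use when this was posted (2013).
# R >= 3.6.0 changed the default sample() algorithm, so set.seed(25) gives different data there.
set.seed(25)
dat1<- data.frame(Algae.Mass=sample(40:50,10,replace=TRUE),
                  Seagrass.Mass=sample(30:70,10,replace=TRUE),
                  Terrestrial.Mass=sample(80:100,10,replace=TRUE),
                  Other.Mass=sample(40:60,10,replace=TRUE),
                  Site.X.Treatment=rep(c("ALA1A","ALA1U"),each=5),
                  stringsAsFactors=FALSE)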
library(reshape2)
dat2<-melt(dat1,id.var="Site.X.Treatment")

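# one paired t-test per metric; rows are assumed to be in matching order within each treatment
sapply(split(dat2,dat2$variable),function(x)
  t.test(x$value[x$Site.X.Treatment=="ALA1A"],
         x$value[x$Site.X.Treatment=="ALA1U"],paired=TRUE)$p.value)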
  #    Algae.Mass    Seagrass.Mass Terrestrial.Mass       Other.Mass 
  #     1.0000000        0.4624989        0.4388211        0.7521036 
#or
library(plyr)
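# na.rm is not a t.test() argument, and recent versions of R reject paired= in the
# formula method, so the two groups are passed as vectors here
ddply(dat2,.(variable),function(x)
  data.frame(Pvalue=t.test(x$value[x$Site.X.Treatment=="ALA1A"],
                           x$value[x$Site.X.Treatment=="ALA1U"],paired=TRUE)$p.value))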
#          variable    Pvalue
#1       Algae.Mass 1.0000000
#2    Seagrass.Mass 0.4624989
#3 Terrestrial.Mass 0.4388211
#4       Other.Mass 0.7521036
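
If the data have several site pairs, with Site and Treatment in separate columns as mentioned in the question, the same idea should extend by splitting on Site first. This is only a minimal sketch: the column names Site and Treatment, the site labels "ALA1"/"ALA2", and the simulated values are made up for illustration, and rows within each site are assumed to be in matching order for the two treatments.

```r
# Hypothetical data: two sites, each with treatments "A" and "U",
# five paired observations per treatment (all values are simulated).
set.seed(1)
dat <- data.frame(
  Site             = rep(c("ALA1", "ALA2"), each = 10),
  Treatment        = rep(rep(c("A", "U"), each = 5), times = 2),
  Algae.Mass       = rnorm(20, mean = 45, sd = 3),
  Seagrass.Mass    = rnorm(20, mean = 50, sd = 10),
  Terrestrial.Mass = rnorm(20, mean = 90, sd = 5),
  Other.Mass       = rnorm(20, mean = 50, sd = 5),
  stringsAsFactors = FALSE
)

metrics <- c("Algae.Mass", "Seagrass.Mass", "Terrestrial.Mass", "Other.Mass")

# One paired t-test per site and metric. Pairing relies on the "A" and "U"
# rows of a site appearing in the same order.
pvals <- sapply(split(dat, dat$Site), function(d) {
  sapply(d[metrics], function(v) {
    t.test(v[d$Treatment == "A"], v[d$Treatment == "U"], paired = TRUE)$p.value
  })
})

pvals  # matrix of p-values: metrics in rows, sites in columns
```

With real data, the simulated data frame would be replaced by your own, and only the metrics vector needs to list the columns to test, so Site.X.Treatment never has to be typed out per test.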


A.K.


>Hey, 
>
>I have a fairly large data set with multiple pairs of Sites.
>Each site has two levels (the pairs) "A" and "U".  For each pair I want
>to do a paired t test of 4 different metrics that exist as columns in
>my data set.
>
>Here is the long version 
>
>t.test(Algae.Mass[Site.X.Treatment=="ALA1A"],Algae.Mass[Site.X.Treatment=="ALA1U"], paired=T) 
>t.test(Seagrass.Mass[Site.X.Treatment=="ALA1A"],Seagrass.Mass[Site.X.Treatment=="ALA1U"], paired=T) 
>t.test(Terrestrial.Mass[Site.X.Treatment=="ALA1A"],Terrestrial.Mass[Site.X.Treatment=="ALA1U"], paired=T) 
>t.test(Other.Mass[Site.X.Treatment=="ALA1A"],Other.Mass[Site.X.Treatment=="ALA1U"], paired=T) 
>
>How can I do this in one line of code?  I have tried lapply,
>tapply etc but keep running into issues.  It would also be great to not
>have to keep defining "Site.X.Treatment".  I do have Site.X.Treatment
>broken down by just Site and Treatment in separate columns in the data
>set.  Any Ideas??


