[R] multiple t-tests across similar variable names

Thu Oct 11 16:52:17 CEST 2012

Hello,

On 11-10-2012 15:14, arun wrote:
> Hi Rui,
>
> Running your code, I got these results:
> result
> #       MeanDiff   CIlower    CIupper      p.value
> #apple     -12.6 -16.68052  -8.519476 0.0010166626
> #banana    -15.0 -17.91196 -12.088040 0.0001388506
> #orange    -18.2 -22.79583 -13.604166 0.0003888560
>
> From my code:
> res3
> #       meandifference     CIlow   CIhigh      p.value
> #apple            12.6  8.519476 16.68052 0.0010166626
> #banana           15.0 12.088040 17.91196 0.0001388506
> #orange           18.2 13.604166 22.79583 0.0003888560
>
> There is a difference in signs.

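Mystery solved: the sign depends on the order of the arguments to t.test(). Sorting the column names alphabetically puts apple_post before apple_pre, so your code computes post minus pre, while mine computes pre minus post.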
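A minimal sketch with made-up numbers (the vectors pre and post below are hypothetical, not the thread's data): swapping the two arguments of t.test(..., paired = TRUE) negates the estimate, swaps and negates the confidence limits, and leaves the p-value unchanged. The helper summarise() is likewise hypothetical.

```r
# Hypothetical data: five subjects measured before and after
pre  <- c(12, 15, 11, 18, 14)
post <- c(25, 27, 22, 30, 29)

a <- t.test(pre, post, paired = TRUE)  # tests pre - post
b <- t.test(post, pre, paired = TRUE)  # tests post - pre

# Hypothetical helper: estimate and CI limits of one "htest" object
summarise <- function(h) c(estimate = unname(h$estimate),
                           lower = h$conf.int[1], upper = h$conf.int[2])

# Opposite signs; the lower and upper limits also trade places
rbind(pre_minus_post = summarise(a), post_minus_pre = summarise(b))

# The p-values are identical
c(a$p.value, b$p.value)
```

The same effect explains why the two result tables above have identical p-values and mirrored signs.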

Rui Barradas
> A.K.
>
> ----- Original Message -----
> From: Rui Barradas <ruipbarradas at sapo.pt>
> To: arun <smartpink111 at yahoo.com>; "Nundy, Shantanu" <snundy at chicagobooth.edu>
> Cc: R help <r-help at r-project.org>
> Sent: Thursday, October 11, 2012 9:25 AM
> Subject: Re: [R] multiple t-tests across similar variable names
>
> Hello,
>
> I have a problem: with your data example, my results are different. I have renamed two of the variables so that 'pre' and 'post' come first in their names.
>
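> # auxiliary functions
> # ifswap: put the item name first, e.g. c("pre", "banana") -> c("banana", "pre")
> ifswap <- function(x)
>      if(x[1] %in% c("pre", "post")) x[2:1] else x
>
> # getpair: for 'pre' column i, return the 'post' column with the same item name
> # (note: it reads the matrix 'vmat' from the global environment)
> getpair <- function(i, post)
>      post[ which(vmat[post, 1] == vmat[i, 1]) ]
>
> # makeLine: mean difference, 95% CI limits and p-value of one "htest" object
> makeLine <- function(h)
>      c(MeanDiff = unname(h$estimate),
>          CIlower = h$conf.int[1],
>          CIupper = h$conf.int[2],
>          p.value = h$p.value)
>
> # doTests: paired t-test of column Pairs[i, 1] against Pairs[i, 2], one row per pair
> doTests <- function(DF, Pairs){
>      t.list <- lapply( seq_len(nrow(Pairs)), function(i)
>          t.test(DF[, Pairs[i, 1]], DF[, Pairs[i, 2]], paired = TRUE) )
>      do.call(rbind, lapply(t.list, makeLine))
> }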
>
> # dataset
> set.seed(432)
> dat2 <- data.frame(apple_pre = sample(10:20,5,replace=TRUE),
>              orange_post = sample(18:28,5,replace=TRUE),
>              pre_banana = sample(25:35,5,replace=TRUE),  # here
>              apple_post = sample(20:30,5,replace=TRUE),
>              post_banana = sample(40:50,5,replace=TRUE), # and here
>              orange_pre = sample(5:10,5,replace=TRUE))
>
>
> #--------------------------------
> # start processing the data.frame
> # Make pairs of pre/post columns
> vars <- names(dat2)
> vmat <- do.call(rbind, strsplit(vars, "_"))
> vmat <- t(apply(vmat, 1, ifswap))
> pre <- which(vmat[, 2] == "pre")
> post <- which(vmat[, 2] == "post")
> post <- sapply(pre, getpair, post)
> pairs <- matrix(c(pre, post), ncol = 2)
>
> # now the tests
> result <- doTests(dat2, pairs)
> rownames(result) <- vmat[pre, 1]
> result
>
>
> In your results, I believe the values for meandifference are the means of x[, 1]; at least, that's what I got.
> Anyway, I'll look at both pieces of code again to try to see what's going on.
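One way to check that belief, sketched with a hypothetical data frame x (made-up numbers shaped like one element of arun's list3, post column first): put the estimate returned by a paired t.test() next to mean(x[, 1]) and mean(x[, 1] - x[, 2]).

```r
# Hypothetical two-column data frame shaped like one element of list3
# (first column = post, second column = pre); numbers are made up
x <- data.frame(post = c(20, 24, 19, 26, 23),
                pre  = c(14, 15, 13, 17, 16))

h <- t.test(x[, 1], x[, 2], paired = TRUE)

# Put the reported estimate next to the two candidate quantities
c(estimate   = unname(h$estimate),
  mean_first = mean(x[, 1]),            # mean of x[, 1] alone
  mean_diff  = mean(x[, 1] - x[, 2]))   # mean of the pairwise differences
```

For a paired test, t.test() reports the mean of the differences x[, 1] - x[, 2] as its estimate, so with these numbers it should match mean_diff (7.4) rather than mean_first (22.4).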
>
> Hope this helps,
>
> Rui Barradas
>
> On 11-10-2012 05:31, arun wrote:
>> Hi,
>>
>> If you have many variables in no particular order, it is better to sort the columns by name first.
>> For example:
>> set.seed(432)
>> dat2 <- data.frame(apple_pre = sample(10:20, 5, replace = TRUE), orange_post = sample(18:28, 5, replace = TRUE),
>>                    banana_pre = sample(25:35, 5, replace = TRUE), apple_post = sample(20:30, 5, replace = TRUE),
>>                    banana_post = sample(40:50, 5, replace = TRUE), orange_pre = sample(5:10, 5, replace = TRUE))
>> dat3 <- dat2[order(colnames(dat2))]  # order the columns: apple_post, apple_pre, banana_post, ...
>> list3 <- list(dat3[, 1:2], dat3[, 3:4], dat3[, 5:6])  # one (post, pre) pair per fruit
>> tests <- lapply(list3, function(x) t.test(x[, 1], x[, 2], paired = TRUE))
>> res3 <- do.call(rbind, lapply(tests, function(x)
>>     data.frame(meandifference = x$estimate, CIlow = x$conf.int[1],
>>                CIhigh = x$conf.int[2], p.value = x$p.value)))
>> row.names(res3) <- unique(sapply(strsplit(colnames(dat3), "_"), `[`, 1))
>> res3
>> #     meandifference     CIlow   CIhigh      p.value
>> #apple            12.6  8.519476 16.68052 0.0010166626
>> #banana           15.0 12.088040 17.91196 0.0001388506
>> #orange           18.2 13.604166 22.79583 0.0003888560
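arun's version pairs columns by position after sorting the raw names. Once more than one item uses the "pre_banana" style, sorting puts all the "post_..." columns together, and the strsplit() step would return "pre" and "post" as item names. A minimal sketch of one way around this, assuming every column name is an item name joined to a "pre" or "post" token by "_" in either order (fruit_key() is a hypothetical helper, not from the thread): sort by the item name with the token removed, then put pre before post.

```r
# Hypothetical helper (not from the thread): drop a leading "pre_"/"post_"
# or a trailing "_pre"/"_post" to get the item name
fruit_key <- function(nms) sub("^(pre|post)_|_(pre|post)$", "", nms)

set.seed(432)
dat2 <- data.frame(apple_pre = sample(10:20, 5, replace = TRUE),
                   orange_post = sample(18:28, 5, replace = TRUE),
                   pre_banana = sample(25:35, 5, replace = TRUE),
                   apple_post = sample(20:30, 5, replace = TRUE),
                   post_banana = sample(40:50, 5, replace = TRUE),
                   orange_pre = sample(5:10, 5, replace = TRUE))

key    <- fruit_key(names(dat2))
is_pre <- grepl("^pre_|_pre$", names(dat2))
# Sort by item name, then pre before post (FALSE sorts before TRUE)
dat3 <- dat2[order(key, !is_pre)]

# Columns now alternate pre, post for each item
first <- seq(1, ncol(dat3), by = 2)
res3 <- do.call(rbind, lapply(first, function(j) {
  h <- t.test(dat3[, j], dat3[, j + 1], paired = TRUE)  # pre - post
  data.frame(meandifference = unname(h$estimate), CIlow = h$conf.int[1],
             CIhigh = h$conf.int[2], p.value = h$p.value)
}))
row.names(res3) <- unique(fruit_key(names(dat3)))
res3
```

With pre first in each pair, the signs follow Rui's convention (pre minus post). R 3.6.0 changed the default sample() algorithm, so set.seed(432) may not reproduce the 2012 numbers shown in this thread.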
>>
>> A.K.
>>
>> ----- Original Message -----
>> From: "Nundy, Shantanu" <snundy at chicagobooth.edu>
>> To: "r-help at r-project.org" <r-help at r-project.org>
>> Cc:
>> Sent: Wednesday, October 10, 2012 7:09 PM
>> Subject: Re: [R] multiple t-tests across similar variable names
>>
>> Hi everyone,
>>
>> I have a dataset with multiple "pre" and "post" variables I want to compare. The variables are named "apple_pre" or "pre_banana", with the corresponding post variables named "apple_post" or "post_banana". The variables are in no particular order.
>>
>> apple_pre orange_pre orange_post pre_banana apple_post post_banana
>> person_1
>> person_2
>> person_3
>> ...
>> person_x
>>
>>
>> How do I:
>> 1. Run a series of paired t-tests comparing each pre variable (e.g. apple_pre, pre_banana) with its post counterpart? It would be great to do something like ttest(*.*pre*.*,*.*post*.*).
>> 2. Print the results of these t-tests in a table with col 1 = mean difference, col 2 = 95% confidence interval, col 3 = p-value.
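A minimal sketch of a more general answer to both questions, assuming each column name is an item name joined to a "pre" or "post" token by "_", in either order (prepost_ttests() is a hypothetical function, not from the thread). It matches each pre column to the post column with the same item name via match(), so column order does not matter, and returns one row per item with the mean difference (post minus pre), the confidence limits and the p-value.

```r
# Hypothetical function (not from the thread): pair every pre column with the
# post column for the same item, whether the names look like "apple_pre" or
# "pre_banana", then run one paired t-test per pair.
prepost_ttests <- function(DF, conf.level = 0.95) {
  nms     <- names(DF)
  key     <- sub("^(pre|post)_|_(pre|post)$", "", nms)  # item name
  is_pre  <- grepl("^pre_|_pre$", nms)
  is_post <- grepl("^post_|_post$", nms)

  pre_cols  <- nms[is_pre]
  # For each pre column, the post column with the same item name (NA if none)
  post_cols <- nms[is_post][match(key[is_pre], key[is_post])]
  if (anyNA(post_cols))
    stop("no post column for: ", paste(pre_cols[is.na(post_cols)], collapse = ", "))

  rows <- lapply(seq_along(pre_cols), function(i) {
    h <- t.test(DF[[post_cols[i]]], DF[[pre_cols[i]]],
                paired = TRUE, conf.level = conf.level)  # post - pre
    c(MeanDiff = unname(h$estimate), CIlower = h$conf.int[1],
      CIupper = h$conf.int[2], p.value = h$p.value)
  })
  out <- do.call(rbind, rows)
  rownames(out) <- key[is_pre]
  out
}

# Usage with made-up data in the mixed naming style from the question
set.seed(1)
dat <- data.frame(apple_pre   = rnorm(8, 10), orange_pre  = rnorm(8, 20),
                  orange_post = rnorm(8, 22), pre_banana  = rnorm(8, 30),
                  apple_post  = rnorm(8, 13), post_banana = rnorm(8, 31))
prepost_ttests(dat)
```

The direction post minus pre is an arbitrary choice here; swapping the first two arguments of t.test() gives pre minus post, as in Rui's table.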
>>
>> Thank you kindly,
>> -Shantanu
>>
>> Shantanu Nundy, M.D.
>> University of Chicago
>>
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.