[BioC] doing paired t-test amongst several groups

James W. MacDonald jmacdon at med.umich.edu
Fri Feb 16 16:19:45 CET 2007


Hi Milena,

Milena Gongora wrote:
> Hello James,
> 
> Thanks a lot for your insight! I made that contrast matrix as you 
> suggested and got the results fine.
> 
> Can I add another question?
> 
> You mention that in the result of topTable(fit2_paired_RMAbg_Qnorm), the 
> baseline is OB1/SurfaceA and everything is compared to that...
> 
> So are the results under coefficients SurfaceB and SurfaceC taking into 
> consideration all patients? or just OB1? (I imagine is all patients, but 
> just checking how it works)

Yep. All patients. You can convince yourself of this by noting that the 
design matrix has all of these rows:

             Int  OB2  OB3  OB4  OB5 B  C
2            1    0    0    0    0  1  0
5            1    1    0    0    0  1  0
8            1    0    1    0    0  1  0
11           1    0    0    1    0  1  0
14           1    0    0    0    1  1  0

A simple way to think about this is to go back to first principles. The 
design matrix is used to do the linear algebra required to estimate 
these coefficients. Linear algebra is just a compact way to solve for 
multiple unknowns (remember in algebra how you can solve for x with one 
equation, but to solve for x and y you need two, and to solve for x, y, 
and z you need three, and on and on?).

So the above design matrix specifies the following equations:

<val1> = (OB1 + A) + (B - A)
<val2> = (OB1 + A) + (OB2 - OB1) + (B - A)
<val3> = (OB1 + A) + (OB3 - OB1) + (B - A)
.
.
.

Which we solve simultaneously to get the coefficients, based on all the 
patients.

Does that help?

Best,

Jim



> 
> Thanks again!
> 
> Milena
> 
> James W. MacDonald wrote:
> 
>> Hi Milena,
>>
>> Milena Gongora wrote:
>>  
>>
>>> Hello Everyone,
>>>
>>> I am wondering if anyone has scaled a paired t-test to do multiple 
>>> pairwise comparisons and can enlighten me in how to interpret the 
>>> outcome. I read the limma guide back and forth but seem to be missing 
>>> on understanding a few things.
>>>
>>> Essentially I am doing a paired t-test, but have 3 treatments and 
>>> wish to make pairwise comparisons of all combinations.
>>>
>>> I have single channel data (Illumina) that I imported using 
>>> BeadExplorer, this creates an exprSet. Following that I 
>>> RMA-bg-corrected and then normalized using Quantile normalization 
>>> from the BeadExplorer package, which essentially invokes limma 
>>> Quantile normalization. As a result of this I had an exprSet of 
>>> normalized values which I then log2 transformed.
>>>
>>> So my experimental design is as follows, 5 patients that were 
>>> biopsied (OB1 to OB5) and their biopsy split into 3 cultures of cells 
>>> that underwent each a different treatment (surfaces A, B, C). 
>>> Therefore I have 3 treatments, each with 5 replicates but they are of 
>>> the same origin, which to my logic seems like I should analyse as 
>>> paired samples.
>>>
>>> My challenge was to scale the paired t-test to 3 sets of comparisons.
>>>
>>> So first I read a targets file that specifies all the pairs and 
>>> treatments
>>>
>>>  > targets <- readTargets("samples.txt")
>>>  > targets
>>>        FileName Patient Surface
>>> 1  1519138023_A     OB1     A
>>> 2  1488802050_A     OB1     B
>>> 3  1488802050_D     OB1     C
>>> 4  1519138023_B     OB2     A
>>> 5  1488802050_B     OB2     B
>>> 6  1488802050_E     OB2     C
>>> 7  1519138023_C     OB3     A
>>> 8  1488802050_C     OB3     B
>>> 9  1488802050_F     OB3     C
>>> 10 1519138023_D     OB4     A
>>> 11 1519138023_E     OB4     B
>>> 12 1519138023_F     OB4     C
>>> 13 1519138034_A     OB5     A
>>> 14 1519138034_B     OB5     B
>>> 15 1519138034_C     OB5     C
>>>
>>> Then make the design matrix
>>>  > Patients <- factor(targets$Patient)
>>>  > Surfaces <- factor(targets$Surface, levels=c("A", "B", "C") )
>>>  > paired_design <- model.matrix(~Patients+Surfaces)
>>>
>>> And then fit a linear model and do eBayes
>>>  > fit_paired_RMAbg_Qnorm <- lmFit(data_log2_RMAbg_Qnorm, paired_design)
>>>  > fit2_paired_RMAbg_Qnorm <- eBayes(fit_paired_RMAbg_Qnorm)
>>>
>>>  > topTable(fit2_paired_RMAbg_Qnorm, number=2)
>>>                  ID X.Intercept.   PatientsOB2 PatientsOB3 PatientsOB4
>>> 13720 GI_34304116-S     15.29244  1.431159e-15   0.1152188  0.14177094
>>> 11757 GI_31543813-S     15.14338 -1.090994e-01   0.1038085  0.08840763
>>>
>>>       PatientsOB5 SurfacesSLA SurfacesSLAa  AveExpr        F
>>> 13720 -0.03689951 0.006326441   0.01046853 15.34205 30967.96
>>> 11757  0.01106040 0.080210742  -0.06714165 15.16657 29657.53
>>>
>>>            P.Value    adj.P.Val
>>> 13720 1.549603e-24 1.816823e-20
>>> 11757 2.007728e-24 1.816823e-20
>>>  
>>>
>>> My Questions are:
>>> I am a bit confused by the fact that in the resulting table (shown by 
>>> topTable) I am getting a column for the intercept of surface A with 
>>> all patients as well as other surfaces. What do the values under 
>>> patients mean? Does the fact that they are being considered reduces 
>>> the power of the comparison to the other surfaces?
>>>     
>>
>>
>> For starters, you _have_ fit a paired design, and it is simple to get 
>> your results out. Unfortunately it is difficult to explain this via 
>> email (and if you were taking a linear modeling class there would 
>> probably be several lectures devoted to design matrices, so it isn't a 
>> trivial thing to learn).
>>
>> In short, the model you are fitting uses patient OB1/Surface A as a 
>> baseline, to which all other samples are compared (looking at the 
>> design matrix may help). The SurfacesB coefficient compares the B and 
>> A surfaces (B-A), and the SurfacesC coefficient compares the C and A 
>> surfaces (C-A). If you want the other comparison, you need to set up a 
>> contrasts matrix like this:
>>
>>  > matrix(c(rep(0,5), 1, -1), dimnames=list(colnames(paired_design), 
>> "B-C"))
>>              B-C
>> (Intercept)   0
>> PatientsOB2   0
>> PatientsOB3   0
>> PatientsOB4   0
>> PatientsOB5   0
>> SurfacesB     1
>> SurfacesC    -1
>>
>> Because this will compute (B-A) - (C-A) = B-C.
>>
>> So topTable(fit2_paired_RMAbg_Qnorm, coef=6) will give you the genes 
>> different between A and B, coef=7 will give you the genes different 
>> between A and C, and fitting the contrast will give you the genes 
>> different between B and C.
>>
>> Best,
>>
>> Jim
>>
>>
>>
>>
>>  
>>
>>> As I am not interested in the differential expression amongst 
>>> patients, how do I avoid these being considered?
>>>
>>> How can I know about the differences amongst surfaces B and C?
>>>
>>> Do I need to or can I make a contrast matrix to specify which are the 
>>> comparisons I want to get information for? (only surfaces, and not 
>>> amongst patients)
>>>
>>> If I can make a contrast matrix, can you give me an example of how to 
>>> do it with 3 treatments?
>>>
>>> Many Thanks!
>>>
>>> Milena
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: 
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>     
>>
>>
>>
>>   
> 
> 
> 


-- 
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623


**********************************************************
Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues.



More information about the Bioconductor mailing list