[BioC] doing paired t-test amongst several groups

Milena Gongora m.gongora at imb.uq.edu.au
Sat Feb 17 14:14:54 CET 2007


Hi James,
Great, I am convinced. Thanks for the explanation, although my brain 
hurts following it....
Cheers,
Milena

James W. MacDonald wrote:
> Hi Milena,
>
> Milena Gongora wrote:
>> Hello James,
>>
>> Thanks a lot for your insight! I made that contrast matrix as you 
>> suggested and got the results fine.
>>
>> Can I add another question?
>>
>> You mention that in the result of topTable(fit2_paired_RMAbg_Qnorm), 
>> the baseline is OB1/SurfaceA and everything is compared to that...
>>
>> So are the results under coefficients SurfaceB and SurfaceC taking 
>> into consideration all patients? or just OB1? (I imagine is all 
>> patients, but just checking how it works)
>
> Yep. All patients. You can convince yourself of this by noting that 
> the design matrix has all of these rows:
>
>             Int  OB2  OB3  OB4  OB5 B  C
> 2            1    0    0    0    0  1  0
> 5            1    1    0    0    0  1  0
> 8            1    0    1    0    0  1  0
> 11           1    0    0    1    0  1  0
> 14           1    0    0    0    1  1  0
>
> A simple way to think about this is to go back to first principles. 
> The design matrix is used to do the linear algebra required to 
> estimate these coefficients. Linear algebra is just a compact way to 
> solve for multiple unknowns (remember in algebra how you can solve for 
> x with one equation, but to solve for x and y you need two, and to 
> solve for x, y, and z you need three, and on and on?).
>
> So the above design matrix specifies the following equations:
>
> <val1> = (OB1 + A) + (B - A)
> <val2> = (OB1 + A) + (OB2 - OB1) + (B - A)
> <val3> = (OB1 + A) + (OB3 - OB1) + (B - A)
> .
> .
> .
>
> Which we solve simultaneously to get the coefficients, based on all 
> the patients.
>
> Does that help?
>
> Best,
>
> Jim
>
>
>
>>
>> Thanks again!
>>
>> Milena
>>
>> James W. MacDonald wrote:
>>
>>> Hi Milena,
>>>
>>> Milena Gongora wrote:
>>>  
>>>
>>>> Hello Everyone,
>>>>
>>>> I am wondering if anyone has scaled a paired t-test to do multiple 
>>>> pairwise comparisons and can enlighten me in how to interpret the 
>>>> outcome. I read the limma guide back and forth but seem to be 
>>>> missing on understanding a few things.
>>>>
>>>> Essentially I am doing a paired t-test, but have 3 treatments and 
>>>> wish to make pairwise comparisons of all combinations.
>>>>
>>>> I have single channel data (Illumina) that I imported using 
>>>> BeadExplorer, this creates an exprSet. Following that I 
>>>> RMA-bg-corrected and then normalized using Quantile normalization 
>>>> from the BeadExplorer package, which essentially invokes limma 
>>>> Quantile normalization. As a result of this I had an exprSet of 
>>>> normalized values which I then log2 transformed.
>>>>
>>>> So my experimental design is as follows, 5 patients that were 
>>>> biopsied (OB1 to OB5) and their biopsy split into 3 cultures of 
>>>> cells that underwent each a different treatment (surfaces A, B, C). 
>>>> Therefore I have 3 treatments, each with 5 replicates but they are 
>>>> of the same origin, which to my logic seems like I should analyse 
>>>> as paired samples.
>>>>
>>>> My challenge was to scale the paired t-test to 3 sets of comparisons.
>>>>
>>>> So first I read a targets file that specifies all the pairs and 
>>>> treatments
>>>>
>>>>  > targets <- readTargets("samples.txt")
>>>>  > targets
>>>>        FileName Patient Surface
>>>> 1  1519138023_A     OB1     A
>>>> 2  1488802050_A     OB1     B
>>>> 3  1488802050_D     OB1     C
>>>> 4  1519138023_B     OB2     A
>>>> 5  1488802050_B     OB2     B
>>>> 6  1488802050_E     OB2     C
>>>> 7  1519138023_C     OB3     A
>>>> 8  1488802050_C     OB3     B
>>>> 9  1488802050_F     OB3     C
>>>> 10 1519138023_D     OB4     A
>>>> 11 1519138023_E     OB4     B
>>>> 12 1519138023_F     OB4     C
>>>> 13 1519138034_A     OB5     A
>>>> 14 1519138034_B     OB5     B
>>>> 15 1519138034_C     OB5     C
>>>>
>>>> Then make the design matrix
>>>>  > Patients <- factor(targets$Patient)
>>>>  > Surfaces <- factor(targets$Surface, levels=c("A", "B", "C") )
>>>>  > paired_design <- model.matrix(~Patients+Surfaces)
>>>>
>>>> And then fit a linear model and do eBayes
>>>>  > fit_paired_RMAbg_Qnorm <- lmFit(data_log2_RMAbg_Qnorm, 
>>>> paired_design)
>>>>  > fit2_paired_RMAbg_Qnorm <- eBayes(fit_paired_RMAbg_Qnorm)
>>>>
>>>>  > topTable(fit2_paired_RMAbg_Qnorm, number=2)
>>>>                  ID X.Intercept.   PatientsOB2 PatientsOB3 PatientsOB4
>>>> 13720 GI_34304116-S     15.29244  1.431159e-15   0.1152188  0.14177094
>>>> 11757 GI_31543813-S     15.14338 -1.090994e-01   0.1038085  0.08840763
>>>>
>>>>       PatientsOB5 SurfacesSLA SurfacesSLAa  AveExpr        F
>>>> 13720 -0.03689951 0.006326441   0.01046853 15.34205 30967.96
>>>> 11757  0.01106040 0.080210742  -0.06714165 15.16657 29657.53
>>>>
>>>>            P.Value    adj.P.Val
>>>> 13720 1.549603e-24 1.816823e-20
>>>> 11757 2.007728e-24 1.816823e-20
>>>>  
>>>>
>>>> My Questions are:
>>>> I am a bit confused by the fact that in the resulting table (shown 
>>>> by topTable) I am getting a column for the intercept of surface A 
>>>> with all patients as well as other surfaces. What do the values 
>>>> under patients mean? Does the fact that they are being considered 
>>>> reduces the power of the comparison to the other surfaces?
>>>>     
>>>
>>>
>>> For starters, you _have_ fit a paired design, and it is simple to 
>>> get your results out. Unfortunately it is difficult to explain this 
>>> via email (and if you were taking a linear modeling class there 
>>> would probably be several lectures devoted to design matrices, so it 
>>> isn't a trivial thing to learn).
>>>
>>> In short, the model you are fitting uses patient OB1/Surface A as a 
>>> baseline, to which all other samples are compared (looking at the 
>>> design matrix may help). The SurfacesB coefficient compares the B 
>>> and A surfaces (B-A), and the SurfacesC coefficient compares the C 
>>> and A surfaces (C-A). If you want the other comparison, you need to 
>>> set up a contrasts matrix like this:
>>>
>>>  > matrix(c(rep(0,5), 1, -1), dimnames=list(colnames(paired_design), 
>>> "B-C"))
>>>              B-C
>>> (Intercept)   0
>>> PatientsOB2   0
>>> PatientsOB3   0
>>> PatientsOB4   0
>>> PatientsOB5   0
>>> SurfacesB     1
>>> SurfacesC    -1
>>>
>>> Because this will compute (B-A) - (C-A) = B-C.
>>>
>>> So topTable(fit2_paired_RMAbg_Qnorm, coef=6) will give you the genes 
>>> different between A and B, coef=7 will give you the genes different 
>>> between A and C, and fitting the contrast will give you the genes 
>>> different between B and C.
>>>
>>> Best,
>>>
>>> Jim
>>>
>>>
>>>
>>>
>>>  
>>>
>>>> As I am not interested in the differential expression amongst 
>>>> patients, how do I avoid these being considered?
>>>>
>>>> How can I know about the differences amongst surfaces B and C?
>>>>
>>>> Do I need to or can I make a contrast matrix to specify which are 
>>>> the comparisons I want to get information for? (only surfaces, and 
>>>> not amongst patients)
>>>>
>>>> If I can make a contrast matrix, can you give me an example of how 
>>>> to do it with 3 treatments?
>>>>
>>>> Many Thanks!
>>>>
>>>> Milena
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at stat.math.ethz.ch
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives: 
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>     
>>>
>>>
>>>
>>>   
>>
>>
>>
>
>


-- 
Milena Gongora
Bioinformatician
SRC Computational Chemistry and Biology Unit
Institute for Molecular Biosciences
The University of Queensland

Phone: +61 7 3346 2609
Fax: +61 7 3346 2101
email: m.gongora at imb.uq.edu.au



More information about the Bioconductor mailing list