[BioC] Differential expression analysis in Limma for one factor after adjusting for a covariate

James W. MacDonald jmacdon at uw.edu
Wed Sep 11 16:01:52 CEST 2013


On 9/11/2013 7:55 AM, QAMRA Aditi (GIS) wrote:
> Thanks a lot - it is exactly what I was trying to understand!
>
> Could you help me understand one more thing? Given the aim of finding genes that react differently to the treatment in males as compared to females, which approach would be better?
>
> Approach 1 - Find the list of significantly differentially expressed genes between the 2 treatments, and then run limma again on only this subset of genes to compare males and females
>
> Approach 2 - Use the interaction term to get the list of DEGs that react differently to the treatment in males as compared to females
>
> Going by the results, Approach 2 is stricter, but I want to understand the pitfalls of Approach 1

The difference is that Approach 1 as you describe it doesn't test what 
you want to find. The interaction specifically tests for a difference in 
the response to treatment in males vs females. Your Approach 1 first tests 
for an overall response to treatment and then, only for the genes that 
pass that filter, tests for a difference between the sexes.
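
As a rough sketch of what the interaction tests, assume a 2x2 layout with one coefficient per group/gender combination. The factor levels and the sample layout below are made up for illustration and are not taken from the original data. The interaction is then the contrast (Male.Diseased - Male.Normal) - (Female.Diseased - Female.Normal):

```r
# Hypothetical 2x2 layout: two samples per gender/group combination.
gender <- factor(rep(c("F", "M"), each = 4))
group  <- factor(rep(rep(c("normal", "diseased"), each = 2), times = 2),
                 levels = c("normal", "diseased"))

# One coefficient per combination (cell-means parameterization).
groupGend <- factor(paste(group, gender, sep = "_"))
design <- model.matrix(~ 0 + groupGend)
colnames(design) <- levels(groupGend)

# Interaction contrast: (M diseased - M normal) - (F diseased - F normal),
# reordered to match the columns of the design matrix.
interaction_contrast <- c(diseased_F = -1, diseased_M = 1,
                          normal_F = 1, normal_M = -1)[colnames(design)]
interaction_contrast
```

In limma a contrast like this would typically be built with makeContrasts() and applied with contrasts.fit() after lmFit().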

Now an example. Let's say a gene is highly up-regulated in females when 
you treat, and highly down-regulated in males when you treat. When you 
test for an overall treatment difference, you may not achieve 
significance because the opposite responses in the two sexes cancel out. 
So you won't bring that gene forward to the next step, even though that 
is exactly the gene you are looking for.
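
The cancellation can be illustrated with a small toy calculation. This is only a sketch: the expression values are invented, and base R's lm() on a single gene stands in for limma's per-gene fit.

```r
# Made-up log-expression values for one gene, three samples per cell.
# Females go up by 2 with treatment, males go down by 2.
sex <- factor(rep(c("F", "M"), each = 6))
trt <- factor(rep(rep(c("control", "treated"), each = 3), times = 2))
noise <- rep(c(-0.1, 0, 0.1), times = 4)
expr <- 8 + ifelse(trt == "treated", ifelse(sex == "F", 2, -2), 0) + noise

# First step of Approach 1: overall treatment effect, ignoring sex.
fit_trt <- lm(expr ~ trt)
p_trt <- summary(fit_trt)$coefficients["trttreated", "Pr(>|t|)"]

# Approach 2: interaction between sex and treatment.
fit_int <- lm(expr ~ sex * trt)
p_int <- summary(fit_int)$coefficients["sexM:trttreated", "Pr(>|t|)"]

c(treatment_only = p_trt, interaction = p_int)
```

With these values the treated and control means are equal, so the treatment-only test gives a large p-value and the gene would be filtered out, while the interaction test gives a very small one.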

Best,

Jim


>
> Thank you !
>
> -----Original Message-----
> From: James W. MacDonald [mailto:jmacdon at uw.edu]
> Sent: Friday, August 30, 2013 9:35 PM
> To: QAMRA Aditi (GIS)
> Cc: bioconductor at r-project.org
> Subject: Re: [BioC] Differential expression analysis in Limma for one factor after adjusting for a covariate
>
>
>
> On Friday, August 30, 2013 5:49:51 AM, QAMRA Aditi (GIS) wrote:
>> Hi,
>>
>> I have an expression dataset for both normal and diseased patients as
>> well as their gender information. What I want to do is test for
>> differences in expression between males and females after adjusting
>> for differences between normal and diseased tissue type (group),
>> using limma rather than the anova function in R.
>>
>> I have 2 questions -
>>
>> 1. Does limma allow inclusion of covariates? How do I first adjust the expression dataset to remove differences due to the sample being diseased, and then find the true difference between the expression of males and females in limma? What I have been able to do up until now is the difference between males/females and normals/diseased. Would (Male.Diseased-Male.Normal)-(Female.Diseased-Female.Normal) (which is basically an interaction term) give me this?
> Any time you fit a model with various coefficients included, you are automatically adjusting for those coefficients. In other words, if you fit a model with sex and treatment and then compute the contrast between male and female, you are doing so after adjusting for treatment.
>
> But your question isn't that clear, so I don't know if that answers it.
> The interaction term gives you those genes that react differently to the treatment in males as compared to females. This is different from finding genes that are different in males vs females after adjusting for treatment, but again it isn't totally clear to me what you are asking.
>
>> 2. I was trying to include both gender and group information as factors,
>> but when I'm trying to build the model matrix -
>>
>> design <- model.matrix(~0+group+gender)
>>
>> where both gender and group are factors - I get the following layout
>> of the design matrix -
>>
>>   groupnormal groupdiseased genderM
>> 1           1             0       0
>> 2           1             0       1
>>
>> attr(,"assign")
>> [1] 1 1 2
>> attr(,"contrasts")
>> attr(,"contrasts")$group
>> [1] "contr.treatment"
>>
>> attr(,"contrasts")$gender
>> [1] "contr.treatment"
>>
>> Why do I not also see genderF as a column here?
> Because that is the way R sets up the model matrix. The genderM coefficient is computing the difference between males and females, so if you want to test for sex differences you would simply test that this coefficient is different from zero.
>
> But this is something that Gordon has been pointing out for years; the conventional coefficients that you get from model.matrix() may not be the most useful in the context of a microarray experiment. You could instead do something like
>
> groupGend <- factor(paste(group, gender, sep = "_"))
>
> design <- model.matrix(~0+groupGend)
>
> and then your coefficients will be directly interpretable and easier to understand (e.g., you will have four coefficients, one per group/gender combination such as normal_M, diseased_M, normal_F and diseased_F, and then you can make more directed comparisons).
>
> Best,
>
> Jim
>
>
>> Thanks !
>>
>> -------------------------------
>> This e-mail and any attachments are only for the use of the intended recipient and may be confidential and/or privileged. If you are not the recipient, please delete it or notify the sender immediately. Please do not copy or use it for any purpose or disclose the contents to any other person as it may be an offence under the Official Secrets Act.
>> -------------------------------
>>
> --
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
>

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099


