[BioC] Differential expression analysis in Limma for one factor after adjusting for a covariate

Wed Sep 11 13:55:06 CEST 2013

Thanks a lot, this is exactly what I was trying to understand!

Could you help me understand one more thing? Given the aim of finding genes that react differently to the treatment in males as compared to females, which approach would be better?

Approach 1: find the list of significantly differentially expressed genes between the two treatments, then run LIMMA again on only this subset of genes to compare males and females.

Approach 2: use the interaction term to get the list of DEGs that react differently to the treatment in males as compared to females.

Going by the results, Approach 2 is stricter, but I want to understand the pitfalls of Approach 1.

Thank you!

-----Original Message-----
From: James W. MacDonald [mailto:jmacdon at uw.edu]
Sent: Friday, August 30, 2013 9:35 PM
To: QAMRA Aditi (GIS)
Cc: bioconductor at r-project.org
Subject: Re: [BioC] Differential expression analysis in Limma for one factor after adjusting for a covariate

On Friday, August 30, 2013 5:49:51 AM, QAMRA Aditi (GIS) wrote:
> Hi,
>
> I have an expression dataset for both normal and diseased patients, as
> well as their gender information. I want to test for a difference in
> expression between males and females after adjusting for differences
> between normal and diseased tissue types (group), using Limma rather
> than the anova function in R.
>
> I have 2 questions -
>
> 1. Does Limma allow inclusion of covariates? How do I first adjust the expression dataset to remove differences due to the sample being diseased, and then find the true difference between the expression of males and females in Limma? So far I have only been able to compare males vs females and normals vs diseased. Would (Male.Diseased-Male.Normal)-(Female.Diseased-Female.Normal) (which is basically an interaction term) give me this?

Any time you fit a model with various coefficients included, you are automatically adjusting for those coefficients. In other words, if you fit a model with sex and treatment and then compute the contrast between male and female, you are doing so after adjusting for treatment.
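A minimal sketch of the additive model described above, assuming a hypothetical balanced layout (3 arrays per group/sex combination) and a simulated expression matrix `expr` standing in for real data. Because `group` is in the model, testing the `genderM` coefficient gives the male vs female difference adjusted for disease status. The limma part only runs if the package is installed.

```r
# Hypothetical sample annotation: 12 arrays, 3 per group/sex combination.
group  <- factor(rep(c("normal", "diseased"), each = 6),
                 levels = c("normal", "diseased"))
gender <- factor(rep(rep(c("F", "M"), each = 3), times = 2),
                 levels = c("F", "M"))

# Additive model: intercept, disease effect, and sex effect.
design <- model.matrix(~ group + gender)
print(colnames(design))  # "(Intercept)" "groupdiseased" "genderM"

# Simulated log-expression values (100 genes x 12 arrays), for illustration only.
set.seed(1)
expr <- matrix(rnorm(100 * 12), nrow = 100,
               dimnames = list(paste0("gene", 1:100), NULL))

if (requireNamespace("limma", quietly = TRUE)) {
  fit <- limma::eBayes(limma::lmFit(expr, design))
  # Moderated t-test of genderM = 0: male vs female after adjusting for group.
  print(limma::topTable(fit, coef = "genderM", number = 5))
}
```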

But your question isn't that clear, so I don't know if that answers it.
The interaction term gives you those genes that react differently to the treatment in males as compared to females. This is different from finding genes that are different in males vs females after adjusting for treatment, but again it isn't totally clear to me what you are asking.
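To make the distinction concrete, here is a sketch of the interaction model under the same hypothetical 12-array layout and simulated `expr`. With treatment contrasts, `groupdiseased` is the disease effect in females, `genderM` is the sex difference in normal samples only (not the adjusted sex effect from the additive model), and `groupdiseased:genderM` is the disease effect in males minus the disease effect in females.

```r
# Hypothetical sample annotation: 12 arrays, 3 per group/sex combination.
group  <- factor(rep(c("normal", "diseased"), each = 6),
                 levels = c("normal", "diseased"))
gender <- factor(rep(rep(c("F", "M"), each = 3), times = 2),
                 levels = c("F", "M"))

# group * gender expands to group + gender + group:gender.
design <- model.matrix(~ group * gender)
print(colnames(design))
# "(Intercept)" "groupdiseased" "genderM" "groupdiseased:genderM"

# Simulated log-expression values (100 genes x 12 arrays), for illustration only.
set.seed(1)
expr <- matrix(rnorm(100 * 12), nrow = 100,
               dimnames = list(paste0("gene", 1:100), NULL))

if (requireNamespace("limma", quietly = TRUE)) {
  fit <- limma::eBayes(limma::lmFit(expr, design))
  # Genes whose disease response differs between males and females.
  print(limma::topTable(fit, coef = "groupdiseased:genderM", number = 5))
}
```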

>
> 2. I was trying to include both gender and group information as factors,
> but when I try to build the model matrix:
>
> design <- model.matrix(~0+group+gender)
>
> where both gender and group are factors, I get the following layout
> of the design matrix:
>
>   groupnormal groupdiseased genderM
> 1           1             0       0
> 2           1             0       1
>
> attr(,"assign")
> [1] 1 1 2
> attr(,"contrasts")
> attr(,"contrasts")$group
> [1] "contr.treatment"
>
> attr(,"contrasts")$gender
> [1] "contr.treatment"
>
> Why do I not also see genderF as a column here?

Because that is the way R sets up the model matrix. The genderM coefficient is computing the difference between males and females, so if you want to test for sex differences you would simply test that this coefficient is different from zero.
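A small base-R sketch of why the column is missing, using four hypothetical samples. Without an intercept, only the first factor in the formula gets one column per level; later factors keep treatment contrasts, so their first level (F) is the baseline. The output quoted above matches `~0+group+gender`; listing gender first would produce a genderF column instead. Including all four indicator columns would make the design rank-deficient.

```r
# Four hypothetical samples covering every group/sex combination.
group  <- factor(c("normal", "normal", "diseased", "diseased"),
                 levels = c("normal", "diseased"))
gender <- factor(c("F", "M", "F", "M"), levels = c("F", "M"))

# First factor gets a column per level; gender uses contr.treatment.
d1 <- model.matrix(~ 0 + group + gender)
print(colnames(d1))   # "groupnormal" "groupdiseased" "genderM"

# Reordering the terms moves the full coding to gender instead.
d2 <- model.matrix(~ 0 + gender + group)
print(colnames(d2))   # "genderF" "genderM" "groupdiseased"

# A genderF column would be redundant: genderF + genderM equals
# groupnormal + groupdiseased (both are all ones).
full <- cbind(d1, genderF = 1 - d1[, "genderM"])
print(qr(full)$rank)  # 3, although there are 4 columns
```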

But this is something that Gordon has been pointing out for years; the conventional coefficients that you get from model.matrix() may not be the most useful in the context of a microarray experiment. You could instead do something like

groupGend <- factor(paste(group, gender, sep = "_"))

design <- model.matrix(~0+groupGend)

and then your coefficients will be directly interpretable and easier to understand (e.g., you will have four coefficients, one per group/sex combination: normal_M, diseased_M, normal_F and diseased_F, each prefixed with groupGend by model.matrix(), and then you can make more directed comparisons).
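A sketch of this cell-means approach, again assuming a hypothetical 12-array layout and simulated `expr`. The columns are renamed to the factor levels so they can be used directly in contrasts; the contrast names (`DiseaseInMales`, `DiseaseInFemales`, `Interaction`) and the vector `interaction_contrast` are illustrative choices. The `Interaction` contrast is the (Male.Diseased-Male.Normal)-(Female.Diseased-Female.Normal) comparison from the original question.

```r
# Hypothetical sample annotation: 12 arrays, 3 per group/sex combination.
group  <- factor(rep(c("normal", "diseased"), each = 6),
                 levels = c("normal", "diseased"))
gender <- factor(rep(rep(c("F", "M"), each = 3), times = 2),
                 levels = c("F", "M"))

# One level per group/sex combination.
groupGend <- factor(paste(group, gender, sep = "_"))
design <- model.matrix(~ 0 + groupGend)
colnames(design) <- levels(groupGend)  # drop the "groupGend" prefix
print(colnames(design))  # "diseased_F" "diseased_M" "normal_F" "normal_M"

# Interaction written out as contrast weights on the four cell means.
interaction_contrast <- c(diseased_F = -1, diseased_M = 1,
                          normal_F = 1, normal_M = -1)

# Simulated log-expression values (100 genes x 12 arrays), for illustration only.
set.seed(1)
expr <- matrix(rnorm(100 * 12), nrow = 100,
               dimnames = list(paste0("gene", 1:100), NULL))

if (requireNamespace("limma", quietly = TRUE)) {
  cont <- limma::makeContrasts(
    DiseaseInMales   = diseased_M - normal_M,
    DiseaseInFemales = diseased_F - normal_F,
    Interaction      = (diseased_M - normal_M) - (diseased_F - normal_F),
    levels = design)
  # makeContrasts yields the same weights as the hand-written vector.
  stopifnot(all(cont[, "Interaction"] == interaction_contrast[rownames(cont)]))
  fit <- limma::eBayes(limma::contrasts.fit(limma::lmFit(expr, design), cont))
  print(limma::topTable(fit, coef = "Interaction", number = 5))
}
```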

Best,

Jim

>
> Thanks !
>
> -------------------------------
> This e-mail and any attachments are only for the use of the intended recipient and may be confidential and/or privileged. If you are not the recipient, please delete it or notify the sender immediately. Please do not copy or use it for any purpose or disclose the contents to any other person as it may be an offence under the Official Secrets Act.
> -------------------------------
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
