[BioC] Fitting linear model with different design matrix for each gene in limma

Thu Dec 19 02:30:34 CET 2013

Thanks a lot for the quick answer. 

However, lm cannot incorporate the fact that there are multiple measurements of the same subject as well as measurements from different subjects for a given factor level (REF or PLUS). I am leaning towards using a linear mixed effects model incorporating the factor of interest as a fixed effect and subject as a random effect. I think I can use lme from nlme package for this. Would you have any recommendations about which of the two models to use: 

fit <- lm(data$expr ~ data$Factor + data$Subject) 
vs
fit2 <- lme(expr ~ Factor, data, random = ~ 1 | subject)

Thanks again!

----- Original Message -----
From: "Ryan" <rct at thompsonclan.org>
To: "Elif Sarinay [guest]" <guest at bioconductor.org>
Cc: bioconductor at r-project.org, ccenik at stanford.edu
Sent: Wednesday, December 18, 2013 4:06:09 PM
Subject: Re: [BioC] Fitting linear model with different design matrix for each gene in limma

I'm pretty sure the post that you link to gives you your answer: if you 
want to fit a different model for each gene, then you should simply 
call "lm" on each gene individually instead of lmFit. You won't be able 
to get the empirical Bayes squeezing that limma performs, of course. 
However, I can't think of an experimental design where doing this makes 
logical sense. Are you sure you don't need to make a single design 
matrix with one coefficient for each "REF vs PLUS" partitioning, and 
then fit this single design for all genes using lmFit?

On Wed Dec 18 11:55:57 2013, Elif Sarinay [guest] wrote:
>
> Hi,
>
> I have expression data from a large number of subjects with replicates. I am interested in fitting a linear model to understand the effect of a simple factor with two levels on the expression of each gene. I realize that if my design is as follows:
> Subject Factor
> 1       REF
> 1       REF
> 2       REF
> 2       REF
> 3       REF
> 3       REF
> 4       PLUS
> 4       PLUS
> 5       PLUS
> 5       PLUS
> ...
>
> I can fit a linear model with lmFit using
> design <- model.matrix(~targets$Factor)
> corfit <- duplicateCorrelation(data,design,block=targets$Subject)
> lmFit (data, design, weights=weights, block=targets$Subject, correlation=corfit$consensus)
>
> However, in my case the design is different for each gene. In other words, the subjects that belong to the "REF" or "PLUS" level of the factor changes for each gene. I am wondering if there is any possibility to include the changing design for each gene.
>
> I tried applying lmFit to each gene separately using the correct design with an appropriate apply function. However, I found Dr. Smyth's answer to an earlier question (https://stat.ethz.ch/pipermail/bioconductor/2010-August/034794.html) suggesting that this is not a good alternative.
>
>
> Thanks for any advice
>
>   -- output of sessionInfo():
>
> R version 2.14.1 (2011-12-22)
> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] nlme_3.1-103 edgeR_2.4.6  limma_3.10.3
>
> loaded via a namespace (and not attached):
> [1] grid_2.14.1    lattice_0.20-6 sva_3.0.3      tools_2.14.1
>
> --
> Sent via the guest posting facility at bioconductor.org.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor