[BioC] limma question: direct two-color design & modeling individual subject effects

Tue May 1 01:02:44 CEST 2007

Dear Paul,

Your description of the limma model you've fitted is very clear, but you haven't explained exactly
what is in your picture.  The data values on the y-axis don't appear to be the M-values you used
to fit the linear model, because we don't see the up-down pattern we'd expect to see from
dye-swaps.  How have you obtained "fitted values"?  Note that M-values are already log-ratios, so
it doesn't make sense to write "log2M".
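
To make the dye-swap point concrete, here is a minimal sketch with made-up intensities (the numbers and variable names are assumptions for illustration, not values from your arrays). It computes M = log2(Cy5/Cy3) for one spot on a dye-swapped pair laid out like slides 2254 and 2261 in your targets table. The two M-values come out with opposite signs, which is the up-down pattern I'd expect in a plot of M-values.

```r
# Hypothetical intensities for one probe: maternal signal 8x the juvenile signal.
maternal <- 8000
juvenile <- 1000

# Array 1: Cy3 = maternal, Cy5 = juvenile. Array 2 is its dye-swap.
cy3 <- c(maternal, juvenile)
cy5 <- c(juvenile, maternal)

# The M-value is already a log-ratio, so there is nothing further to log.
M <- log2(cy5 / cy3)
print(M)
```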

lmFit() simply does least squares regression.  It gives the same coefficients that you would get
from lm() for each gene.  I suggest that you extract the M-value data for one gene, and experiment
with fitting the data using lm(), until you're satisfied that you understand the parametrization
and fitted values.
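
A minimal sketch of that experiment follows. To keep it runnable on its own, it rebuilds the design for just the first six arrays in your targets table and uses simulated M-values; the simulated numbers, the seed, and the object names `d`, `y`, `fit1` and `fitted_manual` are my assumptions, not your data. With the real data you would use the M-values for probe 7346 (presumably `MA$M[7346, ]`) and your full model matrix instead.

```r
# First six arrays from the posted targets table. The levels of 'maternal'
# are set explicitly so the column is named maternalHigh, as in the post.
d <- data.frame(
  maternal = factor(rep(c("Low", "High"), 3), levels = c("Low", "High")),
  mother   = factor(c("m918", "m918", "m836", "m836", "m836", "m836")),
  child    = factor(c("c073", "c073", "c073", "c073", "c135", "c135"))
)
model <- model.matrix(~ maternal + mother + child, d)

# Simulated M-values for one gene (assumption: a stand-in for real data).
set.seed(1)
d$y <- c(-3.5, 3.9, -3.6, 3.8, -4.2, 3.1) + rnorm(6, sd = 0.1)

# Ordinary least squares for this single gene.
fit1 <- lm(y ~ maternal + mother + child, data = d)
print(round(coef(fit1), 3))

# Fitted values are the model matrix times the coefficients.
fitted_manual <- drop(model %*% coef(fit1))
print(cbind(observed = d$y, fitted = fitted(fit1), manual = fitted_manual))

# Optional check against limma, only if it is installed: the lmFit()
# coefficients for the gene should agree with those from lm().
if (requireNamespace("limma", quietly = TRUE)) {
  Mmat <- matrix(d$y, nrow = 2, ncol = 6, byrow = TRUE)  # same gene twice
  fit <- limma::lmFit(Mmat, model)
  print(all.equal(unname(fit$coefficients[1, ]), unname(coef(fit1))))
}
```

Comparing the observed and fitted columns row by row shows what each coefficient contributes. In particular, the motherm918 column is 1 on both arrays of a dye-swap pair, so that coefficient shifts both M-values of the pair in the same direction.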

Best wishes
Gordon

> [BioC] limma question: direct two-color design & modeling individual subject effects
> Paul Shannon pshannon at systemsbiology.org
> Mon Apr 30 05:08:15 CEST 2007
>
> I've been working on and off for a few months with limma on a set of 28 2-color
> arrays made up of 14 dye-swap pairs.  The main contrast in the arrays is between
> malaria parasite RNA extracted from maternal and from juvenile hosts;
> all the arrays can be described in these terms.  This is the main effect we
> are studying, and limma is very helpful in elucidating it.
>
> The arrays can be more specifically described as comparisons between specific
> maternal subjects and specific juvenile subjects -- between different
> combinations of three mothers (m918, m836, m920) with seven children (c073, c135,
> c140, c372, c451, c413, c425).  I have trouble fitting models to some of these
> genes, failing to isolate the effects of individual subjects where their effects seem
> to be strong.
>
> (A good example can be seen at http://gaggle.systemsbiology.net/pshannon/tmp/7346.png,
> where the effect of m920 is pronounced, but apparently missed by my lmFit/eBayes model.)
>
> Here are the first few lines from each of the matrices I use that lead to that plot.
>
> ---- head (targets)
>
>   SlideNumber      Name            FileName      Cy3      Cy5 Mother Child
> 1        2254 slide2254 m918c073-cy3cy5.gpr maternal juvenile   m918  c073
> 2        2261 slide2261 m918c073-cy5cy3.gpr juvenile maternal   m918  c073
> 3        2258 slide2258 m836c073-cy3cy5.gpr maternal juvenile   m836  c073
> 4        2265 slide2265 m836c073-cy5cy3.gpr juvenile maternal   m836  c073
> 5        2341 slide2341 m836c135-cy3cy5.gpr maternal juvenile   m836  c135
> 6        2344 slide2344 m836c135-cy5cy3.gpr juvenile maternal   m836  c135
>
> ----- head (design)
>
>   mother child maternal
> 1   m918  c073      Low
> 2   m918  c073     High
> 3   m836  c073      Low
> 4   m836  c073     High
> 5   m836  c135      Low
> 6   m836  c135     High
>
> ---- create the model
>
> model <- model.matrix (~maternal + mother + child, design)
>
> head (model)
>   (Intercept) maternalHigh motherm918 motherm920 childc135 childc140 childc372 childc413 childc425 childc451
> 1           1            0          1          0         0         0         0         0         0         0
> 2           1            1          1          0         0         0         0         0         0         0
> 3           1            0          0          0         0         0         0         0         0         0
> 4           1            1          0          0         0         0         0         0         0         0
> 5           1            0          0          0         1         0         0         0         0         0
> 6           1            1          0          0         1         0         0         0         0         0
>
> ---- fit the data
>
> fit <- lmFit (MA, model)
> efit <- eBayes (fit)
>
> # one example of poor fit.  with probe 7346, the m920 effect is very strong, but the coefficients
> # don't reflect that.  instead, most of the influence is allocated to the maternal effect, which
> # nicely models all the comparisons except those involving m920.  the fit there is strikingly
> # poor, with high residuals. I can't make sense of the tiny motherm920 coefficient:
>
> > efit$coef [7346,]
>  (Intercept) maternalHigh   motherm918   motherm920    childc135    childc140    childc372    childc413    childc425    childc451
>  -3.62867124   7.49268173   0.24858455  -0.02635289  -0.67898282  -0.24566235  -0.24673763   0.10618603  -0.37520911  -0.02761610
>
> The plot of the fitted & actual values can be found at
>
>       http://gaggle.systemsbiology.net/pshannon/tmp/7346.png
>
> I may be over-interpreting, or mis-interpreting, or even misrepresenting all this.  But after lots
> of head scratching, lots of reading and experiments, I can't get the coefficients to do what I think
> they should.  Perhaps it's my failure to use a contrast matrix.  Or something else.
>
> Any suggestions?  I'll be really grateful for any advice.
>
> Thanks!
>
>  - Paul