[BioC] limma: print-tip loess and empty spots

Sat Jun 2 17:41:52 CEST 2007

Quoting Gordon Smyth <smyth at wehi.edu.au>:

> Dear Adrian,
>
> At 08:36 AM 2/06/2007, Adrian Steward wrote:
>> Thank you for your reply, Dr. Smyth.
>>
>> I do not yet completely understand exactly HOW normalizing works
>> (I've seen the data, transformations, and so I know what it does,
>> just not how, yet) but it appears to me that I can simply change the
>> sign of the normalized output to make the proper tests
>
> In general, you cannot simplify the constructions of tests by
> swapping the sign of the normalized log-ratios. The only experiment
> in which people might be tempted to do this is a simple replicated
> comparison using two-colour arrays with dye-swaps (and you have given
> no indication that this is your experiment.) For anything more
> complicated, swapping the signs of the log-ratios would only
> complicate matters. Even for the replicated comparison, swapping the
> log-ratios is unhelpful because it prevents the inclusion of
> probe-specific dye-effects in the model.
>
>>  (or as someone else stated, reverse the contrast / estimate statements).
>> You picked up on my motivations here - I am chiefly concerned that
>> the exported normalized data has proper signs
>
> The normalized data already has what we consider to be the "proper" signs.
>
>>  because at present I am required to do all of my linear modeling
>> in SAS, and large datasets need to be 'read in.'   I personally
>> would rather do it all in R which is why I am running things in
>> parallel to make the case for limma-only analysis.
>
> You can certainly fit linear models in SAS, but you can't do a limma
> empirical Bayes analysis.
>
>> You people are both programmers and teachers, and thanks for your
>> patience with the noobs.
>>
>> AM
>
> You can easily change the signs of columns of data in either R or
> SAS. You could get advice on how to do this from the R help list. But
> don't expect this from me or Keith because I believe it is undesirable.
>
> There is absolutely no reason why linear modelling in SAS or R
> requires any prior fudging of the data. You can easily handle the
> data as it actually is. Spend a little more time understanding how
> linear modelling works for microarray data, then you'll see why this
> is so. That would be time much better spent than trying to persuade
> limmaGUI to do what it doesn't want to do.
>
> Best wishes
> Gordon

Very reluctantly I will jump in, because I remember my own experience  
as a total "newbie" to this world not long ago... and I feel that the  
reason why Adrian is asking about changing the signs may have little  
to do with linear modelling and what it does to the data. At least, I  
had similar questions... but I don't want to insult Adrian by  
comparing him with me ;-)

When I started with limma (actually limmaGUI), I did it with data from  
dye-swap experiments. After normalisation, the sign of the M values is  
determined by the log2 of the ratio Cy5/Cy3, as Gordon explained.  
That's the convention, just as we generally agree to call Cy3 the
Green channel and Cy5 the Red channel... which was counterintuitive
for somebody like me, who was used to Cy3 in microscopy, where it is
usually seen as red (reddish, really, but the computer then goes and
paints it bright red)... Just a convention.
According to that convention, the signs of my dye-swapped arrays were
either positive or negative, depending on the orientation of the hyb
in question. At first that was a little disorienting, because I had
to make sure I remembered which array was hybridised in which
orientation (info that's stored in the 'targets' object, if using
limma).
However, one doesn't need to worry about that. I only look at the
normalised data (per array) to check the quality of the hybs,
really... to make a few MA plots and see general patterns, check for
artifacts, etc.
After that step, we take the normalised data and fit a linear
model to it with the function 'lmFit'. Limma does this taking into
account the orientation of the separate hybs (information present in
the 'targets' object), using a design matrix of our choice, and
similarly for any particular contrasts we want to specify. After this,
we obtain M values that have the "correct" sign, according to whatever
orientation we indicate in the design matrix... so not only is it
unnecessary to change the signs of the normalised data manually, per
array, but if we did so we'd also mess up the linear model fitting...
which is the whole point of using limma.

So, if I have four slides, comparing samples A and B, with two dye swaps:

Array   Cy3   Cy5
1       A     B
2       A     B
3       B     A
4       B     A

and I am ultimately interested in B-A, and I have a gene X that has  
higher expression in sample B than in A... when I normalise the data,  
the M values for that gene X will be positive in arrays 1 and 2, and  
negative in arrays 3 and 4 [log2(Cy5/Cy3)].
After fitting the linear model, where we indicate we want the
comparison B-A, what we'll get is a single M value, and its sign will
be _positive_.
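
To make the arithmetic concrete, here is a rough sketch in plain
Python. It is not limma's code, only an illustration of the idea. The
M values are made up, the function names are hypothetical, and it
assumes the simplest possible model: a single B-A coefficient, equal
weights for all arrays, and no dye-effect term.

```python
# Illustrative sketch only: plain Python, not limma's actual code.
# Each array records which sample went into which channel.
targets = [
    {"array": 1, "Cy3": "A", "Cy5": "B"},
    {"array": 2, "Cy3": "A", "Cy5": "B"},
    {"array": 3, "Cy3": "B", "Cy5": "A"},
    {"array": 4, "Cy3": "B", "Cy5": "A"},
]

# Made-up normalised M values, log2(Cy5/Cy3), for gene X on the four arrays.
m_values = [1.0, 1.5, -0.5, -1.0]


def design_column(targets, ref="A", other="B"):
    """Return +1 where M measures other-ref, and -1 where the dyes are swapped."""
    column = []
    for t in targets:
        if t["Cy5"] == other and t["Cy3"] == ref:
            column.append(1)
        elif t["Cy5"] == ref and t["Cy3"] == other:
            column.append(-1)
        else:
            raise ValueError(f"array {t['array']} does not compare {other} with {ref}")
    return column


def fit_coefficient(design, m_values):
    """Least-squares estimate for a one-column design: sum(d*M) / sum(d*d)."""
    numerator = sum(d * m for d, m in zip(design, m_values))
    denominator = sum(d * d for d in design)
    return numerator / denominator


design = design_column(targets)
b_minus_a = fit_coefficient(design, m_values)
naive_mean = sum(m_values) / len(m_values)

print("design column:", design)
print("fitted B-A log-ratio:", b_minus_a)
print("naive mean of M values:", naive_mean)
```

The design column carries the orientation of each hyb, so the fitted
B-A value comes out positive even though arrays 3 and 4 have negative
M values. Simply averaging the M values, ignoring orientation, lets
the dye swaps partly cancel each other. And if the signs of arrays 3
and 4 were flipped by hand and the same design were then used, the
sign change would be applied twice and the estimate would be wrong.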

I am not sure if this helped any, or it was too obvious to be of any  
use... I just felt you were using limma only half-way, stopping at the  
normalisation stage, and ignoring the 'best' part of it: the linear  
modelling.

Jose

-- 
Dr. Jose I. de las Heras                      Email: J.delasHeras at ed.ac.uk
The Wellcome Trust Centre for Cell Biology    Phone: +44 (0)131 6513374
Institute for Cell & Molecular Biology        Fax:   +44 (0)131 6507360
Swann Building, Mayfield Road
University of Edinburgh
Edinburgh EH9 3JR
UK