[BioC] Limma toptable question

Gordon Smyth smyth at wehi.edu.au
Sat Dec 3 02:51:29 CET 2005

```Dear Hua,

>Date: Wed, 30 Nov 2005 09:31:00 -0600
>From: "Hua Weng" <hweng at biochem.okstate.edu>
>Subject: Re: [BioC] Limma toptable question
>To: <bioconductor at stat.math.ethz.ch>
>
>Dear Bioconductor:
>
>There are two reasons that I think slr0146 shouldn't appear in top list:
>
>1) The average M value, 0.3261, for slr0146 is less than 0.5
> > MA.norm\$M[MA.norm\$genes["Name"] == "slr0146",]
> >      slide151120 slide151274-2
> > [1,]      0.3837       -0.2760
> > [2,]      0.3677       -0.2824
> > [3,]      0.3395       -0.2802

As Adai Ramasamy has already mentioned, a gene with a smaller fold change
may still appear high in the list if it also has a small standard deviation.

>2) There is only one full weight for slr0146, the other five spots have 0.1
>weight that means the other five spots have very low intensity in both
>channels
>
> > MA.norm\$weights[MA.norm\$genes["Name"] == "slr0146",]
> >      slide151120 slide151274-2
> > [1,]         0.1           0.1
> > [2,]         0.1           0.1
> > [3,]         1.0           0.1

The meaning of weights is often mis-understood. Because of the way that
weights are used in linear regression, the weights are used only as
relative weights for each gene. The weights only determine which spots are
given most weight for each gene. The weights are not used to give more
weight to one gene than another. For example, if you were to give all the
spots for one gene weight 0.0001, the result would be exactly the same as
if you had given all spots weight 1000.  In either case, all spots for that
gene would get the same weight. It is only the relative size, within any
particular gene, which counts.

Hence, the fact that most weights are 0.1 for this gene does not make it
less likely to appear high in the list. In fact, in the spot with weight 1
is very accurate, the variable weights may move this gene higher than it
otherwise would be.

>For slr1501 and slr1113, both of them have very high average M value, and
>all the spots for these two probes are full weights that means all these
>spots have strong intensity.
>
> > MA.norm\$M[MA.norm\$genes["Name"] == "slr1501",]
> >      slide151120 slide151274-2
> > [1,]       4.977        -4.874
> > [2,]       4.950        -4.155
> > [3,]       4.896        -4.649
>
> > MA.norm\$M[MA.norm\$genes["Name"] == "slr1113",]
> >      slide151120 slide151274-2
> > [1,]       2.350        -1.560
> > [2,]       2.258        -1.605
> > [3,]       2.398        -1.725

Again, you have not considered standard deviations.

If you have lots of genes with small fold changes appearing high in your
top-table, it is probably because not a lot of smoothing of the standard
deviations is occuring. (Look at fit\$df.prior to see if this is so.) You
can improve this by giving more attention in your prepressing of the data
to stabilising the variances, e.g., by better background correction.

>Thank you very much for your quick response.
>Hua Weng

Best wishes
Gordon

>-----Original Message-----
>From: Gordon K Smyth [mailto:smyth at wehi.EDU.AU]
>Sent: Tuesday, November 29, 2005 3:12 PM
>To: Hua Weng
>Subject: Re:Limma Top Table question
>
>Dear Hua,
>
>You don't appear to have sent your question to the bioconductor mailing
>list.  I suggest you check the email address.
>
>See my questions below.
>
>On Wed, November 30, 2005 7:47 am, Hua Weng wrote:
> > Hello Dr. Smyth and Bioconductor:
> >
> > I have a question about toptable in limma package in Bioconductor. I
> > have confusing about the result after I fit linear model to my data. I
> > have two dye-swap technical replicate slides and three technical
> > replicates within each slide. I wrote a filter function and gave low
> > weights to the spots with low intensity in both channels. The source code
>is as following:

[lots of code deleted]

> > I am confusing about the output from toptable after I fit ebays
> > function. I think slr0146 with average M value 0.3261 shouldn't appear
> > in top list but it does show up as the second significantly differential
>expressed gene.
>
>Why do you think that slr0146 should not appear?
>
> > And
> > gene slr1501 should appear as the first and slr1113 should appear in
> > top five but it doesn't.
>
>Again, what do you think this?
>
>Gordon
>
> > Could you be kind enough to tell me what's wrong in my code?
> >
> >
> > Best Regards,
> > Hua Weng
> >
> > Scientific Programmer
> > Microarray Core Facility
> > Oklahoma State University
> > Department of Biochemistry and Molecular Biology
> > 246 Noble Research Center
> > Stillwater, OK  74078
> > hweng at biochem.okstate.edu
> > (405) 744-6209
> > FAX (405) 744-7799
> > http://microarray.okstate.edu/

```