[BioC] Unexpected results using limma with numerical factor

paul.boutros at utoronto.ca paul.boutros at utoronto.ca
Wed Aug 25 04:30:02 CEST 2004


Hi again,

For an experiment I'm analyzing, I do not have a series of factors.  Rather, I 
have a pair of test-score (numerical) for each replicate animal.  This is 
Affymetrix data, and for my initial pass I tried with only a single test-score.  
The commands I used are below:

> library(gcrma);
Welcome to Bioconductor 
         Vignettes contain introductory material.  To view, 
         simply type: openVignette() 
         For details on reading vignettes, see
         the openVignette help page.
> library(limma);
> cel.files <- c(
+ 'RAE230_2_060104_LH_IM07T.CEL',
+ 'RAE230_2_060104_LH_IM08T.CEL',
+ 'RAE230_2_060104_LH_IM09T.CEL',
+ 'RAE230_2_060104_LH_IM10T.CEL',
+ 'RAE230_2_060204_LH_IM07T.CEL',
+ 'RAE230_2_060204_LH_IM08T.CEL',
+ 'RAE230_2_060204_LH_IM09T.CEL',
+ 'RAE230_2_060204_LH_IM10T.CEL'
+ );
> eset <- ReadAffy(filenames=cel.files, phenoData="phenodata.txt");
> eset <- ReadAffy(filenames=cel.files, phenoData="phenodata.txt");
> eset;
AffyBatch object
size of arrays=834x834 features (43476 kb)
cdf=Rat230_2 (31099 affyids)
number of samples=8
number of genes=31099
annotation=rat2302
> pData(eset);
                               TestScore1
RAE230_2_060104_LH_IM07T.CEL         0.58
RAE230_2_060104_LH_IM08T.CEL        -2.36
RAE230_2_060104_LH_IM09T.CEL       -12.24
RAE230_2_060104_LH_IM10T.CEL       -14.84
RAE230_2_060204_LH_IM07T.CEL         0.15
RAE230_2_060204_LH_IM08T.CEL        -3.23
RAE230_2_060204_LH_IM09T.CEL       -11.66
RAE230_2_060204_LH_IM10T.CEL       -12.91
> eset <- rma(eset);
Background correcting
Normalizing
Calculating Expression
> design <- model.matrix(~-1 + TestScore1, pData(eset));
> design;
                               TestScore1
RAE230_2_060104_LH_IM07T.CEL         0.58
RAE230_2_060104_LH_IM08T.CEL        -2.36
RAE230_2_060104_LH_IM09T.CEL       -12.24
RAE230_2_060104_LH_IM10T.CEL       -14.84
RAE230_2_060204_LH_IM07T.CEL         0.15
RAE230_2_060204_LH_IM08T.CEL        -3.23
RAE230_2_060204_LH_IM09T.CEL       -11.66
RAE230_2_060204_LH_IM10T.CEL       -12.91
attr(,"assign")
[1] 1
> fit1 <- lmFit(eset, design);
> fit3 <- eBayes(fit1);


All proceeds well without any error-messages, so I believed I had successfully 
fit my model.  When I extract the data, however, I get some unexpected results:

> topTable(fit3, coef=1, number=20, adjust="fdr");
                       ID         M        A         t      P.Value        B
104            1367555_at -1.212175 14.77173 -6.808742 3.912627e-07 14.75482
105          1367556_s_at -1.203275 14.67175 -6.762709 3.912627e-07 14.48816
3411           1370862_at -1.185363 14.42457 -6.674599 3.912627e-07 13.98156
2777           1370228_at -1.184768 14.46704 -6.666414 3.912627e-07 13.93475
549            1368000_at -1.175866 14.33987 -6.622929 3.912627e-07 13.68683
2697           1370148_at -1.174858 14.32747 -6.617795 3.912627e-07 13.65764
837            1368288_at -1.170412 14.25957 -6.596453 3.912627e-07 13.53649
710          1368161_a_at -1.169621 14.27702 -6.589664 3.912627e-07 13.49801
420            1367871_at -1.166552 14.09481 -6.588461 3.912627e-07 13.49120
2558           1370009_at -1.162241 14.18252 -6.552347 3.912627e-07 13.28707
147            1367598_at -1.161072 14.19550 -6.543620 3.912627e-07 13.23787
2576         1370027_a_at -1.158955 14.13682 -6.536068 3.912627e-07 13.19533
1136           1368587_at -1.154447 14.04036 -6.517046 3.912627e-07 13.08836
196            1367647_at -1.154999 14.07951 -6.516673 3.912627e-07 13.08627
31081 AFFX-r2-P1-cre-5_at -1.153568 14.05140 -6.510380 3.912627e-07 13.05094
19150          1387082_at -1.153501 14.05431 -6.509674 3.912627e-07 13.04697
2898         1370349_a_at -1.153164 14.07047 -6.505935 3.912627e-07 13.02599
1235           1368686_at -1.152660 14.04966 -6.504796 3.912627e-07 13.01960
8200           1375651_at -1.152557 14.04428 -6.504674 3.912627e-07 13.01892
19359          1387291_at -1.151666 14.03699 -6.499744 3.912627e-07 12.99127

The near-identical M, A, and p-values indicate a problem, and none of the genes 
on this list are very plausible biologically for our test-system.  Based on that 
I'm pretty sure I've gone astray somewhere.

Is it possible to use numeric scores in fitting a linear model with limma?  If 
so, am I asking the question in the right manner?  If not, are there any 
BioConductor tools appropriate for this kind of question?

Any help very much appreciated,
Paul



More information about the Bioconductor mailing list