[BioC] Understanding limma, fdr and topTable

Tue Jul 8 14:27:25 CEST 2008

Jim, Tom and Naomi and all-

Thank you for your reponses. I am planning to look for genes (if any) that
changed similaryly between the two groups depending on amplification
procedure but was just getting starting looking at individually first. I
appreciate the advice. Would pre-filtering to remove "absent" from all
sample genes improve the situation or create a different problem, reading
the archives there seems to be some disagreement if prefiltering before
eBayes is appropriate. Thank you again I appreciate any help I get.

Louisa

 "If we knew what we were doing, it wouldn't be called Research." - Albert
Einstein

Louisa Rispoli, Ph.D. Reproductive Physiology
Department of Animal Science
University of Tennessee, Knoxville
A105 Johnson Animal Research and Teaching Unit
1750 Alcoa Highway
Knoxville, TN 37920
phone:(865) 946-1874
fax:(865) 946-1010
email: lrispoli at utk.edu

             James MacDonald                                               
             <jmacdon at med.umic                                             
             h.edu>                                                     To 
             Sent by:                  Louisa A Rispoli/AS/EXP/UTIA        
             bioconductor-boun         <larispoli at mail.ag.utk.edu>         
             ces at stat.math.eth                                          cc 
             z.ch                      bioconductor at stat.math.ethz.ch      
                                                                   Subject 
                                       Re: [BioC] Understanding limma, fdr 
             07/07/2008 06:44          and topTable                        
             PM                                                            

Hi Louisa,

Louisa A Rispoli/AS/EXP/UTIA wrote:
> To all-
>
> I am a newbie trying to analyze microarray with minimal help and thought
> that I had figured out all. We have a simple task of comparing two groups
> with 8 replicates on the bovine genechip. I am attempting to understand
the
> results that I obtain using limma and adjusting for fdr. I have tried
> reading the help vignettes on p.adjust and topTablle but no closer to
> understanding if the adjusted p-value represents the fdr or the q-value
or
> something altogether different. Based on some recent work in another lab
> (by abstract) I know that I may need to use a less stringent fdr then 5%
> but I am unsure in limma how to change that value (or if it is feasible
at
> all). Looking at the results that I obtained so far, I do not have any
> differentially expressed genes with the fdr adjustment. But I could be
> interpreting that wrong, since I was looking for values of the adjusted p
> to be lower then 0.05 and the smallest that I see is 0.7816, If someone
> could look at this and assist, any help, advice or reprimmand would be
> appreciated.

The adjusted p-value does indeed represent the FDR, and it looks like
you have little evidence for differential expression. However, this
doesn't mean that there is no differential expression, just that you
don't have much evidence to say there is.

Given the data in hand, the table you show does give those probesets
that appear to be the most consistently changed, and those are the most
likely to validate if you were to choose to do so.

Looking at the way you fit the model, I wonder what the end goal of the
experiment might have been. When I see that sort of thing in the lab,
almost always the client is looking for the interaction (e.g., genes
that react differently to the treatment with HS depending on if they are
wt or polyA). Is that not the case here?

Regardless, I assume at the very least you might want to know the
difference between the hs and control wt samples as well. If you fit a
model that includes these samples and then compute the contrasts you
might get better results (depending on the intra-group variability of
the wt samples), as the denominator of your contrast will be based on
all 32 samples rather than just the 16 you are using currently.

Best,

Jim

>
>
> Thanks
>
> Louisa
>
> "If we knew what we were doing, it wouldn't be called Research." - Albert
> Einstein
>
> Louisa Rispoli, Ph.D. Reproductive Physiology
> Department of Animal Science
> University of Tennessee, Knoxville
> A105 Johnson Animal Research and Teaching Unit
> 1750 Alcoa Highway
> Knoxville, TN 37920
> phone:(865) 946-1874
> fax:(865) 946-1010
> email: lrispoli at utk.edu
>
> R version 2.7.1 (2008-06-23)
> Copyright (C) 2008 The R Foundation for Statistical Computing
> ISBN 3-900051-07-0
>> library(affycoretools)
> Loading required package: affy
> Loading required package: Biobase
> Loading required package: tools
>
> Welcome to Bioconductor
>
>   Vignettes contain introductory material. To view, type
>   'openVignette()'. To cite Bioconductor, see
>   'citation("Biobase")' and for packages 'citation(pkgname)'.
>
> Loading required package: affyio
> Loading required package: preprocessCore
> Loading required package: limma
> Loading required package: GOstats
> Loading required package: graph
> Loading required package: GO.db
> Loading required package: AnnotationDbi
> Loading required package: DBI
> Loading required package: RSQLite
> Loading required package: annotate
> Loading required package: xtable
> Loading required package: RBGL
> Loading required package: Category
> Loading required package: genefilter
> Loading required package: survival
> Loading required package: splines
> Loading required package: biomaRt
> Loading required package: RCurl
>
> Attaching package: 'biomaRt'
>
>
>         The following object(s) are masked from package:annotate :
>
>          getGO
>
> Loading required package: gcrma
> Loading required package: matchprobes
> Loading required package: annaffy
> Loading required package: KEGG.db
>
> Attaching package: 'annaffy'
>
>
>         The following object(s) are masked from package:RCurl :
>
>          getURL
>
>> pd <- read.AnnotatedDataFrame("pData.txt", sep="\t", header=TRUE,
> row.names=1)
>> data <- ReadAffy(phenoData=pd)
>> pData(data)
>                amp  trt
> PolyC-1.CEL  PolyA Ctrl
> PolyC-2.CEL  PolyA Ctrl
> PolyC-3.CEL  PolyA Ctrl
> PolyC-4.CEL  PolyA Ctrl
> PolyC-5.CEL  PolyA Ctrl
> PolyC-6.CEL  PolyA Ctrl
> PolyC-7.CEL  PolyA Ctrl
> PolyC-8.CEL  PolyA Ctrl
> PolyHS-1.CEL PolyA   HS
> PolyHS-2.CEL PolyA   HS
> PolyHS-3.CEL PolyA   HS
> PolyHS-4.CEL PolyA   HS
> PolyHS-5.CEL PolyA   HS
> PolyHS-6.CEL PolyA   HS
> PolyHS-7.CEL PolyA   HS
> PolyHS-8.CEL PolyA   HS
> WTC-1.CEL       WT Ctrl
> WTC-2.CEL       WT Ctrl
> WTC-3.CEL       WT Ctrl
> WTC-4.CEL       WT Ctrl
> WTC-5.CEL       WT Ctrl
> WTC-6.CEL       WT Ctrl
> WTC-7.CEL       WT Ctrl
> WTC-8.CEL       WT Ctrl
> WTHS-1.CEL      WT   HS
> WTHS-2.CEL      WT   HS
> WTHS-3.CEL      WT   HS
> WTHS-4.CEL      WT   HS
> WTHS-5.CEL      WT   HS
> WTHS-6.CEL      WT   HS
> WTHS-7.CEL      WT   HS
> WTHS-8.CEL      WT   HS
>> Poly.rma <- rma(data[,1:16])
> Background correcting
> Normalizing
> Calculating Expression
>> pData(Poly.rma)
>                amp  trt
> PolyC-1.CEL  PolyA Ctrl
> PolyC-2.CEL  PolyA Ctrl
> PolyC-3.CEL  PolyA Ctrl
> PolyC-4.CEL  PolyA Ctrl
> PolyC-5.CEL  PolyA Ctrl
> PolyC-6.CEL  PolyA Ctrl
> PolyC-7.CEL  PolyA Ctrl
> PolyC-8.CEL  PolyA Ctrl
> PolyHS-1.CEL PolyA   HS
> PolyHS-2.CEL PolyA   HS
> PolyHS-3.CEL PolyA   HS
> PolyHS-4.CEL PolyA   HS
> PolyHS-5.CEL PolyA   HS
> PolyHS-6.CEL PolyA   HS
> PolyHS-7.CEL PolyA   HS
> PolyHS-8.CEL PolyA   HS
>> treatment <-c("C",
> "C","C","C","C","C","C","C","HS","HS","HS","HS","HS","HS","HS","HS")
>> design <- model.matrix(~factor(treatment))
>> colnames(design) <- c("Ctrl", "CvsHS")
>> design
>    Ctrl CvsHS
> 1     1     0
> 2     1     0
> 3     1     0
> 4     1     0
> 5     1     0
> 6     1     0
> 7     1     0
> 8     1     0
> 9     1     1
> 10    1     1
> 11    1     1
> 12    1     1
> 13    1     1
> 14    1     1
> 15    1     1
> 16    1     1
> attr(,"assign")
> [1] 0 1
> attr(,"contrasts")
> attr(,"contrasts")$`factor(treatment)`
> [1] "contr.treatment"
>> options(digits=4)
>> topTable(fit2, coef=2, n=25, sort.by="B", adjust="fdr")
>                       ID   logFC AveExpr      t   P.Value adj.P.Val
B
> 16088   Bt.27852.2.S1_at -0.4764   3.621 -5.274 5.709e-05    0.7816
-1.722
> 12937   Bt.24859.1.A1_at  0.6632   6.940  5.152 7.374e-05    0.7816
-1.790
> 2853    Bt.13563.2.S1_at -0.5288   6.747 -4.703 1.922e-04    0.7816
-2.061
> 2474    Bt.13162.1.S1_at -0.6424   5.964 -4.574 2.534e-04    0.7816
-2.142
> 10402   Bt.22107.1.S1_at -0.3261  10.358 -4.475 3.143e-04    0.7816
-2.207
> 10654   Bt.22355.1.S1_at -0.3533   8.650 -4.316 4.439e-04    0.7816
-2.312
> 8883    Bt.20584.1.S1_at -0.3671   8.106 -4.225 5.417e-04    0.7816
-2.374
> 2851    Bt.13562.1.S1_at  0.6901   8.865  4.190 5.850e-04    0.7816
-2.399
> 5772     Bt.1754.1.S1_at -0.5986   9.348 -4.187 5.885e-04    0.7816
-2.400
> 19715    Bt.4440.1.A1_at -0.6881   2.526 -4.160 6.242e-04    0.7816
-2.419
> 3163    Bt.13933.1.S1_at  0.4519   7.798  4.138 6.549e-04    0.7816
-2.434
> 14921   Bt.26765.1.S1_at -0.3673   7.846 -4.110 6.962e-04    0.7816
-2.454
> 10751   Bt.22445.1.S1_at -0.3222   8.622 -4.108 7.002e-04    0.7816
-2.456
> 2906    Bt.13629.1.A1_at  0.2830   2.294  4.095 7.196e-04    0.7816
-2.464
> 5721     Bt.1749.1.S1_at  0.6030   3.839  4.072 7.578e-04    0.7816
-2.481
> 21192    Bt.6078.1.S1_at -0.4385   8.564 -4.054 7.879e-04    0.7816
-2.493
> 800     Bt.11145.1.S1_at -0.5060   2.461 -4.030 8.312e-04    0.7816
-2.511
> 27    AFFX-Bt-ECOLOXL_at  0.3410   1.290  4.022 8.443e-04    0.7816
-2.516
> 22515    Bt.7980.1.S1_at -0.3734   7.622 -3.983 9.196e-04    0.7816
-2.543
> 4609    Bt.16378.1.A1_at -0.3775   5.071 -3.964 9.607e-04    0.7816
-2.557
> 13765   Bt.25669.1.S1_at -0.5523   8.197 -3.948 9.933e-04    0.7816
-2.568
> 12023   Bt.24033.1.A1_at -0.4962   6.256 -3.917 1.065e-03    0.7816
-2.591
> 20843    Bt.5578.1.S1_at -0.4133   7.904 -3.913 1.073e-03    0.7816
-2.594
> 17021   Bt.28620.1.S1_at  0.4720   5.181  3.899 1.106e-03    0.7816
-2.603
> 23922    Bt.9791.1.S1_at -0.3420   9.963 -3.893 1.122e-03    0.7816
-2.608
>
>> sessionInfo()
> R version 2.7.1 (2008-06-23)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
> States.1252;LC_MONETARY=English_United
>
> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>
> attached base packages:
> [1] splines   tools     stats     graphics  grDevices utils     datasets
> methods   base
>
> other attached packages:
>  [1] bovinecdf_2.2.0      affycoretools_1.12.0 annaffy_1.12.1
> KEGG.db_2.2.0        gcrma_2.12.1         matchprobes_1.12.0
>  [7] biomaRt_1.14.0       RCurl_0.9-3          GOstats_2.6.0
> Category_2.6.0       genefilter_1.20.0    survival_2.34-1
> [13] RBGL_1.16.0          annotate_1.18.0      xtable_1.5-2
> GO.db_2.2.0          AnnotationDbi_1.2.2  RSQLite_0.6-9
> [19] DBI_0.2-4            graph_1.18.1         limma_2.14.5
> affy_1.18.2          preprocessCore_1.2.0 affyio_1.8.0
> [25] Biobase_2.0.1
>
> loaded via a namespace (and not attached):
> [1] cluster_1.11.11 XML_1.95-3
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor

--
James W. MacDonald, MS
Biostatistician
UMCCC cDNA and Affymetrix Core
University of Michigan
1500 E Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623

_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor