[BioC] minimal number of features tested in edgeR

Thu Oct 25 12:32:51 CEST 2012

Hi Stephanie,

In theory, the minimal number of features you can test is 1.  From your three rows (2 groups of 2 replicates), you have 6 degrees of freedom to estimate a common dispersion, as opposed to 2 with just one feature.  This should "help" and I would consider that an improvement.
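
To spell out the counting behind those numbers, here is a small sketch in Python. It is not edgeR code, and the function name is made up for illustration. It assumes one mean is fitted per group, so each feature contributes (samples - groups) residual degrees of freedom, and that a single dispersion is shared across all features.

```python
def residual_df(n_features: int, n_samples: int, n_groups: int) -> int:
    """Residual degrees of freedom when one dispersion is shared by all features.

    Hypothetical helper for illustration only; it is not part of edgeR.
    """
    per_feature = n_samples - n_groups  # one mean is fitted per group
    return n_features * per_feature


# 2 groups of 2 replicates gives 4 samples
print(residual_df(1, 4, 2))  # one feature on its own
print(residual_df(3, 4, 2))  # the three viral features pooled
```

With 4 samples in 2 groups this gives 2 for a single feature and 6 for the three pooled features.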

Assuming some other things fall into place (e.g. it's reasonable to assume, at least to a first-order approximation, that the features share the same dispersion), then this should be ok.  You could also consider using other features (that you've presumably filtered out?) purely for estimating the dispersion, and then test only the 3 features of interest.  This only helps if those extra features are representative of the three, though, and that gets a bit hard to defend.
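
As a rough illustration of what "pooling features to estimate one dispersion" means, here is a Python sketch of a crude moment estimator. It is a stand-in and not what edgeR does (edgeR uses a conditional likelihood approach and accounts for library sizes). The function name and the toy counts are made up, and the sketch assumes equal library sizes and the negative binomial relation var = mu + phi * mu^2.

```python
from statistics import mean, variance


def pooled_dispersion(rows, groups):
    """Crude moment estimate of a dispersion shared by all features.

    Hypothetical helper, NOT edgeR's estimator. Assumes equal library
    sizes and var = mu + phi * mu^2 within each feature and group.
    """
    labels = sorted(set(groups))
    num = 0.0  # pooled excess of the variance over the Poisson variance
    den = 0.0  # pooled squared means
    for row in rows:
        for g in labels:
            x = [c for c, lab in zip(row, groups) if lab == g]
            m = mean(x)
            num += variance(x) - m
            den += m * m
    if den == 0:
        return 0.0
    # A negative pooled estimate is truncated to zero (Poisson-like data).
    return max(num / den, 0.0)


# Toy counts, invented for illustration: one feature, 2 groups of 2 replicates.
groups = ["A", "A", "B", "B"]
toy = [[8, 12, 20, 28]]
print(pooled_dispersion(toy, groups))
```

Passing more rows adds more terms to both sums, which is the sense in which extra (representative) features stabilise the estimate. In edgeR itself, I believe the analogous move is to estimate the common dispersion on the larger table and then supply that value through the dispersion argument of exactTest() when testing the three features.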

Anyways, these are just opinions and possibilities.

Best, Mark

On 25.10.2012, at 11:23, Stephanie [guest] wrote:

> 
> Hi,
> 
> I have a question regarding the minimal number of genes that we can test in an analysis with edgeR. Let me explain: in a study, edgeR has been used to test the differential expression of three viruses between two conditions, without considering the counts for other features. That is, the data frame d$counts has only three rows (and 4 columns, as there are two replicates per condition). The library sizes, however, correspond to the total number of tags aligned to both these viruses and the genes of the host organism. This seems inappropriate to me, as I don't understand how it would be possible to reliably estimate the dispersion from only three features, but maybe I'm wrong... May I have your opinion?
> In your view, what is the minimal number of features that we can test using edgeR?
> 
> Thank you in advance for your help.
> 
> Best regards,
> 
> Stéphanie
> 
> -- output of sessionInfo(): 
> 
> sessionInfo()
> R version 2.15.0 (2012-03-30)
> Platform: x86_64-pc-linux-gnu (64-bit)
> 
> locale:
> [1] LC_CTYPE=fr_FR.UTF-8       LC_NUMERIC=C              
> [3] LC_TIME=fr_FR.UTF-8        LC_COLLATE=fr_FR.UTF-8    
> [5] LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=fr_FR.UTF-8   
> [7] LC_PAPER=C                 LC_NAME=C                 
> [9] LC_ADDRESS=C               LC_TELEPHONE=C            
> [11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C       
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base     
> 
> other attached packages:
> [1] edgeR_2.6.2  limma_3.12.0
> 
> loaded via a namespace (and not attached):
> [1] annotate_1.34.0      AnnotationDbi_1.18.0 Biobase_2.16.0      
> [4] BiocGenerics_0.2.0   DBI_0.2-5            DESeq_1.8.2         
> [7] genefilter_1.38.0    geneplotter_1.34.0   grid_2.15.0         
> [10] IRanges_1.14.3       RColorBrewer_1.0-5   RSQLite_0.11.1      
> [13] splines_2.15.0       stats4_2.15.0        survival_2.36-14    
> [16] xtable_1.7-0 
> 
> --
> Sent via the guest posting facility at bioconductor.org.
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

----------
Prof. Dr. Mark Robinson
Bioinformatics
Institute of Molecular Life Sciences
University of Zurich
Winterthurerstrasse 190
8057 Zurich
Switzerland

v: +41 44 635 4848
f: +41 44 635 6898
e: mark.robinson at imls.uzh.ch
o: Y11-J-16
w: http://tiny.cc/mrobin

----------
http://www.fgcz.ch/Bioconductor2012