[BioC] edgeR norm.factors NaN

Fri Jul 20 09:00:32 CEST 2012

Hi Colin,

I believe it's too many zeros.  Basically, the docs say:

-----
If ‘refColumn’ is unspecified, the library whose upper quartile is
     closest to the mean upper quartile is used.
-----

I think this breaks down with your data.
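
To illustrate what I mean, here is a rough base-R sketch on toy data (not your counts, and not edgeR's actual code). It assumes that the per-library quantity is the 75% quantile of counts/lib.size and that the resulting factors are rescaled by their geometric mean. The matrix `toy` and all of its values are made up.

```r
# Toy data: 20 genes x 3 libraries, almost all zeros (hypothetical values).
toy <- matrix(0L, nrow = 20, ncol = 3,
              dimnames = list(paste0("gene", 1:20), paste0("lib", 1:3)))
toy[1:2, ] <- c(3L, 1L, 5L, 2L, 4L, 0L)

lib.size <- colSums(toy)

# Upper quartile of the library-size-scaled counts, one value per library.
uq <- apply(t(t(toy) / lib.size), 2, quantile, probs = 0.75)
print(uq)

# Which library is "closest to the mean upper quartile"? Here they all tie.
ref <- which(abs(uq - mean(uq)) == min(abs(uq - mean(uq))))
print(ref)

# Assumed rescaling by the geometric mean: log(0) is -Inf, so this is 0/0.
norm.factors <- uq / exp(mean(log(uq)))
print(norm.factors)
```

When more than three quarters of the rows in a library are zero, its upper quartile is zero, so the reference library is no longer well defined and the rescaling step divides zero by zero.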

But the major issue you'll need to deal with is that the first 4 columns of counts contain barely any reads!  'counts4' has only 4 mapped reads in total.  I've seen early experiments with tens of thousands of total mapped reads, but <20 is surely a mistake.  Are you sure this experiment worked, and that your custom annotation has captured the mappings correctly?
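
A quick sanity check along these lines, using the library sizes from your output below. The cutoff of 1000 reads is an arbitrary value picked for illustration, not an edgeR recommendation.

```r
# Library sizes copied from the $samples table in the original post.
lib.size <- c(counts1 = 17, counts2 = 8, counts3 = 14, counts4 = 4,
              counts5 = 18218, counts6 = 37146, counts7 = 2579, counts8 = 1027)

# Hypothetical minimum depth; choose a value that suits the experiment.
min.lib.size <- 1000

# Flag the libraries that fall below the cutoff.
too.small <- names(lib.size)[lib.size < min.lib.size]
print(too.small)
```

Any library flagged here is worth tracing back through the mapping and counting steps before attempting normalisation.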

Best,
Mark

On 19.07.2012, at 11:02, <Davenport.Colin at mh-hannover.de> <Davenport.Colin at mh-hannover.de> wrote:

> Dear Bioconductors,
> 
> I have an issue calculating normalisation factors in edgeR. This has always worked fine before (on three other datasets), which leaves me baffled here.
> 
> To summarise:
> - NaNs occur independently of the calcNormFactors method
> - the counts appear ok
> - no NaNs are present in the counts
> 
> 
> virusDGE = calcNormFactors(virusDGE, method="TMM")
> virusDGE = calcNormFactors(virusDGE, method="RLE")
> virusDGE = calcNormFactors(virusDGE, method="upperquartile")
> 
> 
>> virusDGE
> An object of class "DGEList"
> $samples
>         group lib.size norm.factors
> counts1   all       17          NaN
> counts2   all        8          NaN
> counts3   all       14          NaN
> counts4   all        4          NaN
> counts5   all    18218          NaN
> counts6   all    37146          NaN
> counts7   all     2579          NaN
> counts8   all     1027          NaN
> 
> $counts
>             counts1 counts2 counts3
> MuHV1_gp001       0       0       0
> MuHV1_gp002       0       0       0
> MuHV1_gp003       0       0       0
> MuHV1_gp004       0       0       0
> MuHV1_gp005       0       0       0
>             counts4 counts5 counts6
> MuHV1_gp001       0       0       1
> MuHV1_gp002       0       4       5
> MuHV1_gp003       0      13      18
> MuHV1_gp004       0      11       2
> MuHV1_gp005       0       4       6
>             counts7 counts8
> MuHV1_gp001       0       0
> MuHV1_gp002       0       0
> MuHV1_gp003       3       0
> MuHV1_gp004       3       0
> MuHV1_gp005       2       0
> 
> 
> 
> is.integer(virusDGE$counts)
> #TRUE
> is.na(virusDGE$counts)
> #(all are FALSE)
>> sum(is.na(virusDGE$counts))
> #[1] 0
> 
> 
>> sessionInfo()
> R version 2.14.1 (2011-12-22)
> Platform: x86_64-pc-linux-gnu (64-bit)
> 
> locale:
> [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
> [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
> [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
> [7] LC_PAPER=C                 LC_NAME=C                 
> [9] LC_ADDRESS=C               LC_TELEPHONE=C            
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base     
> 
> other attached packages:
> [1] edgeR_2.4.6           limma_3.10.3          GenomicFeatures_1.6.9
> [4] AnnotationDbi_1.16.19 Biobase_2.14.0        GenomicRanges_1.6.7  
> [7] IRanges_1.12.6       
> 
> loaded via a namespace (and not attached):
> [1] biomaRt_2.10.0     Biostrings_2.22.0  BSgenome_1.22.0    DBI_0.2-5         
> [5] RCurl_1.91-1       RSQLite_0.11.1     rtracklayer_1.14.4 tools_2.14.1      
> [9] XML_3.9-4          zlibbioc_1.0.1    
> 
> 
> 
> I am using a custom-built annotation, i.e.
> virustxdb=makeTranscriptDb(transcripts, splicings, genes, chrominfo)
> It seems to have worked fine so far and counted reads per feature reliably, but could this be the problem?
> 
> 
> Thanks for your time,
> 
> Colin Davenport
> 
> 
> Dr. Colin Davenport
> Bioinformatician
> Tümmler Group
> PFZ S0-7440
> Hannover Medical School
> Germany
> davenport [dot] colin <at> mh-hannover.de
> 0049 511532-8733
> 
> Genomics software available at
> http://genomics1.mh-hannover.de
> 
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

----------
Prof. Dr. Mark Robinson
Bioinformatics
Institute of Molecular Life Sciences
University of Zurich
Winterthurerstrasse 190
8057 Zurich
Switzerland

v: +41 44 635 4848
f: +41 44 635 6898
e: mark.robinson at imls.uzh.ch
o: Y11-J-16
w: http://tiny.cc/mrobin

----------
http://www.fgcz.ch/Bioconductor2012
http://www.eccb12.org/t5