[BioC] strange results with edgeR::goodTuring

Francois Pepin francois.pepin at sequentainc.com
Mon Aug 27 21:00:19 CEST 2012


Hi everyone,

I'm trying to use the goodTuring function in edgeR to estimate what kind of pseudocounts to use, and I'm getting strange results with a small number of categories:

library(edgeR)
x <- c(312, 14491, 16401, 65124, 129797, 323321, 366051, 368599, 405261, 604962)
y <- goodTuring(x)
y
$count
 [1]    312  14491  16401  65124 129797 323321 366051 368599 405261 604962

$proportion
 [1] 0 0 0 0 0 0 0 0 0 1

$P0
[1] 0

$n0
[1] 0


If I'm understanding this properly, y$proportion is telling me that I should expect all my counts to fall under the last category, which does not make sense. I would expect something pretty close to x/sum(x) instead.
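For comparison, this is a minimal sketch of the estimate I had in mind. It is plain base R with no edgeR calls, and `p` is just an illustrative name. These are the raw sample proportions, with no smoothing and no pseudocounts:

```r
# Raw (maximum-likelihood) proportions: each category's share of the total count.
x <- c(312, 14491, 16401, 65124, 129797, 323321, 366051, 368599, 405261, 604962)
p <- x / sum(x)

# Every category with a nonzero count gets a nonzero share, and the shares sum to 1.
print(round(p, 4))
```

Here the last category gets the largest share, but nowhere near all of it. That is the contrast with the 0 0 ... 0 1 pattern above.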

This is a bit of a toy example, and I'm mostly interested in cases with more categories, but it would be nice if this worked in all cases.
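One thing that may matter here: Good-Turing estimates are built from the frequencies of frequencies N_r, which is the number of categories seen exactly r times, rather than from the counts directly. In the toy example every count is distinct, so N_r is 1 for each observed r. My assumption, which I have not checked against the edgeR source, is that this leaves the smoothing step with almost nothing to work with. A quick base-R check:

```r
x <- c(312, 14491, 16401, 65124, 129797, 323321, 366051, 368599, 405261, 604962)

# Frequencies of frequencies: for each distinct count r, how many
# categories were observed exactly r times (N_r in Good-Turing notation).
nr <- table(x)
print(nr)

# Every distinct count occurs exactly once, so N_r = 1 for all observed r.
print(all(nr == 1))
```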

sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=C                 LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] edgeR_2.6.9   limma_3.12.1  dataframe_2.5


Thanks,

François


