[BioC] edgeR: handling missing values with Quantile normalisation

Paul Leo p.leo at uq.edu.au
Wed Aug 31 03:07:47 CEST 2011


Hi Sonika,
It is probably not the zeros that are causing the problem but NAs.

Check through the counts array
to see if it contains NAs, with something like:

apply(d$counts,2,function(x) sum(is.na(x)))

If there are no NAs, this returns zero for every column.

If there are NAs, setting them to 0 is probably appropriate.
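
As a minimal sketch of that check and fix, using base R only: the `counts` matrix below is a made-up stand-in for d$counts (its values are hypothetical), and treating an NA as "tag not observed" is an assumption you should confirm for your own data.

```r
# Hypothetical toy count matrix standing in for d$counts (values are made up)
counts <- matrix(c(10, NA,  3,
                    0,  5, NA,
                    7,  8,  9),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("Tag1", "Tag2", "Tag3"), c("1", "2", "3")))

# Number of NAs in each sample (column); equivalent to the apply() call above
na_per_sample <- colSums(is.na(counts))
print(na_per_sample)

# Replace NAs with 0, assuming a missing tag means it was not observed
counts_fixed <- counts
counts_fixed[is.na(counts_fixed)] <- 0

# No NAs should remain after the replacement
stopifnot(!anyNA(counts_fixed))
```

On the real object the same assignment would be made on d$counts; the library sizes may then need recomputing, for example with d$samples$lib.size <- colSums(d$counts).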


Cheers
Paul



-----Original Message-----
From: Sonika Tyagi <Sonika.Tyagi at agrf.org.au>
To: 'bioconductor at r-project.org' <bioconductor at r-project.org>
Subject: [BioC] edgeR: handling missing values with Quantile normalisation
Date: Wed, 31 Aug 2011 10:02:26 +1000

Hi there,

I am analysing RNA-seq counts using the edgeR package, but I am running into problems because of 'zero' counts for certain tags in my data.

The code I am using is:

> targets <- read.delim(file = "Targets.txt", stringsAsFactors = FALSE)
> targets
                                  files   group description
1  Sample_xx_count.txt.raw control   something
2  Sample_xx_count.txt.raw control   something
3  Sample_xx_count.txt.raw  Hi_Pos   something
4  Sample_xx_count.txt.raw  Hi_Pos   something
5  Sample_xx_count.txt.raw control   something
6  Sample_xx_count.txt.raw control   something
7   ................

d <- readDGE(targets, skip = 0, comment.char = "#")
d

An object of class "DGEList"
$samples
                                 files   group description  lib.size norm.factors
1 Sample_xx_count.txt.raw control   something 498180513            1
2 Sample_xx_count.txt.raw control   something 483775405            1
3 Sample_xx_count.txt.raw  Hi_Pos   something 368609647            1
4 Sample_xx_count.txt.raw  Hi_Pos   something 617334315            1
5 Sample_xx_count.txt.raw control   something 678060765            1
13 more rows ...

$counts
         1     2     3     4     5     6      7     8     9    10     11    12    13    14     15 16    17    18
Tag1 15923 20323 14867 23098 32484 17223  51579 29578 17408 24097  34470 31964 17583 17583  39460  0 30359 25416
Tag2   700   600   200   695   500  1300   1425  1775   700  1974   1300  2371   900   900   1689  0   898  1690
Tag3     0     0   100     0     0     0      0     0     0     0      0     0     0     0    100  0   100     0
Tag4 74008 58753 51648 65233 93828 71047 117340 90551 55000 70124 121393 86106 46197 46197 127290  0 98369 79673
Tag5 19868 19385 25500 31215 56684 24096  51265 37492 27420 24496  32729 24722 24913 24913  50448  0 39755 55829
21887 more rows ...


 d <- calcNormFactors(d)
Error in quantile.default(x, p = q) :
  missing values and NaN's not allowed if 'na.rm' is FALSE

Could someone please suggest how to handle missing values with the edgeR normalisation methods?

Thank you
Sonika
-------------------

> sessionInfo()
R version 2.12.2 (2011-02-25)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_Australia.1252  LC_CTYPE=English_Australia.1252    LC_MONETARY=English_Australia.1252
[4] LC_NUMERIC=C                       LC_TIME=English_Australia.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] edgeR_2.0.5  svIDE_0.9-50

loaded via a namespace (and not attached):
[1] limma_3.6.9   svMisc_0.9-61 tcltk_2.12.2  tools_2.12.2  XML_3.2-0.2

_______________________________________________
Bioconductor mailing list
Bioconductor at r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor


