[BioC] DESeq Normalization Question

Stephen Turner vustephen at gmail.com
Fri May 10 18:38:30 CEST 2013


Simon et al.,

I'm sure this issue has come up before, but I couldn't find an
appropriate thread or answer either here or SEQanswers.

What feature of the data or the distribution of counts among my
samples can cause the sizeFactors to vary much more than the raw
counts / library sizes?

More detail: I'm using DESeq to analyze RNA-seq data mapped with STAR,
counted with htseq-count. Comparing the "doubleTerm" samples to the
"wt" samples, there are many genes that appear downregulated. While
these samples were sequenced, on average, to a similar sequencing
depth, the normalization factors are much smaller for WT, resulting in
much larger normalized counts, resulting in more apparently
downregulated genes in doubleTerm vs WT.

> cds <- newCountDataSetFromHTSeqCount(sampleTable=sampleTable, directory=directory)
> cds <- estimateSizeFactors(cds)
> cds <- estimateDispersions(cds)
> data.frame(sizefactors=sizeFactors(cds), rawcounts=colSums(counts(cds, normalized=FALSE)))
                sizefactors rawcounts
S01_wt1           0.9016089  23466349
S02_wt2           0.7679168  22428603
S03_wt3           0.7952564  19841959
S04_wt4           0.7839629  18363384
S05_pten8w1       1.0301769  20859853
S06_pten8w2       0.9949514  16809588
S07_pten8w3       0.9425865  16731071
S08_pten22w1      1.0826846  18906329
S09_pten22w2      1.1640354  20164026
S10_pten22w3      1.0111748  17306468
S11_double8w1     0.7949001  17671986
S12_double8w2     1.4509978  23673557
S13_double8w3     1.1703853  22127841
S14_doubleterm2   1.0786455  19063010
S15_doubleterm4   1.1265935  19279814
S16_doubleterm6   1.3059472  22750403

Thank you.

Stephen

> sessionInfo()
R version 3.0.0 (2013-04-03)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets
methods   base

other attached packages:
[1] DESeq_1.12.0         lattice_0.20-15      locfit_1.5-9
Biobase_2.20.0
[5] BiocGenerics_0.6.0   edgeR_3.2.3          limma_3.16.2
BiocInstaller_1.10.1

loaded via a namespace (and not attached):
 [1] annotate_1.38.0      AnnotationDbi_1.22.3 DBI_0.2-6
DESeq2_1.0.9
 [5] genefilter_1.42.0    geneplotter_1.38.0   GenomicRanges_1.12.2
grid_3.0.0
 [9] IRanges_1.18.0       RColorBrewer_1.0-5   RSQLite_0.11.3
splines_3.0.0
[13] stats4_3.0.0         survival_2.37-4      tools_3.0.0
XML_3.95-0.2
[17] xtable_1.7-1



More information about the Bioconductor mailing list