[BioC] DESeq2 high fold change values for comparison of samples with zero counts

Noa Henig ntzunz at gmail.com
Sun Feb 9 16:25:45 CET 2014


Hi DESeq developers,
I am running DESeq2 1.2.8 to receive differential expression in 2 different
cell types. For each cell type I have control samples, over-expression of
gene A, and over-expression of gene B, while the second cell type is used
in order to support the findings from the first cell type.

I have 2 questions:
First, I found some genes that had zero normalized counts in 7 out of 8
replicates (the last one had 1 count or 1.6 counts). The fold change for
comparison for such two genes was -13.4 and 2.47. Since the 'condition' has
more than 3 levels I set the betaPrior to False as Michael indicated in the
earlier correspondence (the code appears below).
Can you tell the reason for this this and how to correct this?

In addition, I received the following warning:
Warning message:
In parametricDispersionFit(mcols(objectNZ)$baseMean[useForFit],  :
  the parametric dispersion fit did not converge,
which occurs when the dispersion estimates over the mean do not fit to the
curve y = a/x + b. You should re-run the analysis using fitType='local'
or 'mean' (DESeq2 > v1.2 will re-run automatically).
When using local regression fit, the user should examine plotDispEsts(dds)
to make sure the fitted line is not sharply curving up or down based on
the position of individual points.

I guess that by running version 1.2.8 the additional fit types were ran
automatically but can I see the actual code that was ran ?

This is the sessionInfo:
R version 3.0.2 (2013-09-25)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=Hebrew_Israel.1255  LC_CTYPE=Hebrew_Israel.1255
 LC_MONETARY=Hebrew_Israel.1255
[4] LC_NUMERIC=C                   LC_TIME=Hebrew_Israel.1255

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
base

other attached packages:
[1] DESeq2_1.2.8          RcppArmadillo_0.4.000 Rcpp_0.10.6
GenomicRanges_1.14.4  XVector_0.2.0
[6] IRanges_1.20.6        BiocGenerics_0.8.0

loaded via a namespace (and not attached):
 [1] annotate_1.40.0      AnnotationDbi_1.24.0 Biobase_2.22.0
DBI_0.2-7            genefilter_1.44.0
 [6] grid_3.0.2           lattice_0.20-23      locfit_1.5-9.1
RColorBrewer_1.0-5   RSQLite_0.11.4
[11] splines_3.0.2        stats4_3.0.2         survival_2.37-4
 XML_3.98-1.1         xtable_1.7-1

And this is the code I was running:

tmpcountData = read.table(countsTable, header=TRUE, row.names=1)

numOfConds=6

samplesList = c("U_V0", "U_V0","U_V0", "U_V0", "U_V0", "U_V0", "U_K1",
"U_K1", "U_K1", "U_K1", "U_p5", "U_p5", "M_V0", "M_V0", "M_V0",
"M_K1","M_K1", "M_p5", "M_p5")

countData = tmpcountData[,3:21] # counts table contains 2 more columns
before the counts

colData = data.frame(row.names=colnames(countData), condition = samplesList)

conditionsList = c("U_V0","U_K1", "U_p5","M_V0","M_K1","M_p5")

colData$condition = factor( colData$condition, levels =
conditionsList)

dds <- DESeqDataSetFromMatrix(countData = countData, colData = colData,
design = ~condition)

dds <- estimateSizeFactors(dds)

dds <- estimateDispersions(dds, maxit = 100)

dds <- nbinomWaldTest(dds, betaPrior = FALSE)

png("plotDispEsts_final.png")

plotDispEsts(dds)

dev.off()

tmp_counts = counts(dds, normalized=TRUE)

normalized_counts = data.frame(tmp_counts)


... reading the columns of counts for comparisons  and writing the results to
a table....


Thanks!
Noa Henig





On Tue, Jan 7, 2014 at 4:21 PM, Michael Love <michaelisaiahlove at gmail.com>wrote:

> Hi Noa,
>
> Thanks for reporting in. This sounds like the same problem that a user
> had in November:
>
>
> http://permalink.gmane.org/gmane.science.biology.informatics.conductor/51749
> 
>
> If you are using the updated release version (>= 1.2.6), there should
> have been a warning printed when you build the DESeqDataSet that you should
> set betaPrior=FALSE because factors are present in the design with 3 or
> more levels.
>
> In the devel branch of DESeq2 (v1.3), we have implemented a solution that
> allows using a prior on log fold changes for factors with 3 or more levels
> results. Then DESeq() or nbinomWaldTest() will provide symmetric results
> regardless of the base level and it takes care of this situation with
> nonzero log fold changes from contrasts in which both conditions have zeros.
>
> With future questions please
>  post your full code, the output of sessionInfo()
> , as this helps us provide better answers.
>
> Mike
> On Jan 7, 2014 8:49 AM, "Noa Henig" <ntzunz at gmail.com> wrote:
>
>> Hi,
>> I am using DESeq2 to get differential expression of 45 samples which
>> represent 26 different conditions.
>> I read that the calculation of the fold change in changed DESeq2 and
>> affects the genes having low counts more than the highly expressed genes.
>> However, I found genes that had zero counts in series of  4 samples (2
>> replicates in each of 2 conditions), while their log2fold change was 3.
>> what is the reason for that? or how can this be prevented?
>>
>> I used the nbinomWaldTest and did not replace outliers with the trimmed
>> mean since I have only 2 replicates per condition.
>>
>> I'd appreciate your help very much,
>> Noa Henig
>>
>>         [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: plotDispEsts_default.png
Type: image/png
Size: 15011 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20140209/e42fa806/attachment.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: plotDispEsts_mean.png
Type: image/png
Size: 14656 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20140209/e42fa806/attachment-0001.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: plotDispEsts_local.png
Type: image/png
Size: 14842 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20140209/e42fa806/attachment-0002.png>


More information about the Bioconductor mailing list