[BioC] DESeq2 Error in estimateSizeFactorsForMatrix

Michael Love michaelisaiahlove at gmail.com
Fri Jun 6 16:26:14 CEST 2014


hi Caleb,

(I thought I already had an internal check for a user supplying a
matrix of all zeros, to provide a more helpful error message, but I
guess I'll have to include it in the next version)

On Fri, Jun 6, 2014 at 10:17 AM, Caleb Bostwick <bostwick at ufl.edu> wrote:
> Hello Michael. I generated the rowSums count and got:
>
>> table(rs)
> rs
>     4
> 18820
>
> I think I have 18820 rows (genes) in my data and 4 samples, so as suggested
> by the original error, it appears all my genes have 0's for their counts. I
> displayed my ddsFull object data below for reference:
>

You can inspect the counts of a dds with:

head(counts(dds))

Or of a SummarizedExperiment with:

head(assay(se))

>> ddsFull
> class: DESeqDataSet
> dim: 18820 4
> exptData(0):
> assays(1): counts
> rownames(18820): gene0 gene1 ... gene9998 gene9999
> rowData metadata column names(0):
> colnames(4): LE005merged.bam LE006merged.bam LE007merged.bam LE008merged.bam
> colData names(2): run pheno
>
>
> So I suppose the next question is how do I find and fix the problem of
> getting 0's for all my gene counts?
>
>

Perhaps you are counting reads using a different annotation for
chromosome names as the genome to which the reads were aligned, i.e.
'chr1' vs '1'.  In the beginner vignette we use a GTF file. So on the
command line you could look at the chromosome names in the GTF file
compared to the reads in the BAM file.

Mike

>
> I am not sure if I need to supply my session info again, but I will below.
> Thanks for your help.
>
> Best,
> Caleb
>
>> sessionInfo()
> R version 3.1.0 (2014-04-10)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
> States.1252
> [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods
> base
>
> other attached packages:
>  [1] DESeq2_1.4.5            RcppArmadillo_0.4.300.0 Rcpp_0.11.1
>  [4] GenomicAlignments_1.0.1 BSgenome_1.32.0         Rsamtools_1.16.0
>  [7] Biostrings_2.32.0       XVector_0.4.0           GenomicFeatures_1.16.1
> [10] AnnotationDbi_1.26.0    Biobase_2.24.0          GenomicRanges_1.16.3
> [13] GenomeInfoDb_1.0.2      IRanges_1.22.7          BiocGenerics_0.10.0
>
> loaded via a namespace (and not attached):
>  [1] annotate_1.42.0    BatchJobs_1.2      BBmisc_1.6
> BiocParallel_0.6.1
>  [5] biomaRt_2.20.0     bitops_1.0-6       brew_1.0-6
> codetools_0.2-8
>  [9] DBI_0.2-7          digest_0.6.4       fail_1.2           foreach_1.4.2
> [13] genefilter_1.46.1  geneplotter_1.42.0 grid_3.1.0
> iterators_1.0.7
> [17] lattice_0.20-29    locfit_1.5-9.1     plyr_1.8.1
> RColorBrewer_1.0-5
> [21] RCurl_1.95-4.1     RSQLite_0.11.4     rtracklayer_1.24.1
> sendmailR_1.1-2
> [25] splines_3.1.0      stats4_3.1.0       stringr_0.6.2
> survival_2.37-7
> [29] tools_3.1.0        XML_3.98-1.1       xtable_1.7-3
> zlibbioc_1.10.0
>
>
> On Tue, Jun 3, 2014 at 10:41 AM, Michael Love <michaelisaiahlove at gmail.com>
> wrote:
>>
>> hi Caleb,
>>
>> You can do this:
>>
>> rs <- rowSums( counts(ddsFull) == 0 )
>> table(rs)
>>
>> This will tabulate the number of genes with 0, 1, 2, ... samples with a
>> zero.
>>
>> Mike
>>
>>
>>
>> On Tue, Jun 3, 2014 at 10:31 AM, Caleb Bostwick <bostwick at ufl.edu> wrote:
>>>
>>> Thank you for your quick reply Simon. I am inexperienced with R and do
>>> not know how to check if all my genes contain a zero in at least one of the
>>> samples. I'm sorry for my ignorance, but if someone would explain or point
>>> me to an explanation of how I could check this, I will certainly check this.
>>> However, I have used the commercial software CLC Bio to map each sample's
>>> reads to our reference genome and seen the data displayed in an Excel table
>>> and I know that at least some of the genes do not contain a zero in every
>>> sample. If this is the case in my "raw" DESeq2 data, then I made an error
>>> somewhere, but I am following the "beginner" tutorial very carefully and
>>> don't know where I have gone wrong. Thank you for your help.
>>>
>>>
>>> Best,
>>> Caleb
>>>
>>>
>>> ---------
>>>
>>> Dear Caleb
>>>
>>> On 30/05/14 16:59, Caleb Bostwick [guest] wrote:
>>>  > Could you please help me or direct me to a source where I might find
>>>  > a solution to my error problem? A "Google search" on the error did
>>>  > not return useful results. Thank you very much.
>>>
>>> For starters, check whether the claim of the error message is actualy
>>> true:
>>> > Error in estimateSizeFactorsForMatrix(counts(object), locfunc, geoMeans
>>> > = geoMeans) :
>>> >    every gene contains at least one zero, cannot compute log geometric
>>> > means
>>>
>>> Does every gene contain a zero in at least one of the samples? If so,
>>> how comes?
>>>
>>>    Simon
>>>
>>>
>>>
>>>
>>> On Fri, May 30, 2014 at 11:02 AM, Caleb Bostwick <bostwick at ufl.edu>
>>> wrote:
>>>>
>>>> Hello Michael. I am a neuroscience researcher attempting to use the
>>>> DESeq2 package to perform differential expression analysis of my sequencing
>>>> data. I am following the beginner vignette but substituting my own data into
>>>> the code. I managed to get to the part where it tells me to call the DESeq
>>>> function, but I received the following error:
>>>>
>>>> > dds <- DESeq(ddsFull)
>>>> estimating size factors
>>>> Error in estimateSizeFactorsForMatrix(counts(object), locfunc, geoMeans
>>>> = geoMeans) :
>>>>   every gene contains at least one zero, cannot compute log geometric
>>>> means
>>>>
>>>> -----
>>>>
>>>> My data was generated from FASTQ files from the sequencer, which I
>>>> quality/adapter trimmed, and then aligned to our reference genome using the
>>>> programs STAR and Bowtie2. The unmapped reads from the STAR program were
>>>> subsequently run through Bowtie2 and the SAM file outputs from both
>>>> alignment programs were combined using Picard-Tools MergeSAM. The merged SAM
>>>> files were then converted to BAM files and I began the DESeq2 beginner
>>>> tutorial.
>>>>
>>>> Could you please help me or direct me to a source where I might find a
>>>> solution to my error problem? A "Google search" on the error did not return
>>>> useful results. Thank you very much.
>>>>
>>>> Best,
>>>> Caleb Bostwick
>>>>
>>>>
>>>> The posting guide said I should include sessionInfo():
>>>>
>>>>
>>>> >sessionInfo()
>>>> R version 3.1.0 (2014-04-10)
>>>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>>>
>>>> locale:
>>>> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
>>>> States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C
>>>> [5] LC_TIME=English_United States.1252
>>>>
>>>> attached base packages:
>>>> [1] parallel  stats     graphics  grDevices utils     datasets  methods
>>>> base
>>>>
>>>> other attached packages:
>>>>  [1] DESeq2_1.4.5            RcppArmadillo_0.4.300.0 Rcpp_0.11.1
>>>> GenomicAlignments_1.0.1 BSgenome_1.32.0         Rsamtools_1.16.0
>>>> Biostrings_2.32.0       XVector_0.4.0
>>>>  [9] GenomicFeatures_1.16.1  AnnotationDbi_1.26.0    Biobase_2.24.0
>>>> GenomicRanges_1.16.3    GenomeInfoDb_1.0.2      IRanges_1.22.7
>>>> BiocGenerics_0.10.0     BiocInstaller_1.14.2
>>>>
>>>> loaded via a namespace (and not attached):
>>>>  [1] annotate_1.42.0    BatchJobs_1.2      BBmisc_1.6
>>>> BiocParallel_0.6.1 biomaRt_2.20.0     bitops_1.0-6       brew_1.0-6
>>>> codetools_0.2-8    DBI_0.2-7          digest_0.6.4
>>>> [11] fail_1.2           foreach_1.4.2      genefilter_1.46.1
>>>> geneplotter_1.42.0 grid_3.1.0         iterators_1.0.7    lattice_0.20-29
>>>> locfit_1.5-9.1     plyr_1.8.1         RColorBrewer_1.0-5
>>>> [21] RCurl_1.95-4.1     RSQLite_0.11.4     rtracklayer_1.24.1
>>>> sendmailR_1.1-2    splines_3.1.0      stats4_3.1.0       stringr_0.6.2
>>>> survival_2.37-7    tools_3.1.0        XML_3.98-1.1
>>>> [31] xtable_1.7-3       zlibbioc_1.10.0
>>>
>>>
>>
>



More information about the Bioconductor mailing list