[BioC] DESeq2 Error in estimateSizeFactorsForMatrix
Michael Love
michaelisaiahlove at gmail.com
Fri Jun 6 16:26:14 CEST 2014
hi Caleb,
(I thought I already had an internal check for a user supplying a
matrix of all zeros, to provide a more helpful error message, but I
guess I'll have to include it in the next version)
On Fri, Jun 6, 2014 at 10:17 AM, Caleb Bostwick <bostwick at ufl.edu> wrote:
> Hello Michael. I generated the rowSums count and got:
>
>> table(rs)
> rs
> 4
> 18820
>
> I think I have 18820 rows (genes) in my data and 4 samples, so as suggested
> by the original error, it appears all my genes have 0's for their counts. I
> displayed my ddsFull object data below for reference:
>
You can inspect the counts of a dds with:
head(counts(dds))
Or of a SummarizedExperiment with:
head(assay(se))
>> ddsFull
> class: DESeqDataSet
> dim: 18820 4
> exptData(0):
> assays(1): counts
> rownames(18820): gene0 gene1 ... gene9998 gene9999
> rowData metadata column names(0):
> colnames(4): LE005merged.bam LE006merged.bam LE007merged.bam LE008merged.bam
> colData names(2): run pheno
>
>
> So I suppose the next question is how do I find and fix the problem of
> getting 0's for all my gene counts?
>
>
Perhaps you are counting reads using a different annotation for
chromosome names as the genome to which the reads were aligned, i.e.
'chr1' vs '1'. In the beginner vignette we use a GTF file. So on the
command line you could look at the chromosome names in the GTF file
compared to the reads in the BAM file.
Mike
>
> I am not sure if I need to supply my session info again, but I will below.
> Thanks for your help.
>
> Best,
> Caleb
>
>> sessionInfo()
> R version 3.1.0 (2014-04-10)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United
> States.1252
> [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
>
> attached base packages:
> [1] parallel stats graphics grDevices utils datasets methods
> base
>
> other attached packages:
> [1] DESeq2_1.4.5 RcppArmadillo_0.4.300.0 Rcpp_0.11.1
> [4] GenomicAlignments_1.0.1 BSgenome_1.32.0 Rsamtools_1.16.0
> [7] Biostrings_2.32.0 XVector_0.4.0 GenomicFeatures_1.16.1
> [10] AnnotationDbi_1.26.0 Biobase_2.24.0 GenomicRanges_1.16.3
> [13] GenomeInfoDb_1.0.2 IRanges_1.22.7 BiocGenerics_0.10.0
>
> loaded via a namespace (and not attached):
> [1] annotate_1.42.0 BatchJobs_1.2 BBmisc_1.6
> BiocParallel_0.6.1
> [5] biomaRt_2.20.0 bitops_1.0-6 brew_1.0-6
> codetools_0.2-8
> [9] DBI_0.2-7 digest_0.6.4 fail_1.2 foreach_1.4.2
> [13] genefilter_1.46.1 geneplotter_1.42.0 grid_3.1.0
> iterators_1.0.7
> [17] lattice_0.20-29 locfit_1.5-9.1 plyr_1.8.1
> RColorBrewer_1.0-5
> [21] RCurl_1.95-4.1 RSQLite_0.11.4 rtracklayer_1.24.1
> sendmailR_1.1-2
> [25] splines_3.1.0 stats4_3.1.0 stringr_0.6.2
> survival_2.37-7
> [29] tools_3.1.0 XML_3.98-1.1 xtable_1.7-3
> zlibbioc_1.10.0
>
>
> On Tue, Jun 3, 2014 at 10:41 AM, Michael Love <michaelisaiahlove at gmail.com>
> wrote:
>>
>> hi Caleb,
>>
>> You can do this:
>>
>> rs <- rowSums( counts(ddsFull) == 0 )
>> table(rs)
>>
>> This will tabulate the number of genes with 0, 1, 2, ... samples with a
>> zero.
>>
>> Mike
>>
>>
>>
>> On Tue, Jun 3, 2014 at 10:31 AM, Caleb Bostwick <bostwick at ufl.edu> wrote:
>>>
>>> Thank you for your quick reply Simon. I am inexperienced with R and do
>>> not know how to check if all my genes contain a zero in at least one of the
>>> samples. I'm sorry for my ignorance, but if someone would explain or point
>>> me to an explanation of how I could check this, I will certainly check this.
>>> However, I have used the commercial software CLC Bio to map each sample's
>>> reads to our reference genome and seen the data displayed in an Excel table
>>> and I know that at least some of the genes do not contain a zero in every
>>> sample. If this is the case in my "raw" DESeq2 data, then I made an error
>>> somewhere, but I am following the "beginner" tutorial very carefully and
>>> don't know where I have gone wrong. Thank you for your help.
>>>
>>>
>>> Best,
>>> Caleb
>>>
>>>
>>> ---------
>>>
>>> Dear Caleb
>>>
>>> On 30/05/14 16:59, Caleb Bostwick [guest] wrote:
>>> > Could you please help me or direct me to a source where I might find
>>> > a solution to my error problem? A "Google search" on the error did
>>> > not return useful results. Thank you very much.
>>>
>>> For starters, check whether the claim of the error message is actualy
>>> true:
>>> > Error in estimateSizeFactorsForMatrix(counts(object), locfunc, geoMeans
>>> > = geoMeans) :
>>> > every gene contains at least one zero, cannot compute log geometric
>>> > means
>>>
>>> Does every gene contain a zero in at least one of the samples? If so,
>>> how comes?
>>>
>>> Simon
>>>
>>>
>>>
>>>
>>> On Fri, May 30, 2014 at 11:02 AM, Caleb Bostwick <bostwick at ufl.edu>
>>> wrote:
>>>>
>>>> Hello Michael. I am a neuroscience researcher attempting to use the
>>>> DESeq2 package to perform differential expression analysis of my sequencing
>>>> data. I am following the beginner vignette but substituting my own data into
>>>> the code. I managed to get to the part where it tells me to call the DESeq
>>>> function, but I received the following error:
>>>>
>>>> > dds <- DESeq(ddsFull)
>>>> estimating size factors
>>>> Error in estimateSizeFactorsForMatrix(counts(object), locfunc, geoMeans
>>>> = geoMeans) :
>>>> every gene contains at least one zero, cannot compute log geometric
>>>> means
>>>>
>>>> -----
>>>>
>>>> My data was generated from FASTQ files from the sequencer, which I
>>>> quality/adapter trimmed, and then aligned to our reference genome using the
>>>> programs STAR and Bowtie2. The unmapped reads from the STAR program were
>>>> subsequently run through Bowtie2 and the SAM file outputs from both
>>>> alignment programs were combined using Picard-Tools MergeSAM. The merged SAM
>>>> files were then converted to BAM files and I began the DESeq2 beginner
>>>> tutorial.
>>>>
>>>> Could you please help me or direct me to a source where I might find a
>>>> solution to my error problem? A "Google search" on the error did not return
>>>> useful results. Thank you very much.
>>>>
>>>> Best,
>>>> Caleb Bostwick
>>>>
>>>>
>>>> The posting guide said I should include sessionInfo():
>>>>
>>>>
>>>> >sessionInfo()
>>>> R version 3.1.0 (2014-04-10)
>>>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>>>
>>>> locale:
>>>> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United
>>>> States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C
>>>> [5] LC_TIME=English_United States.1252
>>>>
>>>> attached base packages:
>>>> [1] parallel stats graphics grDevices utils datasets methods
>>>> base
>>>>
>>>> other attached packages:
>>>> [1] DESeq2_1.4.5 RcppArmadillo_0.4.300.0 Rcpp_0.11.1
>>>> GenomicAlignments_1.0.1 BSgenome_1.32.0 Rsamtools_1.16.0
>>>> Biostrings_2.32.0 XVector_0.4.0
>>>> [9] GenomicFeatures_1.16.1 AnnotationDbi_1.26.0 Biobase_2.24.0
>>>> GenomicRanges_1.16.3 GenomeInfoDb_1.0.2 IRanges_1.22.7
>>>> BiocGenerics_0.10.0 BiocInstaller_1.14.2
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] annotate_1.42.0 BatchJobs_1.2 BBmisc_1.6
>>>> BiocParallel_0.6.1 biomaRt_2.20.0 bitops_1.0-6 brew_1.0-6
>>>> codetools_0.2-8 DBI_0.2-7 digest_0.6.4
>>>> [11] fail_1.2 foreach_1.4.2 genefilter_1.46.1
>>>> geneplotter_1.42.0 grid_3.1.0 iterators_1.0.7 lattice_0.20-29
>>>> locfit_1.5-9.1 plyr_1.8.1 RColorBrewer_1.0-5
>>>> [21] RCurl_1.95-4.1 RSQLite_0.11.4 rtracklayer_1.24.1
>>>> sendmailR_1.1-2 splines_3.1.0 stats4_3.1.0 stringr_0.6.2
>>>> survival_2.37-7 tools_3.1.0 XML_3.98-1.1
>>>> [31] xtable_1.7-3 zlibbioc_1.10.0
>>>
>>>
>>
>
More information about the Bioconductor
mailing list