[BioC] Problem running summarizeOverlaps()

Martin Morgan mtmorgan at fhcrc.org
Tue May 20 18:05:51 CEST 2014


On 05/20/2014 05:34 AM, Jessica Perry Hekman wrote:
> On 05/19/2014 09:32 PM, Martin Morgan wrote:
>> On 05/19/2014 06:55 PM, Jessica Perry Hekman wrote:
>>> I am working from
>>>
>>> http://bioconductor.org/packages/release/bioc/vignettes/gage/inst/doc/RNA-seqWorkflow.pdf
>>>
>
>>> gnCnt <- summarizeOverlaps(exByGn, bamfls, mode="Union",
>>>           ignore.strand=TRUE, single.end=TRUE, param=param)
>
>> Hi Jessica --
>>
>> I think that summarizeOverlaps is trying to evaluate your counting
>> algorithm on several different cores, but an error occurs. Try
>> running the commands above, and then immediately before
>> summarizeOverlaps evaluate
>>
>>    options(mc.cores=1)
>>    gnCnt <- summarizeOverlaps(exByGn, bamfls, mode="Union",
>>        ignore.strand=TRUE, single.end=TRUE, param=param)
>>
>> Hopefully this will at least make the error apparent, even if it might
>> still be cryptic.
>>
>> Please be sure to include the output of the command 'sessionInfo()'
>> after you have a problem; here's mine
>
> Ah yes! Very helpful! The error message after I added mc.cores=1 to my script is:
>
> Error: C stack usage is too close to the limit
>
> ...which is indeed much less cryptic. I am still not sure how to fix the
> problem, though!

I haven't seen this error before in the context of summarizeOverlaps, so it's a 
bit puzzling. I'd first check that the paths returned by

   fls <- list.files("../../bam/", pattern="fox-readgroups.bam$", full.names=T)

all point to valid bam files, and that each bam file has an index.
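
Something along these lines would do that check in base R. It is only a sketch: checkBamFiles is a made-up helper name (not part of Rsamtools), and it assumes the index sits next to each bam file as either x.bam.bai or x.bai. It tests that the files are present, not that their contents are valid.

```r
# Hypothetical helper: for each path, report whether the file exists and
# whether an index file is found next to it.
checkBamFiles <- function(fls) {
    # Assumption: indexes are named "x.bam.bai" or "x.bai"
    bai1 <- paste0(fls, ".bai")
    bai2 <- sub("\\.bam$", ".bai", fls)
    data.frame(
        file = fls,
        exists = file.exists(fls),
        hasIndex = file.exists(bai1) | file.exists(bai2),
        stringsAsFactors = FALSE
    )
}
```

Calling checkBamFiles(fls) gives one row per file. If a row shows hasIndex as FALSE, indexBam() from Rsamtools should be able to create the missing index, provided the bam file is coordinate-sorted.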

You might then try adding a 'yieldSize' argument to the following line. Start 
small (e.g., 100000); if summarizeOverlaps works at that size, move toward the 
default (1000000), and if it fails, try something smaller still.

   bamfls <- BamFileList(fls, yieldSize=100000)
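
If stepping through sizes by hand gets tedious, the search could be scripted roughly as below. This is a sketch under assumptions: tryYieldSizes and countFun are names invented here, the list of sizes is arbitrary, and it relies on the failure being an ordinary R error that tryCatch() can intercept, which may not hold for every C stack problem.

```r
# Hypothetical helper: call countFun(size) for increasing sizes, stop at the
# first failure, and return the result from the largest size that worked.
tryYieldSizes <- function(countFun,
                          sizes = c(100000L, 250000L, 500000L, 1000000L)) {
    best <- NULL
    for (size in sizes) {
        # format() avoids integers such as 100000L printing as "1e+05"
        label <- format(size, scientific = FALSE)
        res <- tryCatch(countFun(size), error = function(e) e)
        if (inherits(res, "error")) {
            message("yieldSize ", label, " failed: ", conditionMessage(res))
            break
        }
        message("yieldSize ", label, " worked")
        best <- list(yieldSize = size, result = res)
    }
    if (is.null(best))
        stop("the smallest yieldSize failed; try a smaller value")
    best
}

# Intended use with the objects from this thread (not run here):
# options(mc.cores = 1)
# out <- tryYieldSizes(function(n) {
#     bamfls <- BamFileList(fls, yieldSize = n)
#     summarizeOverlaps(exByGn, bamfls, mode = "Union",
#         ignore.strand = TRUE, single.end = TRUE, param = param)
# })
# gnCnt <- out$result
```

Each successful size repeats the full count, so this costs time; it is meant for finding a workable yieldSize once, not for routine use.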

Can you provide a little information about your system? It sounds like it's your 
own machine, not a server. How much memory?

Probably you'd get a different outcome with a more recent R / Bioconductor, but 
I'm not sure whether the error would go away! My sense is that the problem with 
package manager installations is that the package manager (or you) ends up 
installing non-default packages into a single system directory, so that the 
directory contains a mix of different Bioconductor releases. A 'better practice' 
is probably to

   a) remove any existing system-wide R installation and packages

   b) install R with only base packages as su, or (as I do) install R as a 
regular user (not su) in version-specific directories in your own user file 
system, e.g., ~mtmorgan/bin/R-3-1-branch/

   c) install any additional packages, via biocLite or otherwise, as a regular 
user, following R's prompt to create a version-specific directory in your own 
user hierarchy.
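
Before reinstalling anything, it may help to see where the current packages actually live. The sketch below uses only base R; libraryReport is a made-up name, and reading the 'Built' column (the R version each package was built under) as a sign of mixed releases is my assumption about what to look for, not a formal check.

```r
# Hypothetical helper: list installed packages with their library directory
# and the R version they were built under, and flag packages that are
# installed in more than one library.
libraryReport <- function() {
    ip <- installed.packages()[, c("Package", "LibPath", "Version", "Built"),
                               drop = FALSE]
    pkgs <- as.data.frame(ip, stringsAsFactors = FALSE)
    rownames(pkgs) <- NULL
    dupNames <- unique(pkgs$Package[duplicated(pkgs$Package)])
    pkgs$duplicated <- pkgs$Package %in% dupNames
    pkgs
}

rpt <- libraryReport()
.libPaths()                       # libraries searched, in order
Sys.getenv("R_LIBS_USER")         # where a per-user library would go
table(rpt$LibPath, rpt$Built)     # packages per library and build version
rpt[rpt$duplicated, ]             # packages present in several libraries
```

Several different 'Built' versions within one system directory, or the same package in more than one library, would be consistent with the mixed installation described above.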

Obviously this can be a rat's nest of problems, and should only be done 
immediately before a big deadline or when you are feeling too productive and 
need to scale back ;)

Martin

>
> sessionInfo() output:
>
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-redhat-linux-gnu (64-bit)
>
> locale:
>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>   [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods
> [8] base
>
> other attached packages:
>   [1] leeBamViews_0.99.24
>   [2] BSgenome_1.30.0
>   [3] Rsamtools_1.14.3
>   [4] Biostrings_2.30.1
>   [5] TxDb.Hsapiens.UCSC.hg19.knownGene_2.10.1
>   [6] GenomicFeatures_1.14.5
>   [7] AnnotationDbi_1.24.0
>   [8] Biobase_2.22.0
>   [9] GenomicRanges_1.14.4
> [10] XVector_0.2.0
> [11] IRanges_1.20.7
> [12] BiocGenerics_0.8.0
>
> loaded via a namespace (and not attached):
>   [1] biomaRt_2.18.0     bitops_1.0-6       DBI_0.2-7          RCurl_1.95-4.1
>   [5] RSQLite_0.11.4     rtracklayer_1.22.7 stats4_3.0.2       tools_3.0.2
>   [9] XML_3.98-1.1       zlibbioc_1.8.0
>
> ...and I should have remembered that I am using an older version of R. What I am
> running is the latest version that my package manager has on offer. Last time I
> installed a more recent version separately from yum, it was a huge annoyance to
> keep the two separate versions on the system. Do you think updating R and
> Bioconductor (which appears to depend on the most recent R in order to upgrade)
> will help?
>
> Thanks very much,
> Jessica
>


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


