[BioC] featureCounts segfault

Ryan C. Thompson rct at thompsonclan.org
Fri May 9 22:13:19 CEST 2014


Hi Wei,

I've been testing out the new arguments for featureCounts on my use 
case, and I'm getting a segfault when I try to run it. I suspect that I 
may be using a too large annotation and featureCounts is running out of 
memory. Specifically, I'm counting ChIP-Seq reads in bins placed every 
50 bp across the entire human genome, so that's about 60 million bins, 
which is a lot. Below is the log output for the crash. Note that 
"featureCounts.fragments is my function that simply calls featureCounts 
with the appropriate arguments. In any case, the crash seems to happen 
while loading the annotation file.

Should I try splitting the annotation into smaller pieces, calling 
featureCounts on each one, and then rbinding them together?

Also, an unrelated question: when the read is reduced to its 5-prime 
end, is this reducing its length to one or zero? I ask because I want to 
know whether to extend by the fragment length or the fragment length 
minus 1.

Thanks,

-Ryan

Error output:

         ==========     _____ _    _ ____  _____ ______          _____
         =====         / ____| |  | |  _ \|  __ \|  ____| /\   |  __ \
           =====      | (___ | |  | | |_) | |__) | |__ /  \  | |  | |
             ====      \___ \| |  | |  _ <|  _  /|  __| / /\ \ | |  | |
               ====    ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
         ==========   |_____/ \____/|____/|_| \_\______/_/    \_\_____/
        Rsubread 1.15.2

//========================== featureCounts setting 
===========================\\
|| ||
||             Input files : 142 BAM 
files                                    ||
||                           S 
/gpfs/home/rcthomps/Projects/sarah-cd4/bam ... ||
...
||                           S 
/gpfs/home/rcthomps/Projects/sarah-cd4/bam ... ||
|| ||
||             Output file : 
./.Rsubread_featureCounts_pid17803               ||
||             Annotations : 
./.Rsubread_UserProvidedAnnotation_pid17803  ... ||
|| ||
||                 Threads : 
8                                                ||
||                   Level : meta-feature 
level                               ||
||              Paired-end : 
no                                               ||
||         Strand specific : 
no                                               ||
||      Multimapping reads : not 
counted                                      ||
|| Multi-overlapping reads : 
counted                                          ||
||         Read extensions : 0 on 5' and 147 on 3' 
ends                       ||
||       Read reduction to : 5' 
end                                           ||
|| ||
\\===================== http://subread.sourceforge.net/ 
======================//

//================================= Running 
==================================\\
|| ||
|| Load annotation file ./.Rsubread_UserProvidedAnnotation_pid17803 
...       ||

  *** caught segfault ***
address 0x7f428c4fd38d, cause 'memory not mapped'

Traceback:
  1: .C("R_readSummary_wrapper", as.integer(n), as.character(cmd),     
PACKAGE = "Rsubread")
  2: featureCounts(bam, annot.ext = saf, isPairedEnd = FALSE, read2pos = 
5,     readExtension3 = fraglength - 1, allowMultiOverlap = TRUE,     
strandSpecific = 0, nthreads = nthreads, ...)
  3: featureCounts.fragments(bamfiles, windows, fraglength = histone.width)

 > sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
  [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=en_US.UTF-8 LC_NAME=en_US.UTF-8
  [9] LC_ADDRESS=en_US.UTF-8 LC_TELEPHONE=en_US.UTF-8
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=en_US.UTF-8

attached base packages:
[1] grDevices datasets  parallel  graphics  utils stats     methods
[8] base

other attached packages:
  [1] BSgenome.Hsapiens.UCSC.hg19_1.3.19 BSgenome_1.30.0
  [3] BiocParallel_0.4.1 xlsx_0.5.5
  [5] xlsxjars_0.6.0 rJava_0.9-6
  [7] plyr_1.8 doParallel_1.0.8
  [9] iterators_1.0.7 foreach_1.4.2
[11] Rsubread_1.15.2 Rsamtools_1.14.3
[13] Biostrings_2.30.1 GenomicRanges_1.14.4
[15] XVector_0.2.0 BiocInstaller_1.12.1
[17] stringr_0.6.2 IRanges_1.20.6
[19] BiocGenerics_0.8.0 R.utils_1.29.8
[21] R.oo_1.17.0 R.methodsS3_1.6.1
[23] ggplot2_0.9.3.1

loaded via a namespace (and not attached):
  [1] BatchJobs_1.2      BBmisc_1.5 bitops_1.0-6       brew_1.0-6
  [5] codetools_0.2-8    colorspace_1.2-4 DBI_0.2-7          
dichromat_2.0-0
  [9] digest_0.6.4       fail_1.2 grid_3.0.2         gtable_0.1.2
[13] labeling_0.2       MASS_7.3-30 munsell_0.4.2      proto_0.3-10
[17] RColorBrewer_1.0-5 reshape2_1.2.2 RSQLite_0.11.4     scales_0.2.3
[21] sendmailR_1.1-2    stats4_3.0.2 tools_3.0.2        zlibbioc_1.8.0



More information about the Bioconductor mailing list