[BioC] segfault ReadAffy cause 'memory not mapped'

Thu Aug 22 01:21:39 CEST 2013

Well, internally read_abatch uses allocMatrix() to allocate the main
block of memory that stores the probe intensities. However, there are a
lot of places where "int" is used as the indexing variable. If I had had
better foresight when I wrote this code 10 years ago, I'd probably have
used something a bit more specific (e.g. int64_t). I'm guessing it is one
of these sorts of things that is causing the crash. I'll try to get
around to refactoring the code at some point.
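
To illustrate the kind of indexing problem described above, here is a
minimal sketch in C. It is not the affyio code: the helper names
(total_cells, fits_in_int, flat_index) are hypothetical, and the
1164 x 1164 cell layout for hgu133plus2 used in the note below is an
assumption that is not stated anywhere in this thread. The sketch only
shows why a flat offset into a probes-by-arrays matrix should be computed
in a 64-bit type.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper (not from affyio): number of cells in a
 * column-major probes-by-arrays matrix, computed in 64 bits so the
 * product cannot overflow for realistic sizes. */
static int64_t total_cells(int64_t n_probes, int64_t n_arrays)
{
    return n_probes * n_arrays;
}

/* Returns 1 if the largest flat index (total_cells - 1) still fits in a
 * plain int, 0 otherwise. */
static int fits_in_int(int64_t n_probes, int64_t n_arrays)
{
    return total_cells(n_probes, n_arrays) - 1 <= (int64_t)INT_MAX;
}

/* Flat column-major offset. The operands are already 64-bit, so the
 * multiplication is done in 64 bits. Writing this with int operands
 * would overflow (undefined behaviour in C) once the offset passes
 * INT_MAX. */
static int64_t flat_index(int64_t probe, int64_t array, int64_t n_probes)
{
    assert(probe >= 0 && probe < n_probes);
    return array * n_probes + probe;
}
```

Under the assumed 1164 x 1164 layout (1,354,896 cells per array),
total_cells(1354896, 3750) is 5,080,860,000, well past INT_MAX
(2,147,483,647 with a 32-bit int), and fits_in_int() already returns 0 at
1585 arrays. This shows only that int offsets cannot address a matrix of
that size; it does not pinpoint which line in read_abatch faults.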

If you'd like, you could send me the gdb backtrace at the point of the
segfault and I could investigate further.

Best,

Ben

> On 8/1/13 5:33 PM, Loraine, Ann wrote:
>> Hello,
>>
>> I am trying to process several thousand CEL files using the ReadAffy
>> command.
>>
>> The machine has 96 Gb RAM.
>>
>> However I get this error:
>>
>> > expr=ReadAffy(filenames=d.uniq$cel,celfile.path='CEL',sampleNames=d.uniq$gsm,compress=T)
>>
>>  *** caught segfault ***
>> address 0x7fc79b4b1048, cause 'memory not mapped'
>>
>
> I also have a problem loading many (3750) Affy hgu133plus2 arrays into
> an AffyBatch. I was able to run this with ~2900 arrays, but not since
> adding ~800 more. At right around 16 GiB allocated, I get a segfault
> like:
>
>  *** caught segfault ***
> address 0x2aa6b6067048, cause 'memory not mapped'
>
> Traceback:
>  1: .Call("read_abatch", filenames, rm.mask, rm.outliers, rm.extra,
> ref.cdfName, dim.intensity[c(1, 2)], verbose, PACKAGE = "affyio")
>  2: read.affybatch(filenames = as.character(pdata$Filename))
>
> I noticed this when trying to run justGCRMA() or justRMA(), which both
> threw the same error. The traceback pointed to read.affybatch() so I
> tried just doing that directly.
>
> I first checked to make sure each file could be read in a loop, and they
> all come in OK individually. However, if I try to read them all at once
> I keep getting errors right around 16 GiB allocated (to R).
>
> My laptop is Ubuntu Linux 12.04 with 32 GiB RAM, and I also tried this
> on a 256 GiB RAM machine with RHEL5. Both were running R version 3.0.1.
> On the Ubuntu machine, I was using affy v1.39.2, and on the RHEL5
> machine it was affy v1.38.1.
>
> In both cases the segfault came at about 16 GiB allocated (PBS epilogue
> shows 15.41 GiB memory used when running on the 256 GiB machine via
> batch submission). I also ran via an interactive PBS session on the 256
> GiB server and the same error happened.
>
> I had considered it could be a limit of the signed int indices for R
> vectors/arrays, but I thought that had changed as of R v3.0. Also, I
> thought that would give the error 'too many elements specified' rather
> than a 'memory not mapped' segfault. I've certainly allocated close to
> 64 GiB to R doing other things with these data, I'm just not sure if any
> individual vectors were that large.
>
> I know there are ways to get around this. For example, I ran fRMA on
> subsets (split it into 8 subsets) and then combined the expression sets.
> Of course trying to run fRMA on the whole set at once failed as well.
> The fRMA-summarized data just 'feel' a bit different though, and I've
> been working with many of these arrays for a while now. (I know
> 'feelings' aren't statistics, so please don't scorch me on that!) Also,
> I've seen the suggestions like aroma.* for large datasets.
>
> However, this seems like something that should be possible using the
> affy package given how cheap large memory systems are these days. I'm
> expecting a 0.5 TiB RAM workstation this fall! Also, if there is some
> kind of limitation in the implementation I think it's worth finding and
> helping get fixed. Any thoughts on whether there is a limitation in the
> affy package, in my gcc compiler, or something else? Would love for this
> to be able to use all my RAM.
>
> Below I included R output from one of my attempts.
>
> Thanks!
>
> Brian Peyser
>
>
> $ R --vanilla
> R version 3.0.1 (2013-05-16) -- "Good Sport"
> Copyright (C) 2013 The R Foundation for Statistical Computing
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> R is free software and comes with ABSOLUTELY NO WARRANTY.
> You are welcome to redistribute it under certain conditions.
> Type 'license()' or 'licence()' for distribution details.
>
>   Natural language support but running in an English locale
>
> R is a collaborative project with many contributors.
> Type 'contributors()' for more information and
> 'citation()' on how to cite R or R packages in publications.
>
> Type 'demo()' for some demos, 'help()' for on-line help, or
> 'help.start()' for an HTML browser interface to help.
> Type 'q()' to quit R.
>
>> library(affy)
> Loading required package: BiocGenerics
> Loading required package: parallel
>
> Attaching package: ‘BiocGenerics’
>
> The following objects are masked from ‘package:parallel’:
>
>     clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
>     clusterExport, clusterMap, parApply, parCapply, parLapply,
>     parLapplyLB, parRapply, parSapply, parSapplyLB
>
> The following object is masked from ‘package:stats’:
>
>     xtabs
>
> The following objects are masked from ‘package:base’:
>
>     anyDuplicated, append, as.data.frame, as.vector, cbind, colnames,
>     duplicated, eval, Filter, Find, get, intersect, lapply, Map,
>     mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
>     Position, rank, rbind, Reduce, rep.int, rownames, sapply, setdiff,
>     sort, table, tapply, union, unique, unlist
>
> Loading required package: Biobase
> Welcome to Bioconductor
>
>     Vignettes contain introductory material; view with
>     'browseVignettes()'. To cite Bioconductor, see
>     'citation("Biobase")', and for packages 'citation("pkgname")'.
>
>> data <- read.affybatch(filenames=list.files(pattern=".CEL$",
>> ignore.case=TRUE))
>  *** caught segfault ***
> address 0x7f60734e7048, cause 'memory not mapped'
>
> Traceback:
>  1: .Call("read_abatch", filenames, rm.mask, rm.outliers, rm.extra,
> ref.cdfName, dim.intensity[c(1, 2)], verbose, PACKAGE = "affyio")
>  2: read.affybatch(filenames = as.character(pdata$Filename))
>
> Possible actions:
> 1: abort (with core dump, if enabled)
> 2: normal R exit
> 3: exit R without saving workspace
> 4: exit R saving workspace
> Selection:
>
> --
> Brian D. Peyser PhD
> Special Assistant to the Associate Director
> Office of the Associate Director
> Developmental Therapeutics Program
> Division of Cancer Treatment and Diagnosis
> National Cancer Institute
> National Institutes of Health
> 301-524-5587 (mobile)
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor