[BioC] segfault ReadAffy cause 'memory not mapped'

Thu Aug 22 14:53:55 CEST 2013

Dear Brian,

As I already mentioned in the earlier case, the xps package is able to 
handle this number of arrays.
(Quite some time ago a user used xps to process about 23,000 
hgu133plus2 arrays on his Mac, and memory consumption was only 4 GB of RAM.)

Best regards,
Christian

On 8/22/13 12:18 AM, Brian D. Peyser PhD wrote:
> On 8/1/13 5:33 PM, Loraine, Ann wrote:
>> Hello,
>>
>> I am trying to process several thousand CEL files using the ReadAffy command.
>>
>> The machine has 96 GB RAM.
>>
>> However I get this error:
>>
>>> expr=ReadAffy(filenames=d.uniq$cel,celfile.path='CEL',sampleNames=d.uniq$gsm,compress=T)
>>
>>   *** caught segfault ***
>> address 0x7fc79b4b1048, cause 'memory not mapped'
>>
>
> I also have a problem loading many (3750) Affy hgu133plus2 arrays into
> an AffyBatch. I was able to run this with ~2900 arrays, but not since
> adding ~800 more. At right around 16 GiB allocated, I get a segfault
> like:
>
>   *** caught segfault ***
> address 0x2aa6b6067048, cause 'memory not mapped'
>
> Traceback:
>   1: .Call("read_abatch", filenames, rm.mask, rm.outliers, rm.extra,     ref.cdfName, dim.intensity[c(1, 2)], verbose, PACKAGE = "affyio")
>   2: read.affybatch(filenames = as.character(pdata$Filename))
>
> I noticed this when trying to run justGCRMA() or justRMA(), which both
> threw the same error. The traceback pointed to read.affybatch() so I
> tried just doing that directly.
>
> I first checked to make sure each file could be read in a loop, and they
> all come in OK individually. However, if I try to read them all at once
> I keep getting errors right around 16 GiB allocated (to R).
>
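
The per-file check described above can be sketched roughly as follows. This is only an illustration: `check_files` is a hypothetical helper, and the commented usage assumes that `affyio::read.celfile.header` is an acceptable stand-in for "reading" a file. Note that `tryCatch` only traps R-level errors, so it would not trap a segfault.

```r
# Try to read each file on its own and record which ones fail.
# `reader` is any function that takes one file path and raises an R error
# when the file cannot be read (hypothetical helper, not part of affy).
check_files <- function(files, reader) {
  vapply(files, function(f) {
    tryCatch({
      reader(f)
      TRUE
    }, error = function(e) FALSE)
  }, logical(1))
}

# Hypothetical usage, assuming the affyio package is installed:
# cel_files <- list.files(pattern = "\\.CEL$", ignore.case = TRUE)
# ok <- check_files(cel_files, affyio::read.celfile.header)
# cel_files[!ok]   # files that could not be read individually
```

A file list that passes this check says nothing about reading all files in one call, which is the case that fails in this thread.
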
> My laptop is Ubuntu Linux 12.04 with 32 GiB RAM, and I also tried this
> on a 256 GiB RAM machine with RHEL5. Both were running R version 3.0.1.
> On the Ubuntu machine, I was using affy v1.39.2, and on the RHEL5
> machine it was affy v1.38.1.
>
> In both cases the segfault came at about 16 GiB allocated (PBS epilogue
> shows 15.41 GiB memory used when running on the 256 GiB machine via
> batch submission). I also ran via an interactive PBS session on the 256
> GiB server and the same error happened.
>
> I had considered it could be a limit of the signed int indices for R
> vectors/arrays, but I thought that had changed as of R v3.0. Also, I
> thought that would give the error 'too many elements specified' rather
> than a 'memory not mapped' segfault. I've certainly allocated close to
> 64 GiB to R doing other things with these data; I'm just not sure whether
> any individual vector was that large.
>
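
The arithmetic behind the index-limit suspicion above can be checked with a few lines of R. This is a sketch under one assumption that is not stated in the thread: that an hgu133plus2 CEL file holds a 1164 x 1164 grid of cells and that every cell is stored as an 8-byte double.

```r
# Assumption: an hgu133plus2 CEL file holds a 1164 x 1164 grid of cells.
cells_per_array <- 1164 * 1164          # intensities per array
n_arrays        <- 3750                 # array count from the post

# Numeric literals without an "L" suffix are doubles, so this cannot
# overflow the way 32-bit integer arithmetic would.
n_elements <- cells_per_array * n_arrays

int_max <- .Machine$integer.max         # 2^31 - 1, largest signed 32-bit index

exceeds_int_index <- n_elements > int_max
max_arrays_int    <- floor(int_max / cells_per_array)

# Memory in GiB if every intensity is stored as an 8-byte double.
gib_all_arrays <- n_elements * 8 / 2^30
gib_at_2_31    <- 2^31 * 8 / 2^30       # memory held by 2^31 doubles

cat("elements needed:         ", format(n_elements, big.mark = ","), "\n")
cat("exceeds 32-bit int index:", exceeds_int_index, "\n")
cat("arrays fitting in 2^31-1:", max_arrays_int, "\n")
cat("GiB for all arrays:      ", round(gib_all_arrays, 2), "\n")
cat("GiB held by 2^31 doubles:", gib_at_2_31, "\n")
```

Under these assumptions, 2^31 doubles occupy exactly 16 GiB, which is close to the point where the segfaults are reported. That is suggestive of a 32-bit index somewhere in the reading path, but it does not establish the cause.
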
> I know there are ways to get around this. For example, I ran fRMA on
> subsets (split it into 8 subsets) and then combined the expression sets.
> Of course trying to run fRMA on the whole set at once failed as well.
> The fRMA-summarized data just 'feel' a bit different though, and I've
> been working with many of these arrays for a while now. (I know
> 'feelings' aren't statistics, so please don't scorch me on that!) Also,
> I've seen the suggestions like aroma.* for large datasets.
>
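
The subset workaround described above might look roughly like this. It is a sketch, not the code that was actually run: `split_into_chunks` and `process_in_chunks` are hypothetical helpers, and the commented usage assumes the affy, frma and Biobase packages are installed. Processing subsets separately is reasonable for fRMA because it summarizes against frozen reference parameters; ordinary RMA on subsets would normalize each subset differently.

```r
# Split a vector into n_chunks contiguous groups of nearly equal size.
# (Hypothetical helper for illustration.)
split_into_chunks <- function(x, n_chunks) {
  group <- ceiling(seq_along(x) * n_chunks / length(x))
  split(x, group)
}

# Apply `summarize` (a function taking a character vector of file names)
# to each chunk and return the list of per-chunk results.
process_in_chunks <- function(files, n_chunks, summarize) {
  lapply(split_into_chunks(files, n_chunks), summarize)
}

# Hypothetical usage, assuming affy, frma and Biobase are installed:
# library(affy); library(frma); library(Biobase)
# cel_files <- list.files(pattern = "\\.CEL$", ignore.case = TRUE)
# esets <- process_in_chunks(cel_files, 8,
#                            function(f) frma(ReadAffy(filenames = f)))
# eset <- Reduce(combine, esets)   # merge the per-chunk ExpressionSets
```

Each chunk is read into its own AffyBatch, so no single object has to hold all arrays at once.
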
> However, this seems like something that should be possible using the
> affy package given how cheap large memory systems are these days. I'm
> expecting a 0.5 TiB RAM workstation this fall! Also, if there is some
> kind of limitation in the implementation I think it's worth finding and
> helping get fixed. Any thoughts on whether there is a limitation in the
> affy package, in my gcc compiler, or something else? Would love for this
> to be able to use all my RAM.
>
> Below I included R output from one of my attempts.
>
> Thanks!
>
> Brian Peyser
>
>
> $ R --vanilla
> R version 3.0.1 (2013-05-16) -- "Good Sport"
> Copyright (C) 2013 The R Foundation for Statistical Computing
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> R is free software and comes with ABSOLUTELY NO WARRANTY.
> You are welcome to redistribute it under certain conditions.
> Type 'license()' or 'licence()' for distribution details.
>
>    Natural language support but running in an English locale
>
> R is a collaborative project with many contributors.
> Type 'contributors()' for more information and
> 'citation()' on how to cite R or R packages in publications.
>
> Type 'demo()' for some demos, 'help()' for on-line help, or
> 'help.start()' for an HTML browser interface to help.
> Type 'q()' to quit R.
>
>> library(affy)
> Loading required package: BiocGenerics
> Loading required package: parallel
>
> Attaching package: ‘BiocGenerics’
>
> The following objects are masked from ‘package:parallel’:
>
>      clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
>      clusterExport, clusterMap, parApply, parCapply, parLapply,
>      parLapplyLB, parRapply, parSapply, parSapplyLB
>
> The following object is masked from ‘package:stats’:
>
>      xtabs
>
> The following objects are masked from ‘package:base’:
>
>      anyDuplicated, append, as.data.frame, as.vector, cbind, colnames,
>      duplicated, eval, Filter, Find, get, intersect, lapply, Map,
>      mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
>      Position, rank, rbind, Reduce, rep.int, rownames, sapply, setdiff,
>      sort, table, tapply, union, unique, unlist
>
> Loading required package: Biobase
> Welcome to Bioconductor
>
>      Vignettes contain introductory material; view with
>      'browseVignettes()'. To cite Bioconductor, see
>      'citation("Biobase")', and for packages 'citation("pkgname")'.
>
>> data <- read.affybatch(filenames=list.files(pattern=".CEL$", ignore.case=TRUE))
>   *** caught segfault ***
> address 0x7f60734e7048, cause 'memory not mapped'
>
> Traceback:
>   1: .Call("read_abatch", filenames, rm.mask, rm.outliers, rm.extra,     ref.cdfName, dim.intensity[c(1, 2)], verbose, PACKAGE = "affyio")
>   2: read.affybatch(filenames = as.character(pdata$Filename))
>
> Possible actions:
> 1: abort (with core dump, if enabled)
> 2: normal R exit
> 3: exit R without saving workspace
> 4: exit R saving workspace
> Selection:
>