[R] Problem with mclapply -- losing output/data

Elizabeth Purdom epurdom at stat.berkeley.edu
Tue Mar 22 17:44:37 CET 2011


Hello,

I forgot to mention that I am looping over ~70K objects. If I run 
mclapply on the first 200, it's fine (i.e., it doesn't give NULL values); 
if I go up to 2K (or over all of them), then I start to see NULL values.

Also, the function I call uses the functions 'restrict', 'gaps' and 
'width' from the Bioconductor package IRanges. I don't know what is 
under the hood of those functions in terms of the calls they make, but 
could that be a source of the problem? (I saw an earlier post regarding 
errors when a function used Java code, but I'm not getting an error like 
they did.)
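
A minimal sketch of one way to test for these symptoms (not a diagnosis 
of the cause). The helper names 'safe_call' and 'check_results' are 
hypothetical, 'myfunc' and 'x' stand for the function and input from the 
quoted message below, and the check assumes myfunc never legitimately 
returns NULL, as the lapply output suggests:

```r
# Hypothetical helper: wrap a function so that an error inside it is
# returned as a condition object instead of being lost in a worker.
safe_call <- function(f) {
  function(...) tryCatch(f(...), error = function(e) e)
}

# Hypothetical helper: compare the result list against the input and
# report the two symptoms (wrong length, NULL elements) plus any errors.
check_results <- function(x, res) {
  is_null <- vapply(res, is.null, logical(1))
  is_err  <- vapply(res, inherits, logical(1), what = "condition")
  list(length_ok = length(res) == length(x),
       null_idx  = which(is_null),
       error_idx = which(is_err))
}

# Usage sketch (not run here):
# tmp <- mclapply(x, FUN = safe_call(myfunc), mc.cores = 16)
# chk <- check_results(x, tmp)
# if (chk$length_ok)   # redo the suspicious elements serially
#   tmp[chk$null_idx] <- lapply(x[chk$null_idx], myfunc)
```

The indices in null_idx and error_idx refer to positions in the result 
list, so they only map back onto x when length_ok is TRUE.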

Thanks,
Elizabeth

On 3/22/11 1:13 AM, Elizabeth Purdom wrote:
> Hello,
> I am running large simulations, which unfortunately I can't really 
> replicate here because the code is so extensive. I rely heavily on 
> mclapply, but I realize that I'm losing data somewhere.
>
> There are two worrisome symptoms:
> 1) I am getting 'NULL' as a return value for some (but not all) 
> elements of the output when I use mclapply, but not if I use lapply
> > tmp2[1:3] #output from lapply
> [[1]]
> 10000076 10000077
>       24       24
>
> [[2]]
> 10000076 10000077
>      119      119
>
> [[3]]
> 10000076
>       71
>
> > tmp[1:3] #output from mclapply
> [[1]]
> NULL
>
> [[2]]
> NULL
>
> [[3]]
> NULL
>
>
> 2) I am not getting back a list of the same length as the input vector 
> I'm parallelizing over, i.e. a command like this:
>
> tmp<-mclapply(x, FUN=myfunc, mc.cores=16)
>
> gives me back a list tmp which is not the same length as x (and so I'm 
> getting all kinds of errors)
>
> This is extremely discouraging, because I've been using mclapply 
> extensively at many points in simulations that take a very long 
> time to run, and now I'm wondering if what I'm getting is trustworthy. 
> I don't think I could reasonably finish my results without mclapply, 
> but I am thinking of cutting it out except where it is absolutely 
> necessary, time-wise. If anyone has any suggestions as to why this 
> might be happening and how I can circumvent it (or test for it 
> happening), I would greatly appreciate it.
>
> Thanks,
> Elizabeth Purdom
>
> > sessionInfo()
> R version 2.12.1 (2010-12-16)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] multicore_0.1-4       msm_1.0               gtools_2.6.2          graph_1.28.0          Rsamtools_1.2.3
> [6] Biostrings_2.18.2     GenomicFeatures_1.2.3 GenomicRanges_1.2.3   IRanges_1.8.9
>
> loaded via a namespace (and not attached):
>  [1] Biobase_2.10.0     biomaRt_2.6.0      BSgenome_1.18.3    DBI_0.2-5          mvtnorm_0.9-96     RCurl_1.5-0
>  [7] RSQLite_0.9-4      rtracklayer_1.10.6 splines_2.12.1     survival_2.36-2    tools_2.12.1       XML_3.2-0
