[Rd] mclapply on a set not divisible by number of cores (PR#14205)

o.heil at dkfz.de o.heil at dkfz.de
Wed Feb 3 13:55:10 CET 2010


Full_Name: Oliver Heil
Version: 2.10.0
OS: debian squeeze
Submission from: (NULL) (193.174.58.251)


When running mclapply on a list of 618 strings with mc.cores=10, the result
is wrong for every 10th entry, starting with the 6th (entries 6, 16, 26, ...).
Our machine has 16 cores.
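
For illustration, the pattern 6, 16, 26, ... is what one would expect if all
elements handled by a single one of the 10 workers came back wrong. The sketch
below assumes a round-robin split of the input (worker i gets elements i,
i + 10, i + 20, ...), as one would expect with prescheduling; this is an
assumption for illustration, not taken from the multicore source, and the
names n, cores, schedule and sizes are made up here.

```r
# Assumption: worker i of `cores` handles elements i, i + cores, i + 2 * cores, ...
n <- 618
cores <- 10
schedule <- lapply(seq_len(cores), function(i) seq(i, n, by = cores))

# Number of elements per worker; 618 is not a multiple of 10,
# so the workers do not all receive the same number of elements.
sizes <- sapply(schedule, length)
print(sizes)

# Indices that would be handled by the 6th worker under this assumption.
print(head(schedule[[6]]))
```

Under this assumption the affected entries would all belong to the 6th worker.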

You may reproduce the error using data provided here:
<http://www.dkfz.de/gpcf/tmp_535434fsfd/>

Use it together with the following code (run with R --vanilla):

# For each probe ID (618 probe IDs), get the data points from the
#    data frames control and group.
# Calculate mean, standard deviation and detection p value for group and
#    control.
# Calculate the p value for the test that the means of control and group
#    are different.
#
# The result is a list (length 618) of 7-tuples.
#
# Have a look at x_sd_p.test[[6]], x_sd_p.test[[16]], ...
# It works fine using lapply or doing the function "by
#    hand", for example with factor=probeids[6].
#

load("df.control.R")
load("df.group.R")
load("negative_bead.R")
load("probeids.R")

library("multicore")

x_sd_p.test=mclapply(probeids,function(factor){
	# group: mean, standard deviation and detection p value
	idxg=which(df.group$Factor %in% factor)
	mg=NA;sdg=NA;pg=1.0
	if(length(idxg)>0){
		lg=df.group$x[idxg]
		mg=mean(lg,na.rm=TRUE)
		sdg=sd(lg,na.rm=TRUE)
		t=wilcox.test(lg,negative_bead,alternative="greater",exact=TRUE)
		pg=t$p.value
	}
	# control: mean, standard deviation and detection p value
	idxc=which(df.control$Factor %in% factor)
	mc=NA;sdc=NA;pc=1.0
	if(length(idxc)>0){
		lc=df.control$x[idxc]
		mc=mean(lc,na.rm=TRUE)
		sdc=sd(lc,na.rm=TRUE)
		t=wilcox.test(lc,negative_bead,alternative="greater",exact=TRUE)
		pc=t$p.value
	}
	# group vs. control (two-sided), only if both have data
	p=1.0
	if(length(idxg)>0&&length(idxc)>0){
		t=wilcox.test(lg,lc,alternative="two.sided",exact=TRUE)
		p=t$p.value
	}
	c(mg,sdg,pg,mc,sdc,pc,p)
},mc.cores=10)

l=lapply(x_sd_p.test,function(x){length(x)})
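
To list the affected entries rather than inspecting them one by one, the
parallel result can be compared against a serial lapply run. The following is
a minimal sketch with toy data; differing_indices, expected and actual are
hypothetical names and not part of the original script.

```r
# Hypothetical helper: indices at which two result lists differ.
differing_indices <- function(expected, actual) {
  stopifnot(length(expected) == length(actual))
  same <- mapply(function(a, b) isTRUE(all.equal(a, b)), expected, actual)
  which(!same)
}

# Toy stand-ins for the serial and the parallel result (not the real data).
expected <- lapply(1:30, function(i) c(i, i^2))
actual <- expected
for (i in c(6, 16, 26)) actual[[i]] <- NA_real_  # simulate damaged entries

bad <- differing_indices(expected, actual)
print(bad)
```

With the real data, expected would be the result of lapply(probeids, ...) with
the same function, and actual would be x_sd_p.test.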




> sessionInfo()
R version 2.10.0 (2009-10-26)
x86_64-pc-linux-gnu

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] multicore_0.1-3

loaded via a namespace (and not attached):
[1] tools_2.10.0

> version
               _
platform       x86_64-pc-linux-gnu
arch           x86_64
os             linux-gnu
system         x86_64, linux-gnu
status
major          2
minor          10.0
year           2009
month          10
day            26
svn rev        50208
language       R
version.string R version 2.10.0 (2009-10-26)


