[R] getting started in parallel computing on a windows OS

Martin Morgan mtmorgan at fhcrc.org
Thu Apr 25 01:34:06 CEST 2013


On 04/24/2013 02:50 PM, Benjamin Caldwell wrote:
> Dear R help,
>
> I've what I think is a fairly simple parallel problem, and am getting
> bogged down in documentation and packages for much more complex situations.
>
> I have a big matrix [30^5, 5]. I have a function that will act on each row
> of that matrix sequentially and output the 'best' result from the whole
> matrix (it compares the result from each row to the last and keeps the
> 'better' result). I would like to divide that first large matrix into
> chunks equal to the number of cores I have available to me, and work
> through each chunk, then output the results from each chunk.
>
> I'm really having trouble making head or tail of how to do this on a
> windows machine - lots of different false starts on several different
> packages now. Basically, I have the function, and I can of course easily
> divide the matrix into chunks. I just need a way to process each chunk
> in parallel (other than opening new R sessions for each core manually).
>
> Any help much appreciated - after two days of trying to get this to work
> I'm pretty burnt out.

Hi Ben -- in your code from this morning you had a function

fitting <- function(ndx.grd=two,dt.grd=one,ind.vr='ind',rsp.vr='res') {
     ## ... setup
     for(i in 1:length(ndx.grd[,1])){
         ## ... do work
     }
     ## ... collate results
}

that you're trying to run in parallel. Obviously the ## ... represent lines I've 
removed. When you say something like

y <- foreach(icount(length(two))) %dopar% fitting()

it's saying that you want to run fitting() length(two) times. So you're actually 
doing the same thing length(two) times, whereas you really want to divide the 
work that's inside fitting() into chunks, and do those on separate cores!
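
A small sketch of that effect, run sequentially with %do% so no parallel 
backend is needed. whole_job() is a hypothetical stand-in for fitting(), not the 
original function; the point is only that an unnamed icount(n) repeats the same 
call n times.

```r
library(foreach)
library(iterators)   # provides icount()

## Hypothetical stand-in for fitting(): it takes no per-task argument,
## so every call does the whole job and returns the same answer.
whole_job <- function(grid = matrix(1:6, ncol = 2)) sum(grid)

## %do% is the sequential form of %dopar%; the number of evaluations is the same.
y <- foreach(icount(3)) %do% whole_job()

length(y)                    # 3 results ...
length(unique(unlist(y)))    # ... but only 1 distinct value
```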

Conceptually what you'd like to do is

fit_one <- function(idx, ndx.grd, dt.grd, ind.vr, rsp.vr) {
     ## ... do work on row idx _ONLY_
}

and then evaluate with

## ... setup
y <-
   foreach(idx = icount(nrow(two))) %dopar% fit_one(idx, two, one, "ind", "res")
## ... collate

so that fit_one fits just one of your combinations. foreach will worry about 
distributing the work. Make sure that fit_one works first, before trying to run 
this in parallel; your use of try(), trying to fit different data types 
(character, integer, numeric) into a matrix rather than data.frame, and the type 
coercions all indicate that you're fighting with R rather than working with it.
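
A minimal end-to-end sketch of this pattern on Windows, assuming the doParallel 
package as the backend (the thread does not say which backend was in use). The 
grid, the body of fit_one, the two-worker cluster, and "best means smallest 
score" are all hypothetical placeholders for the real setup, work, and collate 
steps.

```r
library(foreach)
library(iterators)
library(doParallel)   # assumed backend; uses socket clusters, which work on Windows

## Hypothetical stand-in for the index grid ('two' in the original code).
grid <- as.matrix(expand.grid(a = 1:3, b = 1:3, c = 1:3))

## Hypothetical per-row worker: does the work for row idx only.
fit_one <- function(idx, grid) {
    data.frame(idx = idx, score = sum(grid[idx, ]^2))
}

## Check that the worker behaves sequentially before going parallel.
stopifnot(fit_one(1, grid)$score == 3)

cl <- makeCluster(2)          # 2 workers, chosen arbitrarily for this sketch
registerDoParallel(cl)
res <- foreach(idx = icount(nrow(grid)), .combine = rbind) %dopar%
    fit_one(idx, grid)
stopCluster(cl)

## Collate: keep the 'best' row, here assumed to mean the smallest score.
best <- res[which.min(res$score), ]
```

With millions of rows, one task per row carries noticeable overhead, so a 
common variation is to have each task handle a block of rows and return that 
block's best result.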

Hope that helps,

Martin

>
> Thanks
>
> *Ben Caldwell*


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


