[R] Parallel Processing and Linear Regression

Martin Morgan mtmorgan at fhcrc.org
Fri Jul 25 15:26:29 CEST 2008


Hi Alan --

"Alan Spearot" <acspearot at gmail.com> writes:

> Does anybody have any suggestions regarding applying standard regression
> packages lm(), hccm(), and others within a parallel environment?  Most of
> the packages I've found only deal with iterative processes (bootstrap) or
> simple linear algebra.  While the latter might help, I'd rather not program
> the estimation code.  I'm currently using an IA-64 Teragrid system through UC
> San Diego.

If you mean that you have a single regression that takes a long time
to compute, linking R against a parallel (multi-threaded) BLAS on a
single machine, as described in the R Installation and Administration
manual, might help, though I have no direct experience with this.
Otherwise, I think you're out of luck in terms of parallelizing
without writing code.

If you mean that you've got many data sets for which you'd like to
perform regressions, then the general strategy is a coarse-grained,
lapply-like approach: fit each data set independently on a separate
worker. This is available in all the usual suspects (snow, Rmpi, nws,
and others).
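
As a rough sketch of the lapply-like pattern, something along these
lines should work. It assumes the 'parallel' package that ships with
current R (its parLapply() follows the snow interface) rather than
snow itself, and the data sets, the helper fit_one(), and the choice
of 2 local workers are all made up for illustration:

```r
library(parallel)  # ships with R; parLapply() follows the snow interface

set.seed(1)
# Hypothetical input: a list of 4 small data frames, one regression each
datasets <- lapply(1:4, function(i) {
  x <- rnorm(100)
  data.frame(x = x, y = 2 * x + rnorm(100))
})

# Hypothetical helper: fit one regression, return only the coefficients
fit_one <- function(d) coef(lm(y ~ x, data = d))

cl <- makeCluster(2)                      # 2 local workers (an assumption)
fits <- parLapply(cl, datasets, fit_one)  # one data set per task
stopCluster(cl)
```

Returning just the coefficients, rather than the whole lm object,
keeps the amount of data shipped back from each worker small.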

If you've got really big data that doesn't fit easily in memory, then
the available solutions are to get more memory (on a single 64-bit
machine) or to use a package such as biglm, which is designed to work
with large data sets.
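
For the biglm route, a minimal sketch might look like the following.
It assumes biglm is installed from CRAN, and the simulated data and
the split into 3 chunks are invented for illustration; in real use
each chunk would be read from disk or a database in turn:

```r
library(biglm)  # CRAN package; install.packages("biglm") if needed

set.seed(1)
n <- 3000
# Hypothetical data, split into chunks to mimic reading a large file in pieces
full <- data.frame(x = rnorm(n))
full$y <- 1 + 2 * full$x + rnorm(n)
chunks <- split(full, rep(1:3, each = n / 3))

fit <- biglm(y ~ x, data = chunks[[1]])  # start the fit on the first chunk
for (chunk in chunks[-1]) {
  fit <- update(fit, chunk)              # fold in each remaining chunk
}
coef(fit)
```

Only one chunk needs to be in memory at a time, which is the point of
the approach.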

Hope that helps,

Martin

> Alan

-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793


