[R] data set size question

Greg Snow Greg.Snow at intermountainmail.org
Wed Jun 14 16:56:20 CEST 2006


If you need to analyze something bigger than memory can hold, one option
is the biglm package, which fits linear regression models on blocks of
data so that the entire dataset is never in memory at the same time. (A
lot of different analyses can be restructured as linear regression
models.)

I tested it out on a database with over 23 million rows and it worked
great.  Its answers matched those from a couple of other methods applied
to the same data to about 7 decimal places (I didn't bother to look
beyond that).
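
A minimal sketch of that workflow, assuming the biglm package is
installed. The simulated data, the variable names, and the block size of
1000 rows are made up for illustration; in real use each block would be
read from a file or database rather than sliced from a data frame that
is already in memory.

```r
library(biglm)  # assumption: biglm is installed from CRAN

# Simulated stand-in for a large table (illustrative only)
set.seed(1)
n <- 10000
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 1 + 2 * dat$x1 - 0.5 * dat$x2 + rnorm(n)

# Row indices grouped into blocks of 1000 rows each
chunks <- split(seq_len(n), ceiling(seq_len(n) / 1000))

# Fit on the first block, then fold in the remaining blocks one at a time
fit <- biglm(y ~ x1 + x2, data = dat[chunks[[1]], ])
for (idx in chunks[-1]) {
  fit <- update(fit, dat[idx, ])
}

# Compare with an ordinary in-memory fit on the full data
full <- lm(y ~ x1 + x2, data = dat)
max_diff <- max(abs(coef(fit) - coef(full)))
print(max_diff)
```

Only one block plus the small fitted object needs to be in memory at a
time, which is what lets the same loop run over data that would not fit
in memory all at once.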



Hope this helps, 


-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at intermountainmail.org
(801) 408-8111
 

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Carl Hauser
Sent: Tuesday, June 13, 2006 9:22 PM
To: r-help at stat.math.ethz.ch
Subject: [R] data set size question

Hi there,

I'm very new to R and am only in the beginning stages of investigating
it for possible use. A document by John Maindonald at the r-project
website entitled "Using R for Data Analysis and Graphics: Introduction,
Code and Commentary" contains the following paragraph, "The R system may
struggle to handle very large data sets. Depending on available computer
memory, the processing of a data set containing one hundred thousand
observations and perhaps twenty variables may press the limits of what R
can easily handle". This document was written in 2004.

My questions are:

Is this still the case? If so, has anyone come up with creative
solutions to mitigate these limitations? If you work with large data
sets in R, what have your experiences been?

From what I've seen so far, R seems to have enormous potential and
capabilities. I routinely work with data sets of several hundred
thousand to several million observations. It would be unfortunate if
such potential
and capabilities were not realized because of (effective) data set size
limitations.

Please tell me it ain't so.

Thanks for any help or suggestions.

Carl

 
