[R] ks.test; memory problems

William Dunlap wdunlap at tibco.com
Tue Mar 9 22:39:20 CET 2010


> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of Jonathan
> Sent: Tuesday, March 09, 2010 1:28 PM
> To: r-help
> Subject: Re: [R] ks.test; memory problems
> 
> Furthermore, I am not even able to take a sample of my large vector
> (which does exist somehow and is in memory):
> 
> > sampleOfBigVector <- c(range(myBigVector),sample(myBigVector, 1000))
> Error: cannot allocate vector of size 718.0 Mb

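Add the argument replace=TRUE to the call to sample()
to save space (presumably the extra space is used to check
for duplicates in the sample).  With only 1000 values drawn
from about 90 million, sampling with replacement is unlikely
to make a difference to the result in this case.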
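
A minimal sketch of that change, using a small made-up stand-in for
myBigVector (the real one is about 90 million observations; the data
and the seed here are only for illustration, and I have not measured
how much memory this saves on the real vector):

```r
# Hypothetical stand-in for the poster's large vector
set.seed(1)
myBigVector <- rep(1:940, 1000)

# Same call as in the original message, plus replace = TRUE
sampleOfBigVector <- c(range(myBigVector),
                       sample(myBigVector, 1000, replace = TRUE))

# range() contributes 2 values, sample() contributes 1000
length(sampleOfBigVector)
```

The sample could then be passed to ks.test() in place of the full
vector, if subsampling turns out to be acceptable.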

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

> 
> 
> I guess I don't know what else I can do now, except find some cluster
> with a lot of memory to run this code on (presumably I'd be able to
> allocate those vectors then)?
> 
> Jonathan
> 
> 
> On Tue, Mar 9, 2010 at 4:11 PM, Jonathan <jonsleepy at gmail.com> wrote:
> > Hi R-help,
> >    I am interested in comparing two vectors of data
> > observations to see if they come from the same distribution (and have
> > settled on the Kolmogorov-Smirnov test to do this).
> >
> > I'd prefer to use all my data points, but computationally speaking,
> > this is proving to be troublesome due to the size of my vectors (the
> > larger of the two is about 90 million observations).  I suppose I
> > could take a smaller sample of points from that large vector to use as
> > input in my ks-test, but I want to see if I can avoid doing that, in
> > favor of including all of the data.
> >
> > Code:
> > > result <- ks.test(rep(1:940, 100000), rep(1:940, 800))
> > Error: cannot allocate vector of size 358.6 Mb
> >
> > Any ideas?
> >
> > OS: Windows 7 64-bit, R ver. 2.10.1, Memory: 4 gb
> >
> > Best,
> > Jonathan
> >
> >
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 


