[R] R 2.9.2 memory max - object vector size

William Dunlap wdunlap at tibco.com
Fri Sep 11 01:23:32 CEST 2009


> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of S. Few
> Sent: Thursday, September 10, 2009 1:46 PM
> To: r-help at r-project.org
> Subject: [R] R 2.9.2 memory max - object vector size
> 
> Me:
> 
> Win XP
> 4 gig ram
> R 2.9.2
> 
> library(foreign) # to read/write SPSS files
> library(doBy) # for summaryBy
> library(RODBC)
> setwd("C:\\Documents and Settings\\............00909BR")
> gc()
> memory.limit(size=4000)
> 
> ##  PROBLEM:
> 
> I have memory limit problems. R and otherwise. My dataframes for
> merging or subsetting are about 300k to 900k records.
> I've had errors such as vector size too large. gc() was done.....reset
> workspace, etc.
> 
> This fails:
> 
> y$pickseq<-with(y,ave(as.numeric(as.Date(timestamp)),id,FUN=seq))

If any values in id are singletons then the call to
seq(timestamp[id=="singleton"])
returns a vector whose length is the value of
timestamp[id=="singleton"], not its length, because seq(n) with a
single numeric n means 1:n.  as.numeric(as.Date("2009-09-10")) is
14497 (days since 1970-01-01), so you might have a lot of 14497-long
vectors being created (and thrown away, unused except for their
initial value).  Using seq_along instead of seq would take care of
that potential problem.  E.g.,
   > d1<-data.frame(x=c(2,3,5e9,4,5),id=c("A","B","B","B","A"))
   > d2<-data.frame(x=c(2,3,5e9,4,5),id=c("A","B","C","B","A"))
   > # d1$id has no singletons, d2$id does where d2$x is huge
   > with(d1, ave(x,id,FUN=seq))
   [1] 1 1 2 3 2
   > with(d2, ave(x,id,FUN=seq))
   Error in 1L:from : result would be too long a vector
   > with(d2, ave(x,id,FUN=seq_along))
   [1] 1 1 1 2 2
   
If your intent is to create a vector of within-group sequence numbers
then there are more efficient ways to do it.  E.g., with the following
functions
   withinGroupSeq <- function(x){
      x <- as.factor(x)
      retval <- integer(length(x))
      retval[order(as.integer(x))] <- Sequence(table(x))
      retval
   }
   # Sequence is like base::sequence but should use less memory
   # by avoiding the list that sequence's lapply call makes.
   Sequence <- function(nvec) {
      seq_len(sum(nvec)) - rep(cumsum(c(0L,nvec[-length(nvec)])), nvec)
   }
you can get the same result as ave(FUN=seq_along) in less time and,
I suspect, less memory:
   > withinGroupSeq(d1$id)
   [1] 1 1 2 3 2
   > withinGroupSeq(d2$id)
   [1] 1 1 1 2 2
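To see what Sequence() itself computes, here is a small self-contained sketch
(the two functions are restated from above so the snippet runs on its own;
the input counts c(2, 3) are just illustrative data):

```r
# Sequence is like base::sequence but avoids the intermediate list
# that sequence's lapply call makes (in R <= 2.x).
Sequence <- function(nvec) {
  # subtract each group's starting offset from a single 1:sum(nvec) run
  seq_len(sum(nvec)) - rep(cumsum(c(0L, nvec[-length(nvec)])), nvec)
}
withinGroupSeq <- function(x) {
  x <- as.factor(x)
  retval <- integer(length(x))
  # scatter the group-sorted counters back into the original order
  retval[order(as.integer(x))] <- Sequence(table(x))
  retval
}
Sequence(c(2L, 3L))      # concatenates 1:2 and 1:3, giving 1 2 1 2 3
withinGroupSeq(c("A", "B", "B", "B", "A"))   # 1 1 2 3 2
```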

Base R may have a function for that already.
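For what it's worth, base::sequence applied to table(x) produces the same
per-group counters (just in group-sorted order), so a short base-R equivalent
is possible; this is a sketch on made-up data, not a claim about how ave
works internally:

```r
# hypothetical example data
id <- c("A", "B", "B", "B", "A")
# sequence(table(id)) yields c(1,2, 1,2,3) in group order;
# indexing through order(id) puts the counters back in input order
wgs <- integer(length(id))
wgs[order(id)] <- sequence(table(id))
wgs                      # 1 1 2 3 2
```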

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com 


> 
> Any clues?
> 
> Is this 2.9.2?
> 
> Skipping forward, should I download version R 2.8 or less?
> 
> Thanks!
> Steve
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 
