[R] More efficient option to append()?

Alex Ruiz Euler rruizeuler at ucsd.edu
Thu Aug 18 01:17:20 CEST 2011



Dear R community,

I have a 2 million by 2 matrix that looks like this:

x <- sample(1:15, 2000000, replace = TRUE)
y <- sample((1:10) * 1000, 2000000, replace = TRUE)
      x     y
[1,] 10  4000
[2,]  3  1000
[3,]  3  4000
[4,]  8  6000
[5,]  2  9000
[6,]  3  8000
[7,]  2 10000
(...)
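
(For completeness: the code above creates two separate vectors; the printed
matrix is presumably just those two vectors bound together column-wise, and
the exact values will of course differ from run to run unless a seed is set.)

## assumption: the matrix shown above is simply the two sampled vectors
## combined with cbind(); values vary without set.seed()
m <- cbind(x, y)
head(m, 7)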


The first column is a population expansion factor for the value in the
second column (household income). I want to expand the second column by
the first, so that I end up with a vector beginning with 10 observations
of 4000, then 3 observations of 1000, and so on. In my mind the natural
approach would be to create a NULL vector and append the expansions:

myvar <- NULL
myvar <- append(myvar, replicate(x[1], y[1]), 1)

for (i in 2:length(x)) {
  myvar <- append(myvar, replicate(x[i], y[i]), sum(x[1:i]) + 1)
}

The result should be a vector of length sum(x), which in my real database
corresponds to 22 million observations.
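
To make the intended result concrete, a tiny hand-expanded version of just
the first three rows printed above (illustrative values only) would be:

## expected expansion of the first three rows shown above:
## 10 copies of 4000, then 3 of 1000, then 3 of 4000
c(rep(4000, 10), rep(1000, 3), rep(4000, 3))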

The loop above works fine, but only if I run it for the first, say, 1000
observations. If I try to perform this on all 2 million observations it
takes far too long to be useful (I left it running for 11 hours yesterday
to no avail).


I know R performs well with operations on relatively large vectors. Why
is this so inefficient? And what would be the smart way to do this?
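
My guess is that the repeated copying involved in growing the vector with
append() is to blame. If so, something like the following sketch, which
preallocates the full result and fills it in place, might be what I should
be doing instead, but I would appreciate hearing what the idiomatic
solution is.

## untested sketch: preallocate the whole result once, then fill it
## in place instead of growing it with append()
## (assumes every x[i] >= 1, as in the sample above)
myvar <- numeric(sum(x))
pos <- 1
for (i in seq_along(x)) {
  myvar[pos:(pos + x[i] - 1)] <- y[i]
  pos <- pos + x[i]
}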

Thanks in advance.
Alex


