[R] More efficient option to append()?

Timothy Bates timothy.c.bates at gmail.com
Thu Aug 18 09:46:35 CEST 2011


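This takes a few seconds for 1 million rows and keeps the explicit for-loop form. The key difference from append() is that the result vector is allocated once, at its final length, and then filled in place: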

numberofSalaryBands = 1000000 # 2000000
x        = sample(1:15, numberofSalaryBands, replace=TRUE)
y        = sample((1:10)*1000, numberofSalaryBands, replace=TRUE)
df       = data.frame(x, y)
finalN   = sum(df$x)
myVar    = rep(NA_real_, finalN) # preallocate the full result once, as numeric
outIndex = 1
for (i in seq_len(numberofSalaryBands)) {
	kount = df$x[i]
	myVar[outIndex:(outIndex+kount-1)] = rep(df$y[i], kount) # Make x[i] copies of value y[i]
	outIndex = outIndex+kount
}
head(myVar)
plyr::count(myVar) # frequency table of the expanded values; needs the plyr package
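
If the explicit loop is not a requirement, base R's rep() accepts a vector for its times argument, which does the whole expansion in one call. The snippet below is only a sketch on a small made-up sample (the size n = 1000 and the seed are my assumptions, chosen to keep it quick); it also checks that the result matches the preallocated loop. As for why the append() version is so slow: each append() call builds a new vector and copies everything accumulated so far, so the total work grows roughly with the square of the final length.

```r
# Assumption: a small illustrative sample; x holds repeat counts, y the values.
set.seed(1)
n <- 1000
x <- sample(1:15, n, replace = TRUE)
y <- sample((1:10) * 1000, n, replace = TRUE)

# rep() with a vector 'times' repeats y[i] exactly x[i] times, in order.
expanded <- rep(y, times = x)

# Cross-check against the preallocated loop form.
looped <- numeric(sum(x))
out <- 1
for (i in seq_len(n)) {
  k <- x[i]
  looped[out:(out + k - 1)] <- y[i]
  out <- out + k
}
stopifnot(length(expanded) == sum(x), identical(expanded, looped))
```

On the full data this would be rep(df$y, times = df$x), assuming the counts in df$x are non-negative whole numbers.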


On Aug 18, 2011, at 12:17 AM, Alex Ruiz Euler wrote:

> 
> 
> Dear R community,
> 
> I have a 2 million by 2 matrix that looks like this:
> 
> x<-sample(1:15,2000000, replace=T)
> y<-sample(1:10*1000, 2000000, replace=T)
>      x     y
> [1,] 10  4000
> [2,]  3  1000
> [3,]  3  4000
> [4,]  8  6000
> [5,]  2  9000
> [6,]  3  8000
> [7,]  2 10000
> (...)
> 
> 
> The first column is a population expansion factor for the number in the
> second column (household income). I want to expand the second column
> with the first so that I end up with a vector beginning with 10
> observations of 4000, then 3 observations of 1000 and so on. In my mind
> the natural approach would be to create a NULL vector and append the
> expansions:
> 
> myvar<-NULL
> myvar<-append(myvar, replicate(x[1],y[1]), 1)
> 
> for (i in 2:length(x)) {
> myvar<-append(myvar,replicate(x[i],y[i]),sum(x[1:i])+1)
> }
> 
> to end with a vector of sum(x), which in my real database corresponds
> to 22 million observations.
> 
> This works fine --if I only run it for the first, say, 1000
> observations. If I try to perform this on all 2 million observations
> it takes long, way too long for this to be useful (I left it running
> 11 hours yesterday to no avail).
> 
> 
> I know R performs well with operations on relatively large vectors. Why
> is this so inefficient? And what would be the smart way to do this?
> 
> Thanks in advance.
> Alex
> 