[R] Efficiency: speeding up unlist that is currently running by row

Dimitri Liakhovitski ld7631 at gmail.com
Fri Mar 27 19:03:20 CET 2009


Hello everyone!
I have a piece of code that works and does what I need, but it is slow:

# I have 3 slots:
nr.of.slots<-3

# My data frame is "new.a":
new.a<-data.frame(x=c("john","mary"),y=c("pete","john"),z=c("mary","pete"),stringsAsFactors=FALSE)
print(new.a)

# Creating all possible combinations of the rows of "new.a" with all
# possible combinations of "p1" and "p2" in 3 locations (3 new columns):
big.a<-cbind(
	new.a[rep(1:nrow(new.a),each=8),],
	expand.grid(paste("p",1:2,sep=""),
		paste("p",1:2,sep=""),
		paste("p",1:2,sep=""))[rep(1:8,nrow(new.a)),]
)
print(big.a)

# Making sure the last 3 columns are characters, not factors:
for(i in 1:nr.of.slots) { big.a[[(i+3)]]<-as.character(big.a[[(i+3)]]) }
str(big.a)

# Creating a final dataframe with as many columns as slots (i.e., 3);
# each cell contains a name of a person and "p1" or "p2":
output<-data.frame(matrix(nrow = nrow(big.a), ncol = nr.of.slots))
names(output)<-paste("slot",1:nr.of.slots,sep=".")

# THIS IS THE SECTION OF THE CODE I HAVE A QUESTION ABOUT:
for(i in 1:nr.of.slots) {
	output[[i]]<-lapply(1:nrow(big.a),function(x){
		out<-unlist(c(big.a[x,i],big.a[x,i+nr.of.slots]))
		return(out)
	})
}
print(output)

# This is exactly the output I am looking for: Each cell of "output"
# contains just 2 words:
print(output[1,1])
str(output[1,1])


MY QUESTION:
The section of the code above that runs unlist loops through the rows
one at a time. My problem is that my real data frame will have over a
million rows and more than 3 columns in output, so it is very slow.
Is it possible to speed it up somehow, for example by merging whole
columns of the data frame pairwise instead of going row by row?
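
To make the "whole columns" idea concrete, here is a rough sketch of what I have in mind. It is not a tested solution for a million rows. The names pair.columns, n.slots and small are made up for this illustration only. The sketch pulls each column out of the data frame once and lets mapply() walk the two vectors in parallel, so the data frame is never indexed row by row:

```r
# Hypothetical helper (not part of the script above): pair column i with
# column i + n.slots, working on whole columns instead of single rows.
pair.columns <- function(df, n.slots) {
  out <- data.frame(matrix(nrow = nrow(df), ncol = n.slots))
  names(out) <- paste("slot", seq_len(n.slots), sep = ".")
  for (i in seq_len(n.slots)) {
    # Each column is extracted once; mapply() then combines the elements
    # pairwise and returns a list with one 2-element vector per row.
    out[[i]] <- mapply(c, df[[i]], df[[i + n.slots]],
                       SIMPLIFY = FALSE, USE.NAMES = FALSE)
  }
  out
}

# Small made-up example with 2 slots (names first, then "p" columns):
small <- data.frame(x = c("john", "mary"), y = c("pete", "john"),
                    p.x = c("p1", "p2"), p.y = c("p2", "p1"),
                    stringsAsFactors = FALSE)
res <- pair.columns(small, n.slots = 2)
str(res[1, 1])
```

With the data above this would be called as pair.columns(big.a, nr.of.slots). If a single string per cell were acceptable instead of 2 separate words, paste() on the two columns would avoid the list columns altogether, but that is a different output from the one I need.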

Thank you very much for any advice!

-- 
Dimitri Liakhovitski
MarketTools, Inc.
Dimitri.Liakhovitski at markettools.com



