[R] Create rows for columns in dataframe

arun smartpink111 at yahoo.com
Wed Aug 14 16:39:19 CEST 2013


Hi,
I tried the second method on a bigger dataset. This is what I get:

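indx <- rep(1:nrow(dat1), 6e4)   # repeat the row indices of dat1 60,000 times
dat2 <- dat1[indx, ]

system.time({
  # keys for the primary code: ID, value of column 2, name of column 2
  vec1 <- paste(dat2[, 1], dat2[, 2], colnames(dat2)[2], sep = ".")
  # wide to long: columns 2:26 are stacked into a single column
  res2 <- reshape(dat2, idvar = "newCol", varying = list(2:26), direction = "long")
  res3 <- res2[order(res2[, 4]), ]     # sort by the generated row id (column 4)
  res4 <- res3[res3[, 3] != "", -4]    # drop empty codes and the row id column
  vec2 <- paste(res4[, 1], res4[, 3], paste0("C", res4[, 2]), sep = ".")
  res4$PRIMAIRY <- vec2 %in% vec1
  row.names(res4) <- 1:nrow(res4)
  res4$ID <- row.names(res4)
  res4[, c(1, 3)] <- lapply(res4[, c(1, 3)], as.character)
  res5 <- res4[, c(5, 1, 3, 4)]
  colnames(res5)[3] <- "CODE"
})
#    user  system elapsed
# 144.672   2.072 147.034  # the reshape() step takes most of the time
dim(res5)
# [1] 2880000       4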

# Comparing this to the first method on a smaller subset of dat2:
dat2New <- dat2[1:3e4, ]

system.time({
  res1 <- do.call(rbind, lapply(seq_len(nrow(dat2New)), function(i) {
    x1 <- as.character(unlist(dat2New[i, -1]))   # all codes in row i
    CODE <- x1[x1 != ""]                         # keep the non-empty codes
    PRIMAIRY <- CODE == head(x1, 1)              # TRUE where the code equals the first code
    DSYSRTKY <- as.numeric(as.character(dat2New[i, 1]))
    data.frame(DSYSRTKY, CODE, PRIMAIRY, stringsAsFactors = FALSE)
  }))
  res1$ID <- row.names(res1)
  res2 <- res1[, c(4, 1:3)]
})
#    user  system elapsed
# 166.452  15.752 182.643
nrow(dat2) - nrow(dat2New)   # dat2New has this many fewer rows than dat2
# [1] 330000
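
To put the two timings on the same footing, here is a rough throughput comparison. It only uses the numbers reported above; the variable names are mine, and it assumes nrow(dat2) is 360000 (330000 + 30000). The first method will probably scale worse than linearly because of the repeated data.frame()/rbind() work, so treat the ratio as an optimistic estimate for that method.

```r
# Timings and row counts taken from the two runs above
rows_method2 <- 360000     # nrow(dat2): 330000 + 30000
secs_method2 <- 147.034    # elapsed time of the reshape() approach
rows_method1 <- 30000      # nrow(dat2New)
secs_method1 <- 182.643    # elapsed time of the row-by-row approach

rate2 <- rows_method2 / secs_method2   # input rows per second, method 2
rate1 <- rows_method1 / secs_method1   # input rows per second, method 1
ratio <- rate2 / rate1

round(c(method1 = rate1, method2 = rate2, ratio = ratio), 1)
```

On these numbers the second method handles roughly 15 times as many input rows per second as the first.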

You might also try library(data.table); it should be faster.
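
A minimal, untimed sketch of what a data.table version could look like. I have not run this on your data: the toy table, the helper columns `row` and `pos`, and the assumption that the first code column is the primary one (as in the second method above) are mine. Adjust the column names to your real dataset.

```r
library(data.table)

# Toy stand-in for dat1: an ID column plus code columns, "" meaning empty
dat <- data.table(
  DSYSRTKY = c(101, 102),
  C1 = c("A1", "B7"),
  C2 = c("A2", ""),
  C3 = c("", "")
)
code_cols <- setdiff(names(dat), "DSYSRTKY")

dat[, row := .I]                               # remember the original row order
long <- melt(dat, id.vars = c("row", "DSYSRTKY"),
             measure.vars = code_cols,
             variable.name = "col", value.name = "CODE",
             variable.factor = FALSE)          # wide to long
long <- long[CODE != ""]                       # drop empty codes
long[, pos := match(col, code_cols)]           # position of the source column
setorder(long, row, pos)                       # original row, then column order
long[, PRIMAIRY := pos == 1L]                  # codes from the first code column
long[, ID := .I]

res <- long[, .(ID, DSYSRTKY, CODE, PRIMAIRY)]
res
```

Note that ID is an integer here, whereas the code above stores it as character (taken from row.names).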

A.K.

----- Original Message -----
From: Dark <info at software-solutions.nl>
To: r-help at r-project.org
Cc: 
Sent: Wednesday, August 14, 2013 5:41 AM
Subject: Re: [R] Create rows for columns in dataframe

Hi A.K,

Thanks for your great help.
I'm now running your first suggestion on a 600,000-row sample, after
verifying that it works on a smaller sample.
It's now been running for 40 minutes. 
Which method do you think will be faster?

Regards Derk



