[R] long to wide on larger data set

Juliet Hannah juliet.hannah at gmail.com
Mon Jul 12 07:25:41 CEST 2010


I have a data set that has 4 columns and 53860858 rows. I was able to
read this into R with:

cc <- rep("character", 4)
myData <- read.table("myData.csv", header=FALSE, skip=1, colClasses=cc, nrows=53860858, sep=",")


I need to reshape this data from long to wide. On a small data set the
following lines work. But on the real data set, the loop didn't finish
even when I restricted it to two samples (that is, two rows in the new
data). I didn't receive an error; I just stopped it because it was
taking too long. Any suggestions for improvements? Thanks.

# start example
# I have commented out the write.table statement below

testData <- read.table(textConnection("rs9999853,cv0084,A,A
rs999986,cv0084,C,B
rs9999883,cv0084,E,F
rs9999853,cv0085,G,H
rs999986,cv0085,I,J
rs9999883,cv0085,K,L"),header=FALSE,sep=",")
closeAllConnections()

mysamples <- unique(testData$V2)

for (one_ind in mysamples) {
   # all rows belonging to this sample
   one_sample <- testData[testData$V2==one_ind,]
   # one wide row per sample, with a V3/V4 column pair per V1 value
   mywide <- reshape(one_sample, timevar = "V1", idvar = "V2",
                     direction = "wide")
#   write.table(mywide,file="newdata.txt",append=TRUE,row.names=FALSE,
#               col.names=FALSE,quote=FALSE)
}
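
To make the target layout concrete, here is a minimal sketch that applies
the same reshape() call to the whole six-row example at once instead of
once per sample. The names con and wide are introduced only for this
illustration, and the explicit colClasses and close(con) are assumptions
made to keep the snippet self-contained. Whether a single call like this
is practical on the full 53860858-row data set is not established here;
the point is only to show the shape of the wide result.

```r
# Self-contained copy of the six example rows, read as character columns.
con <- textConnection("rs9999853,cv0084,A,A
rs999986,cv0084,C,B
rs9999883,cv0084,E,F
rs9999853,cv0085,G,H
rs999986,cv0085,I,J
rs9999883,cv0085,K,L")
testData <- read.table(con, header = FALSE, sep = ",",
                       colClasses = rep("character", 4))
close(con)  # close only the connection opened above

# Each sample (V2) becomes one row; each V1 value contributes a
# V3.<V1> and a V4.<V1> column.
wide <- reshape(testData, timevar = "V1", idvar = "V2", direction = "wide")

dim(wide)    # 2 rows (cv0084, cv0085) by 7 columns
names(wide)
```

For the example this gives one row per sample (cv0084 and cv0085) and seven
columns: V2 followed by a V3/V4 pair for each of the three V1 values.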


