[R] help with colsplit (reshape)

Ista Zahn istazahn at gmail.com
Fri Jun 13 17:46:06 CEST 2008


Dear list,

I'm trying to figure out how to use the reshape package to reshape
data from a "wide" format to a "long" format. I have data like this:

pid <- c(1:10)
predA <- c(-1,-2,-1,-2,-1,-2,-1,-2,-1,-2)
predB.1 <- c(0,0,0,1,1,0,0,0,1,1)
predB.2 <- c(2,2,3,3,3,2,2,3,3,3)
predC.1 <- c(10,10,10,10,10,11,11,11,11,11)
predC.2 <- c(12,12,13,13,13,12,12,13,13,13)
out.1 <- c(100:109)
out.2 <- c(200:209)
Data <- data.frame(pid, predA, predB.1, predB.2, predC.1, predC.2,
                   out.1, out.2)

and I want to make it look like this:

head(L.Data <- reshape(Data, varying = list(3:4, 5:6, 7:8),
idvar="pid", v.names=c("predB", "predC", "out"),
timevar="measure.num", times=c(1,2), direction="long"))
     pid predA measure.num predB predC out
1.1   1    -1           1     0    10 100
2.1   2    -2           1     0    10 101
3.1   3    -1           1     0    10 102
4.1   4    -2           1     1    10 103
5.1   5    -1           1     1    10 104
6.1   6    -2           1     0    11 105

Using Hadley's JSS article "Reshaping Data with the reshape Package"  
as a guide, I tried the following:

M.Data <- melt(Data, id="pid")
M.Data2 <- cbind(M.Data, colsplit(M.Data$variable, split = ".", names  
= c("treatment", "time")))

but this gave a warning and resulted in

head(M.Data2)
   pid variable value treatment time NA. NA..1 NA..2 NA..3 NA..4
1   1    predA    -1        NA   NA  NA    NA    NA    NA    NA
2   2    predA    -2        NA   NA  NA    NA    NA    NA    NA
3   3    predA    -1        NA   NA  NA    NA    NA    NA    NA
4   4    predA    -2        NA   NA  NA    NA    NA    NA    NA
5   5    predA    -1        NA   NA  NA    NA    NA    NA    NA
6   6    predA    -2        NA   NA  NA    NA    NA    NA    NA

I searched the mailing list and found this post:
http://tolstoy.newcastle.edu.au/R/e4/help/08/05/11857.html
which led me to try

M.Data2 <- data.frame(M.Data, colsplit(M.Data$variable, split = "\\.",  
names = c("treatment", "time")))

which gave:

head(M.Data2)
   pid variable value treatment  time
1   1    predA    -1     predA predA
2   2    predA    -2     predA predA
3   3    predA    -1     predA predA
4   4    predA    -2     predA predA
5   5    predA    -1     predA predA
6   6    predA    -2     predA predA

Closer but no cigar.
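
One possible reading of the output above, offered as an assumption rather than a statement about colsplit's internals: predA contains no ".", so splitting it yields a single piece, and if the pieces are bound row-wise that single piece gets recycled into both columns. A minimal base-R sketch that sidesteps this by pulling the two parts out with sub() instead (the names variable, has_dot and parts are illustrative only):

```r
# Variable names as they would appear in the melted data.
variable <- c("predA", "predB.1", "predB.2", "predC.1", "out.2")

# TRUE where the name carries a ".<time>" suffix (fixed = TRUE: literal dot).
has_dot <- grepl(".", variable, fixed = TRUE)

# Everything before the first literal dot is the treatment name.
treatment <- sub("\\..*$", "", variable)

# Everything after the last literal dot is the time; NA when there is no dot.
time <- ifelse(has_dot, sub("^.*\\.", "", variable), NA_character_)

parts <- data.frame(variable, treatment, time = as.integer(time))
print(parts)
```

With this approach predA ends up with treatment "predA" and a missing time instead of a repeated name.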

I would be grateful if someone could tell me (a) how to reshape the
data as described above using the reshape package, (b) what the
difference between split = "." and split = "\\." is, and (c) whether
more information about the colsplit command is available anywhere.
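
To make point (b) concrete, here is a small base-R illustration using strsplit(). It assumes that colsplit treats its split argument as a regular expression in the same way, which is not confirmed here. In a regular expression an unescaped "." matches any character, while "\\." matches a literal dot:

```r
x <- "predB.1"

# Unescaped ".": every character counts as a separator,
# so only empty strings are left between them.
regex_any <- strsplit(x, ".")[[1]]

# Escaped dot: split only at the literal "." character.
escaped <- strsplit(x, "\\.")[[1]]

# fixed = TRUE turns off regular expressions, so "." is literal here too.
literal <- strsplit(x, ".", fixed = TRUE)[[1]]

print(regex_any)
print(escaped)
print(literal)
```

The seven empty strings from the first call would be consistent with the extra NA columns shown earlier, though that link is a guess.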

Thank you very much in advance,
Ista


