[R] reshaping some data

Tue Sep 14 21:58:31 CEST 2004

Here is another variation.  It uses LOCF (last observation
carried forward), a function which takes a logical vector and,
for each element, returns the index of the most recent TRUE
value at or before that position.  The version of LOCF here
assumes that the first element of its argument is TRUE, which
happens to be the case here.

LOCF <- function(L) which(L)[cumsum(L)]
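
To illustrate what LOCF returns, here is a small sketch on a made-up logical vector (not taken from the data in this thread). The second function, LOCF0, is a hypothetical variant that is not part of the solution; it only shows one way the "first element is TRUE" assumption could be relaxed.

```r
LOCF <- function(L) which(L)[cumsum(L)]

# made-up input for illustration; the first element is TRUE, as LOCF assumes
L <- c(TRUE, FALSE, TRUE, FALSE, FALSE)
idx <- LOCF(L)
print(idx)   # 1 1 3 3 3

# hypothetical variant (an assumption, not in the original post):
# pad with NA so leading FALSE elements map to NA instead of being dropped
LOCF0 <- function(L) c(NA, which(L))[cumsum(L) + 1]
idx0 <- LOCF0(c(FALSE, TRUE, FALSE))
print(idx0)  # NA 2 2
```

Without the padding, a leading FALSE gives a cumsum of 0, and indexing with 0 silently drops that element, so the result would be shorter than the input.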

is.x <- substr(colnames(x),1,1) == "x"
x. <- unlist(x[,LOCF(is.x)[!is.x]])
names(x.) <- NULL
data.frame(x = x., y = unlist(x[,!is.x]), row.names = NULL) 
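
For reference, a self-contained run of the lines above on the example data frame from the original question quoted below. The names xcol and tall are introduced here only for illustration.

```r
# example data from the original question
x <- data.frame(x1 =  1: 5, y11 =  1: 5,
                x2 =  6:10, y21 =  6:10, y22 = 11:15,
                x3 = 11:15, y31 = 16:20,
                x4 = 16:20, y41 = 21:25, y42 = 26:30, y43 = 31:35)

LOCF <- function(L) which(L)[cumsum(L)]

is.x <- substr(colnames(x), 1, 1) == "x"
# for each y column, the index of the nearest x column to its left
xcol <- LOCF(is.x)[!is.x]
print(xcol)   # 1 3 3 6 8 8 8

x. <- unlist(x[, xcol])
names(x.) <- NULL
tall <- data.frame(x = x., y = unlist(x[, !is.x]), row.names = NULL)
print(dim(tall))   # 35 2
```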

I found Peter's solution particularly clever.  Note that it
depends on the y colnames having the same first digit as the
corresponding x colnames; however, the columns need not be in
any specific order.  The solution above and my previous one
below instead depend on the y columns coming immediately after
their x column, but do not depend on the detailed content of
the names.  In the present case both assumptions appear to
hold, but in other situations one or the other might be
preferable.
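
To make the contrast concrete, here is a rough sketch of matching by name rather than by position. This is not Peter's code, which is not reproduced in this message; by_name is a hypothetical helper, and it assumes the digit linking x and y columns is the single character right after the leading letter.

```r
x <- data.frame(x1 =  1: 5, y11 =  1: 5,
                x2 =  6:10, y21 =  6:10, y22 = 11:15,
                x3 = 11:15, y31 = 16:20,
                x4 = 16:20, y41 = 21:25, y42 = 26:30, y43 = 31:35)

# hypothetical helper: pair each y column with the x column
# whose name has the same first digit
by_name <- function(df) {
  nm <- colnames(df)
  is.x <- substr(nm, 1, 1) == "x"
  key <- substr(nm, 2, 2)              # single digit after the letter (assumption)
  xcol <- which(is.x)[match(key[!is.x], key[is.x])]
  data.frame(x = unlist(df[xcol], use.names = FALSE),
             y = unlist(df[!is.x], use.names = FALSE))
}

# column order no longer matters
set.seed(1)
shuffled <- x[, sample(ncol(x))]
res <- by_name(shuffled)
res <- res[order(res$y), ]
```

The positional solutions would pair the wrong columns on shuffled input, while this sketch would break as soon as there are ten or more x columns or the names follow a different pattern.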

Gabor Grothendieck <ggrothendieck <at> myway.com> writes:

: 
: Try this:
: 
: is.x <- substr(colnames(x),1,1) == "x"   # TRUE if col name starts with x
: x. <- unlist(rep(x[,is.x], diff(which(c(is.x,TRUE)))-1))   # repeat x cols
: names(x.) <- NULL
: y. <- unlist(x[,!is.x])
: DF <- data.frame(x = x., y = y., row.names = NULL)
: 
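
A short sketch of how the repeat counts in the quoted code come about, using the is.x pattern of the example data (x1 y11 x2 y21 y22 x3 y31 x4 y41 y42 y43). The names ends, n.y and xcol are introduced here only for illustration.

```r
# is.x for the example column order
is.x <- c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE,
          TRUE, FALSE, FALSE, FALSE)

# positions of the x columns, plus one sentinel past the last column
ends <- which(c(is.x, TRUE))
print(ends)   # 1 3 6 8 12

# gap between consecutive x columns, minus the x column itself,
# is the number of y columns that follow each x
n.y <- diff(ends) - 1
print(n.y)    # 1 2 1 3

# repeating each x column index that many times gives one x index per y column
xcol <- rep(which(is.x), n.y)
print(xcol)   # 1 3 3 6 8 8 8
```

These are the same column indices that the LOCF version produces, so the two approaches differ only in how they arrive at them.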
: Sundar Dorai-Raj <sundar.dorai-raj <at> PDF.COM> writes:
: 
: : 
: : Hi all,
: :    I have a data.frame with the following colnames pattern:
: : 
: : x1 y11 x2 y21 y22 y23 x3 y31 y32 ...
: : 
: : I.e. I have an x followed by a few y's. What I would like to do is turn 
: : this wide format into a tall format with two columns: "x", "y". The 
: : structure is that xi needs to be associated with yij (e.g. x1 should be
: : next to y11 and y12, x2 should be next to y21, y22, and y23, etc.).
: : 
: :   x   y
: : x1 y11
: : x2 y21
: : x2 y22
: : x2 y23
: : x3 y31
: : x3 y32
: : ...
: : 
: : I have looked at ?reshape but I didn't see how it could work with this 
: : structure. I have a solution using nested for loops (see below), but 
: : it's slow and not very efficient. I would like to find a vectorised 
: : solution that would achieve the same thing.
: : 
: : Now, for an example:
: : 
: : x <- data.frame(x1 =  1: 5, y11 =  1: 5,
: :                  x2 =  6:10, y21 =  6:10, y22 = 11:15,
: :                  x3 = 11:15, y31 = 16:20,
: :                  x4 = 16:20, y41 = 21:25, y42 = 26:30, y43 = 31:35)
: : # which are the x columns
: : nmx <- grep("^x", names(x))
: : # which are the y columns
: : nmy <- grep("^y", names(x))
: : # grab y values
: : y <- unlist(x[nmy])
: : # reserve some space for the x's
: : z <- vector("numeric", length(y))
: : # a loop counter
: : k <- 0
: : n <- nrow(x)
: : seq.n <- seq(n)
: : # determine how many times to repeat the x's
: : repy <- diff(c(nmx, length(names(x)) + 1)) - 1
: : for(i in seq(along = nmx)) {
: :    for(j in seq(repy[i])) {
: :      # store the x values in the appropriate z indices
: :      z[seq.n + k * n] <- x[, nmx[i]]
: :      # move to next block in z
: :      k <- k + 1
: :    }
: : }
: : data.frame(x = z, y = y, row.names = NULL)
: 