[R] reshape

Gabor Grothendieck ggrothendieck at gmail.com
Mon Feb 11 00:31:48 CET 2008


This isn't really well defined.  Suppose we have two rows that
both have sp a, code a2 and a value for B.  Now suppose we have
another row with a, a2 but with a value for C.  Does the third row
go with the first one?  The second one?  A new row?  Both the first
and the second?
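
To make the ambiguity concrete, here is a minimal sketch of that
three-row case.  The data frame ex and its val numbers are made up
for illustration and are not from the original post.

```r
# Hypothetical three-row case: two B rows and one C row for sp a, code a2.
ex <- data.frame(sp   = c("a", "a", "a"),
                 code = c("a2", "a2", "a2"),
                 tr   = c("B", "B", "C"),
                 val  = c(32, 33, 99))

# Rows 1 and 2 share the same (sp, code, tr) key, so that key does not
# identify a single output row.
dup <- duplicated(ex[c("sp", "code", "tr")])
dup
# FALSE TRUE FALSE
```

Nothing in the data itself says whether the C row belongs with the
first B row, the second, both, or neither.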

Here is one possibility, but without a good definition of the problem
we don't know whether it's answering the problem that is intended.

In the code below we assume that all dat rows that have the same sp
value and the same code value are adjacent.  If, among those rows, a
tr occurs that is less than or equal to the prior row's tr in factor
level order, then that dat row must start a new output row; otherwise
it does not.  Thus within an sp/code group we assign each row a 1
until we get a tr that is less than or equal to the prior row's tr,
and then we start assigning 2, and so on.  This is the new column seq
below.  We then use seq as part of idvar in reshape.  For the
particular example in your post this gives the same answer.
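
As a quick illustration of the counter, here is the helper f (same
definition as in the code below) applied to a few toy vectors of
factor codes.  The input vectors are made up for illustration.

```r
# f starts at 1 and steps up each time a value is <= the one before it.
f <- function(x) cumsum(c(1, diff(x) <= 0))

f(c(1, 2, 3))        # strictly increasing, one group:    1 1 1
f(c(2, 2))           # a repeat starts a new group:       1 2
f(c(1, 2, 2, 3, 1))  # a repeat and a drop each bump it:  1 1 2 2 3
```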

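# counter that increases whenever tr fails to increase from the prior row
f <- function(x) cumsum(c(1, diff(x) <= 0))
# factor() lets this also work when tr is character (the R >= 4.0 default)
dat$seq <- ave(as.numeric(factor(dat$tr)), dat$sp, dat$code, FUN = f)
# drop id (column 1), reshape with seq in idvar, then drop seq (column 3)
reshape(dat[-1], direction="wide", timevar="tr",
idvar=c("code","sp","seq" ))[-3]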
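
For convenience, here is a self-contained sketch that rebuilds dat
from the original post and runs the steps above end to end.  Wrapping
tr in factor() is an addition so that it runs on R >= 4.0, where
data.frame no longer converts strings to factors by default; the name
wide is only for illustration.

```r
# Data from the original post.
sp <- c("a", "a", "a", "a", "b", "b", "b", "c", "d", "d", "d", "d")
tr <- c("A", "B", "B", "C", "A", "B", "C", "A", "A", "B", "C", "C")
code <- c("a1", "a2", "a2", "a3", "a3", "a3", "a4", "a4", "a4", "a5",
          "a5", "a6")
dat <- data.frame(id = 1:12, sp = sp, tr = tr, val = 31:42, code = code)

# Within each sp/code group, bump the counter when tr does not increase.
f <- function(x) cumsum(c(1, diff(x) <= 0))
dat$seq <- ave(as.numeric(factor(dat$tr)), dat$sp, dat$code, FUN = f)
dat$seq
# 1 1 2 1 1 1 1 1 1 1 1 1   (only the second a/a2/B row gets a 2)

# seq in idvar keeps the two a/a2 rows apart; [-3] then drops seq.
wide <- reshape(dat[-1], direction = "wide", timevar = "tr",
                idvar = c("code", "sp", "seq"))[-3]
nrow(wide)
# 10
```

Note that this relies on the adjacency assumption stated above; if
rows of one sp/code group are scattered through dat, sort first.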


On Feb 10, 2008 4:58 PM, juli pausas <pausas at gmail.com> wrote:
> Dear colleagues,
> I'd like to reshape a data frame in a long format to a wide format, but
> I do not quite get what I want. Here is an example of the data I
> have (dat):
>
> sp <- c("a", "a", "a", "a", "b", "b", "b", "c", "d", "d", "d", "d")
> tr <- c("A", "B", "B", "C", "A", "B", "C", "A", "A", "B", "C", "C")
> code <- c("a1", "a2", "a2", "a3", "a3", "a3", "a4", "a4", "a4", "a5",
> "a5", "a6")
> dat <- data.frame(id=1:12, sp=sp, tr=tr, val=31:42, code=code)
>
> and below is what I'd like to obtain. That is, I'd like the tr
> variable in different columns (as a timevar) with their value (val).
>
> sp  code  tr.A  tr.B  tr.C
> a    a1   31    NA    NA
> a    a2   NA    32    NA
> a    a2   NA    33    NA    **
> a    a3   NA    NA    34
> b    a3   35    36    NA
> b    a4   NA    NA    37
> c    a4   38    NA    NA
> d    a4   39    NA    NA
> d    a5   NA    40    41
> d    a6   NA    NA    42
>
> Using reshape:
>
> reshape(dat[,2:5], direction="wide", timevar="tr", idvar=c("code","sp" ))
>
> I'm getting very close. The only difference is in the 3rd row (**),
> that is when sp and code are the same I only get one record. Is there
> a way to get all records? Any idea?
>
> Thank you very much for any help
>
> Juli Pausas
>
> --
> http://www.ceam.es/pausas