[R] reshaping issue

Dennis Murphy djmuser at gmail.com
Tue May 17 23:36:18 CEST 2011


Hi:

Here's one way, using an abbreviated example:

du <- data.frame(v1 = factor(rep(1:10, each = 4)),
                 v2 = factor(rep(rep(1:2, each = 2), 10)),
                 v3 = factor(rep(1:2, 20)),
                 x1 = rnorm(40),
                 y1 = rnorm(40),
                 x2 = rnorm(40),
                 y2 = rnorm(40),
                 x3 = rnorm(40),
                 y3 = rnorm(40))

# Keep the id columns (v*) plus either the x or the y measures
ds1 <- du[, grep('^[vx]', names(du))]
ds2 <- du[, grep('^[vy]', names(du))]

library(reshape2)
# Stack the x (and y) columns into a single value column per data set
dm1 <- melt(ds1, id.vars = grep('^v', names(ds1)),
            variable.name = 'xvars', value.name = 'x')
dm2 <- melt(ds2, id.vars = grep('^v', names(ds2)),
            variable.name = 'yvars', value.name = 'y')

dm <- dm1
dm$y <- dm2$y
dm <- dm[with(dm, order(v1, v2, v3, xvars)), ]

If you already have the reshape package loaded, the argument that
names the value column in melt() won't work; it requires reshape2
without the presence of reshape. (Otherwise, you'll have to rename the
value variable in each of dm1 and dm2 yourself.) Naming the variable
column may not be necessary, but it is included to show which
variables have been reshaped into which rows.
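
If you do end up renaming by hand, here is a minimal sketch in base R.
The small dm1 below is a made-up stand-in for a melted data frame with
the default column names 'variable' and 'value'; it is not the dm1
from the example above.

```r
# Made-up stand-in for a melted data frame with default column names
dm1 <- data.frame(v1 = factor(c(1, 2, 1, 2)),
                  variable = factor(c("x1", "x1", "x2", "x2")),
                  value = c(0.5, -1.2, 0.3, 2.1))

# Rename by matching on the current name rather than by position,
# so the code still works if the number of id columns changes
names(dm1)[names(dm1) == "variable"] <- "xvars"
names(dm1)[names(dm1) == "value"]    <- "x"
```

The same two lines, with 'yvars' and 'y', would apply to dm2.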

The main trick is the grep() function: its use in the above code is to
pick out the variables whose names begin with v or x in the
construction of ds1, and to pick out the variables beginning with v or
y in ds2. Both ds1 and ds2 are then melted so that the values of the x
and y variables end up in one column in each melted data set. Since
they were melted the same way and have the same dimension, rather than
merging, it's simple enough to copy the y variable from the second
melted data frame to the first. A reordering of the rows produces the
final result.
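
As an alternative to melting twice and copying a column, base R's
reshape() can move each x/y pair into the same row in one step. This
is only a sketch on a made-up data frame, not the approach described
above: the names wide, long and pair are illustrative, and it assumes
v1 uniquely identifies each row, as in the original question.

```r
# Toy wide data: two id columns plus two (x, y) pairs
wide <- data.frame(v1 = 1:3,
                   v2 = c("a", "b", "a"),
                   x1 = c(0.1, 0.2, 0.3),
                   y1 = c(1.1, 1.2, 1.3),
                   x2 = c(0.4, 0.5, 0.6),
                   y2 = c(1.4, 1.5, 1.6))

# Same grep() idea as above, but returning the column names
xcols <- grep('^x', names(wide), value = TRUE)
ycols <- grep('^y', names(wide), value = TRUE)

# Each element of 'varying' becomes one long column; position i of
# xcols is paired with position i of ycols
long <- reshape(wide,
                direction = "long",
                varying   = list(xcols, ycols),
                v.names   = c("x", "y"),
                timevar   = "pair",
                times     = seq_along(xcols),
                idvar     = "v1")

# Put the rows for each id next to each other
long <- long[order(long$v1, long$pair), ]
```

The pairing relies on xcols and ycols being in the same order, so it
is worth checking that the two vectors line up before reshaping.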

HTH,
Dennis

On Tue, May 17, 2011 at 10:31 AM, Stijn Van Daele
<Stijn.VanDaele at ugent.be> wrote:
> Dear R users,
>
> I have a problem with reshaping data. I know such questions have been asked before, but I can't get it right, neither with the reshape function nor with the melt function.
>
> My dataset has about 407 variables and about 48000 cases.
>
> Each case looks as follows:
> V1     v2     v3    v4    v5    v6     v7    x1     y1     x2     y2 ....  x200     y200
>
> V1 is unique, v2-v7 are settings (that are linked to V1) and x and y are measures
>
> What I would like for each V1 is a combination of its unique id, the settings that apply to that case, and then the X and Y values (these are linked to each other, so they belong in the same row). Something like this:
> V1     v2     v3    v4    v5    v6     v7    x1     y1
> V1     v2     v3    v4    v5    v6     v7    x2     y2
> ...
> V1     v2     v3    v4    v5    v6     v7    x200     y200
>
> I have difficulties with the fact that I have two varying variables (x and y) that should stay together.
>
> Could anyone help this R-newbie out?
>
> Thanks in advance,
> Stijn


