[R] Table Transformation

Paul Johnson pauljohn32 at gmail.com
Wed Mar 4 20:14:35 CET 2009


On Wed, Mar 4, 2009 at 11:58 AM, Christian Pilger
<christian.pilger at gmx.net> wrote:
>
> Dear R-experts,
>
> recently, I started to discover the world of R. I came across a problem,
> that I was unable to solve by myself (including searches in R-help, etc.)
>
> I have a flat table similar to
>
> key1    key2    value1
>
> abcd_1  BP      10
> abcd_1  BSMP    1A
> abcd_1  PD      25
> abcd_2  BP      20
> abcd_3  BP      80
> abcd_4  IA      30
> abcd_4  PD      70
> abcd_4  PS      N
>
> I wish to transform this table to obtain the following result:
>
>        key2
> key1    BP      BSMP    IA      PD      PS
> abcd_1  "10"    "1A"    ""      "25"    ""
> abcd_2  "20"    ""      ""      ""      ""
> abcd_3  "80"    ""      ""      ""      ""
> abcd_4  ""      ""      "30"    "70"    "N"
>

I think we would say that a dataframe of the first type is in the
"long" format, while the other one you want is in the "wide" format.
I've done changes like that with the "reshape" function that is in the
stats package.

This example you propose is like making one column for each "country"
where key 1 is like the "year" in which the observation is made.
Right?

You don't have an easily cut-and-pasteable code example, so I've
generated a little working example. Here, x1 is key 1 and x2 is key 2.

> x1 <- gl(4,5, labels=c("c1","c2","c3","c4"))
> x1
 [1] c1 c1 c1 c1 c1 c2 c2 c2 c2 c2 c3 c3 c3 c3 c3 c4 c4 c4 c4 c4
Levels: c1 c2 c3 c4
> x2 <- rep(1:5,4)
> df <- data.frame(x1, x2, y=rnorm(20))
> df
   x1 x2           y
1  c1  1  0.02095747
2  c1  2  0.05926233
3  c1  3 -0.07561916
4  c1  4 -1.06272710
5  c1  5 -1.89202032
6  c2  1 -0.04549782
7  c2  2 -0.68333187
8  c2  3 -0.99151410
9  c2  4 -0.29070280
10 c2  5 -0.97655024
11 c3  1  0.33411223
12 c3  2 -0.24907340
13 c3  3 -0.25469819
14 c3  4  1.23956157
15 c3  5 -1.38162430
16 c4  1  0.50343661
17 c4  2 -0.58126964
18 c4  3  0.24256348
19 c4  4 -0.39398578
20 c4  5  0.01664450
> reshape(df, direction="wide", timevar="x2", idvar="x1")
   x1         y.1         y.2         y.3        y.4        y.5
1  c1  0.02095747  0.05926233 -0.07561916 -1.0627271 -1.8920203
6  c2 -0.04549782 -0.68333187 -0.99151410 -0.2907028 -0.9765502
11 c3  0.33411223 -0.24907340 -0.25469819  1.2395616 -1.3816243
16 c4  0.50343661 -0.58126964  0.24256348 -0.3939858  0.0166445
>

Your case will have many missings, but I think the idea is the same.

HTH


-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas




More information about the R-help mailing list