[R] Aggregate and cross tabulation

jim holtman jholtman at gmail.com
Wed Oct 28 04:54:07 CET 2009


FIRST:

PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

If you expect an answer, please provide the data.  Here is one way of doing it:

> N <- 30
> x <- data.frame(A=sample(1:3, N, TRUE), B=sample(1:2, N, TRUE),
+     C=sample(1:2, N, TRUE), D=sample(1:4, N, TRUE), data=runif(N))
> library(reshape)
> x.m <- melt(x, measure.vars='data')
> cast(x.m, A+B+C~D, mean)
   A B C          1         2         3          4
1  1 1 1 0.51473265 0.7396417 0.0853110        NaN
2  1 1 2 0.07246063 0.2939918       NaN        NaN
3  1 2 1        NaN       NaN 0.5297180 0.10505014
4  1 2 2        NaN 0.8383841       NaN        NaN
5  2 1 1        NaN       NaN 0.8016877 0.04152843
6  2 1 2 0.34448739       NaN       NaN 0.35757999
7  2 2 1 0.87943330       NaN 0.1431666 0.92051784
8  2 2 2        NaN       NaN 0.5008505        NaN
9  3 1 1 0.48216957 0.4230986       NaN 0.53786492
10 3 1 2        NaN 0.7602803       NaN 0.33989081
11 3 2 1 0.43471764       NaN 0.2642490        NaN
12 3 2 2        NaN 0.3665636       NaN 0.37875944
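
The original question also asked for a sum(N) table. As a rough base-R sketch (this is not what was run above), one could aggregate first and then spread D into columns with reshape(). The seed and the names agg.sum and wide.sum are illustrative assumptions, not part of the original post:

```r
# Illustrative data in the same layout as above; the seed is arbitrary
set.seed(42)
N <- 30
x <- data.frame(A = sample(1:3, N, TRUE), B = sample(1:2, N, TRUE),
                C = sample(1:2, N, TRUE), D = sample(1:4, N, TRUE),
                data = runif(N))

# Step 1: one row per observed A/B/C/D combination, holding sum(data)
agg.sum <- aggregate(data ~ A + B + C + D, data = x, FUN = sum)

# Step 2: spread the values of D into columns (data.1, data.2, ...)
wide.sum <- reshape(agg.sum, idvar = c("A", "B", "C"), timevar = "D",
                    direction = "wide")

# Sort rows by A, B, C for readability
wide.sum <- wide.sum[order(wide.sum$A, wide.sum$B, wide.sum$C), ]
print(wide.sum)
```

Combinations that never occur show up as NA here. Swapping FUN = sum for FUN = mean should give the corresponding table of means.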



On Tue, Oct 27, 2009 at 8:32 PM, Jonathan Greenberg
<greenberg at ucdavis.edu> wrote:
> R-helpers:
>
>   I have a data frame containing 4 factor variables (let's say A,B,C, and D)
> and 1 numerical variable (N).  I would like to produce a cross-tabulated
> data frame in which A,B,C are individual columns, each factor of D is its
> own column, and the field is calculated as a given function of N. I would
> like to have two output data frames, one with the mean(N) and one with the
> sum(N), e.g.:
>
> A,   B,   C,   D1,                    D2,                    ...,  DM
> A1   B1   C1   mean(N{A1,B1,C1,D1})   mean(N{A1,B1,C1,D2})   ...   mean(N{A1,B1,C1,DM})
> A2   B1   C1   mean(N{A2,B1,C1,D1})   mean(N{A2,B1,C1,D2})   ...   mean(N{A2,B1,C1,DM})
> etc...
>
> I can mostly do this with aggregate, e.g.
> output = aggregate(N,list(A,B,C,D),mean), but I can't get that last step of
> cross-tabulating the Ds to column headers.  table() and xtabs() appear to
> just count, rather than giving me access to sum() and mean().  Any ideas?
>  Ideally I'd like to do this in a single step, as the aggregate output
> (above) produces a much larger data frame than a cross-tabulated output
> would (in my particular case).
>
> --j
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?



