[R] Stymied by plyr

Dennis Murphy djmuser at gmail.com
Thu Apr 21 09:44:56 CEST 2011


Hi:

The example below uses plyr together with the reshape family of
packages. I'm presuming that you expect the table proportions to form
columns of the output data frame; to do that, we'll use the casting
function: cast() in the original reshape, dcast() in reshape2.

Writing functions to pass into a plyr function needs to be done with a
bit of care. When writing a function for ddply(), the output of the
function must be one of (a) a scalar; (b) a named vector; or (c) a
data frame. A simplified version of your problem is illustrated below;
the ratings variable is coerced to a factor with a fixed set of levels
so that the output of the (relative) frequency table has the same
length in each subgroup. This will matter when it comes to reshaping
the data frame.
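
To see why the fixed levels matter, here is a minimal base R sketch.
The ratings vector is made up for illustration; the point is only that
table() drops categories it never sees unless the input is a factor
with its levels declared up front.

```r
# Hypothetical subgroup in which only ratings 3 and 4 occur
ratings <- c(3, 4, 3, 3)

# Without fixed levels: one cell per observed value
free <- prop.table(table(ratings))

# With fixed levels: one cell per possible rating, zeros included
fixed <- prop.table(table(factor(ratings, levels = 1:4)))

length(free)    # number of cells when levels are inferred
length(fixed)   # number of cells when levels are declared
```

With declared levels every subgroup returns the same number of rows,
which keeps the reshaping step straightforward.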

library(plyr)
library(reshape2)   # more recent version of the original reshape package
# Example data frame: three schools, two components, 100 observations per school
df <- data.frame(school = rep(LETTERS[1:3], each = 100),
                 component = rep(rep(1:2, each = 50), 3),
                 rating = factor(sample(1:5, 300, replace = TRUE),
                                 levels = 1:5))

# Function to compute the relative frequencies - the input is a
# generic sub-data frame, the output is a data frame representation
# of prop.table()
mktab <- function(df) as.data.frame(prop.table(table(df$rating)))

# Apply the function to the input data frame by school/component subgroups
# Notice that the output has one row for each rating in each
# school/component subgroup
(dftab <- ddply(df, .(school, component), mktab))

# reshape dftab to display the proportion of each rating in columns instead
# (reshape2 provides dcast(); cast() belongs to the original reshape package)
dcast(dftab, school + component ~ Var1, value.var = "Freq")

# My result (the ratings are sampled at random, so your numbers will differ):
  school component    1    2    3    4    5
1      A         1 0.24 0.14 0.34 0.14 0.14
2      A         2 0.12 0.20 0.10 0.28 0.30
3      B         1 0.24 0.22 0.22 0.14 0.18
4      B         2 0.10 0.26 0.16 0.32 0.16
5      C         1 0.24 0.18 0.24 0.20 0.14
6      C         2 0.22 0.10 0.16 0.22 0.30
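
The same pattern might carry over to your data roughly as follows.
This is a sketch under assumptions: the test.school data frame here is
a made-up stand-in that only borrows the column names from your post,
and calc.pct is rewritten to take the whole sub-data frame as its first
argument. As far as I can tell, that is what your first attempt tripped
over: ddply() hands each piece to the function as the first argument,
and the original calc.pct(x) had no parameter to receive it once
x = rating was also supplied.

```r
library(plyr)
library(reshape2)

# Hypothetical stand-in for test.school (column names from the post,
# values invented for illustration)
set.seed(1)
test.school <- data.frame(
  unit    = 77777,
  area    = 11,
  ext.obs = rep(0:1, each = 40),
  comp    = rep(1:10, times = 8),
  rating  = sample(1:4, 80, replace = TRUE)
)

# Takes a sub-data frame and returns a data frame with one row per
# possible rating; the levels are fixed at 1:4
calc.pct <- function(piece) {
  rating <- factor(piece$rating, levels = 1:4)
  as.data.frame(prop.table(table(rating)))
}

# One row per rating within each area / ext.obs / comp subgroup
pcts <- ddply(test.school, .(area, ext.obs, comp), calc.pct)

# Spread the ratings across columns
wide <- dcast(pcts, area + ext.obs + comp ~ rating, value.var = "Freq")
head(wide)
```

Multiply the Freq column by 100 if you want percentages rather than
proportions.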

See
http://www.jstatsoft.org/v21/i12
http://www.jstatsoft.org/v40/i01
http://had.co.nz/plyr/
http://had.co.nz/reshape/

Re the last link: the original reshape package has been superseded by
an enhanced version, reshape2, in which cast() is split into dcast()
(data frame output) and acast() (array output). Since you're new to all
of this, you're better off learning how reshape2 works in conjunction
with plyr and several other of Hadley's packages.

HTH,
Dennis

2011/4/20 Stuart Luppescu <slu at ccsr.uchicago.edu>:
> Hello, This is my first time trying to use plyr, and I'm getting
> nowhere. I have teacher ratings data (1:4), on 10 components, by
> external observers and internal observers, in schools in areas. I want
> to calculate the percentage of each rating given on each component, by
> each type of observer, within each school, within each area. The data
> look like this:
>
>    unit area ext.obs rating comp
> 11 77777    11       0      3    1
> 12 77777    11       0      4    2
> 13 77777    11       0      3    3
> 14 77777    11       0      4    4
> 15 77777    11       0      3    5
> 16 77777    11       0      3    6
> 17 77777    11       0      3    7
> 18 77777    11       0      3    8
> 19 77777    11       0      3    9
> 20 77777    11       0      3   10
>
> I thought this would be a perfect application for plyr. I tried this:
>
> calc.pct <- function(x) {
>  table(x)/sum(table(x))
> }
>
> pcts <- ddply(test.school, .(area, ext.obs, comp), calc.pct, x=rating)
> Error in .fun(piece, ...) : unused argument(s) (piece)
>
> Then I tried this:
>  pcts <- ddply(test.school, .(area, ext.obs, comp), .(calc.pct(rating)))
> Error in .fun(piece, ...) : attempt to apply non-function
>
> I tried all kinds of other variations but with no success. Can someone
> give me some pointers?
>
> Thanks.
> --
> Stuart Luppescu -=- slu .at. ccsr.uchicago.edu
> University of Chicago -=- CCSR
> 才文と智奈美の父 -=- Kernel 2.6.36-gentoo-r5
> Lars Strand: Will R run under Windows Pocket PC? Brian D. Ripley: We
> don't know! There are no binary versions of R for that platform, but
> perhaps you could find a suitable compiler and manage to build the
> sources. Outside pure mathematics it is usually very hard to establish
> that something cannot be done (and it can be very hard in pure
> mathematics, too). -- Lars Strand and Brian D. Ripley R-help (November
> 2004)
>


