[R] Strange output daply with empty strata

Jan van der Laan rhelp at eoos.dds.nl
Thu Sep 9 11:43:02 CEST 2010


Dear list,

I get some strange results with daply from the plyr package. In the
example below, the average age per municipality is calculated
separately for employed and unemployed people. If I do this using
tapply (see code below) I get the following result:

         no      yes
A       NA 36.94931
B 51.22505 34.24887
C 48.05759 51.00198

If I do this using daply:

municipality       no      yes
            A 36.94931 48.05759
            B 51.22505 51.00198
            C 34.24887       NA

daply generates the same numbers, but they are not in the correct
cells. For example, in municipality A everybody is employed.
Therefore, the NA should be in the cell for unemployed (column "no")
in municipality A, as it is in the tapply output.

Am I using daply incorrectly or is there indeed something wrong with  
the output of daply?

Regards,

Jan


I am using version 1.1 of the plyr package.


library(plyr)

# Generate some test data
data.test <- data.frame(
   municipality=rep(LETTERS[1:3], each=10),
   employed=sample(c("yes", "no"), 30, replace=TRUE),
   age=runif(30,20,70))
# Make sure everybody is employed in municipality A
data.test$employed[data.test$municipality == "A"] <- "yes"

# Compare the output of tapply:
tapply(data.test$age, list(data.test$municipality, data.test$employed), mean)
# to that of daply:
daply(data.test, .(municipality, employed), function(d){mean(d$age)} )
# results of ddply are the same as those of tapply
ddply(data.test, .(municipality, employed), function(d){mean(d$age)} )
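
A possible way to cross-check which cell ought to be empty, using base R only. This is a minimal sketch: the data frame check.data and its values are made up for illustration and are not the data from this post. It counts the observations per cell with table and verifies that the NA cells of the tapply means coincide with the zero-count cells.

```r
# Small deterministic data set (hypothetical values, for illustration only)
check.data <- data.frame(
  municipality = rep(c("A", "B", "C"), each = 4),
  employed     = c("yes", "yes", "yes", "yes",
                   "no",  "yes", "no",  "yes",
                   "no",  "no",  "yes", "yes"),
  age          = c(30, 40, 50, 60,
                   25, 35, 45, 55,
                   20, 30, 40, 50))

# Number of observations per cell; a zero marks an empty stratum
cell.counts <- table(check.data$municipality, check.data$employed)
print(cell.counts)

# Mean age per cell; tapply returns NA for empty cells
cell.means <- tapply(check.data$age,
                     list(check.data$municipality, check.data$employed),
                     mean)
print(cell.means)

# The NA cells of the means should be exactly the zero cells of the counts
stopifnot(all(as.vector(is.na(cell.means)) == as.vector(cell.counts == 0)))
```

Applying the same check to the result of daply should show whether its NA sits in the cell that actually has no observations.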


