[R] Tapply.

Petr PIKAL petr.pikal at precheza.cz
Mon Apr 26 11:43:32 CEST 2010


Hi


steven mosher <moshersteven at gmail.com> wrote on 26.04.2010 10:21:37:

> That fails:
> 
> The manual says:
> 
> tapply(X, INDEX, FUN = NULL, ..., simplify = TRUE)

> Arguments
> 
> X
> 
> an atomic object, typically a vector.
> 
> INDEX
> 
> list of factors, each of same length as X. The elements are coerced to
> factors by as.factor.
> 
> my error says:

> 
> Error in tapply(DF[, 1:15], DF$Year, mean, na.rm = T) : 
> 
>   arguments must have same length
> 
> The issue that I have is I don't understand what the requirements for the
> list of factors are. In my example DF$Year is a sequence of
> years..1979,1980,1982,1983, 1987.. like that, with missing years. So when
> the manual says "list of factors, each of the same length as X", what does
> that mean? I could have a DF with 20 rows and only two different years, or
> 20 rows and 20 different years.
> 
> Suppose:
> 
> a<- c(1,2,3,4)
> > b<-c(2,3,4,5)
> > df=data.frame(a,b)
> > length(df)

A data frame is neither a vector nor atomic; it is a list, hence length(df)
gives you the number of columns. It behaves like the length of a list:

> lll<-list(a=1, b=2, c=3)
> length(lll)
[1] 3
>
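
To make the point concrete, here is a minimal sketch using the same small data frame as in the quoted example. It only shows how length(), nrow() and the type predicates behave; nothing in it is specific to tapply.

```r
# Same toy data frame as in the quoted example: 4 rows, 2 columns
df <- data.frame(a = c(1, 2, 3, 4), b = c(2, 3, 4, 5))

length(df)     # 2: number of columns, because a data frame is a list of columns
nrow(df)       # 4: number of rows
is.atomic(df)  # FALSE: a data frame is not an atomic vector
is.list(df)    # TRUE
length(df$a)   # 4: a single column is an atomic vector, one element per row
```

So the length that tapply compares against the grouping factor is 2 for the whole data frame but 4 for any single column.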

Once you accept that the first argument of tapply has to be a vector, it
follows that you cannot put a data frame there.

Next, the second argument has to be a list of factors, so you can put
several factors there, each of the same length as the first argument (a vector).
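
A minimal sketch of both cases, assuming a toy data frame whose column names (Year, D, Jun, Sep) mimic the quoted data; the values are a small subset chosen for illustration, not the full data set.

```r
# Toy data (assumption: a few rows and two month columns only)
DF <- data.frame(
  Year = c(1980, 1981, 1981, 1982, 1982),
  D    = c(1, 0, 1, 0, 1),
  Jun  = c(212, NA, NA, 205, NA),
  Sep  = c(228, 225, 235, NA, NA)
)

# One atomic vector and one grouping vector of the same length (5 here)
jun_by_year <- tapply(DF$Jun, DF$Year, mean, na.rm = TRUE)

# A list of two grouping vectors gives a Year-by-D table of means
sep_by_year_d <- tapply(DF$Sep, list(DF$Year, DF$D), mean, na.rm = TRUE)

jun_by_year
sep_by_year_d
```

A group that contains only NA values comes back as NaN when na.rm = TRUE is used, and a Year/D combination that does not occur in the data comes back as NA.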

If you want to perform an aggregating operation on a whole data frame, you
should consider

?by or ?aggregate
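
As a rough sketch of the by() route: it splits a data frame by a grouping vector and hands each piece to the function, so colMeans can be used here. The toy data and the helper name month_cols are assumptions made for illustration.

```r
# Toy data (assumption: a small subset with two month columns)
DF <- data.frame(
  Year = c(1981, 1981, 1982, 1982),
  Jun  = c(NA, NA, 205, NA),
  Sep  = c(225, 235, NA, NA)
)
month_cols <- c("Jun", "Sep")  # hypothetical helper: the columns to average

# by() passes each per-year sub-data-frame to colMeans
res <- by(DF[, month_cols], DF$Year, colMeans, na.rm = TRUE)

# Bind the per-year vectors into a matrix, one row per year
means <- do.call(rbind, res)
means
```

The result of by() is a list-like object, which is why the pieces have to be bound together afterwards if a table is wanted.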

Other options are plyr or doBy packages.

The syntax for aggregate is quite similar to tapply, except that the first
argument can be a data frame.
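
A minimal sketch of the aggregate() call, assuming a few rows and two month columns taken from the quoted data. With the real data you would pass all the month columns instead of just Jun and Sep.

```r
# Toy data (assumption: a small subset of the quoted table)
DF <- data.frame(
  Id   = rep(11264402000, 5),
  D    = c(1, 0, 1, 0, 1),
  Year = c(1980, 1981, 1981, 1982, 1982),
  Jun  = c(212, NA, NA, 205, NA),
  Sep  = c(228, 225, 235, NA, NA)
)

# First argument: the columns to summarise (a data frame is fine here).
# by: a list of grouping vectors, each with one element per row.
# na.rm = TRUE is passed on to mean().
result <- aggregate(DF[, c("D", "Jun", "Sep")],
                    by = list(Year = DF$Year),
                    FUN = mean, na.rm = TRUE)
result
```

The result is a data frame with one row per year. For 1981 the mean of D is 0.5 and the mean of Sep is 230, while Jun is NaN because both values for that year are missing.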

Regards
Petr 


> 
> The length of DF is 2.
> Does that mean the "list of factors, each of same length as X." would
> have to be 2? That doesn't seem to make sense.
> 
>  
> 
> On Mon, Apr 26, 2010 at 12:26 AM, Petr PIKAL <petr.pikal at precheza.cz> wrote:
> Hi
> 
> r-help-bounces at r-project.org wrote on 26.04.2010 06:52:55:
> 
> > Having some difficulties with understanding how tapply works and getting
> > return values I expect
> >
> > Data: dataframe. DF  DF$Id $D $Year.......
> >
> >           Id D Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
> >  11264402000 1 1980  NA  NA  NA  NA  NA 212 203 209 228 237  NA  NA
> >  11264402000 0 1981  NA  NA 243 244  NA  NA  NA  NA 225  NA 231  NA
> >  11264402000 1 1981  NA 251  NA 248 241  NA  NA  NA 235  NA  NA 245
> >  11264402000 0 1982 236 237 242 240 242 205 199  NA  NA  NA  NA  NA
> >  11264402000 1 1982 236  NA  NA 240 242  NA  NA  NA  NA  NA  NA  NA
> >  11264402000 0 1983  NA 247  NA  NA  NA  NA  NA 205  NA  NA  NA  NA
> >  11264402000 1 1983  NA 247  NA  NA  NA  NA  NA  NA  NA 225  NA  NA
> >  11264402000 0 1986  NA  NA  NA 240  NA  NA  NA 213  NA  NA  NA  NA
> >  11264402000 0 1987 241  NA  NA  NA  NA 218  NA  NA 235 243 240  NA
> >  11264402000 1 1987  NA  NA  NA  NA  NA 218  NA  NA 235 243 240  NA
> >  11264402000 3 1987  NA  NA  NA  NA  NA 218  NA  NA 235 243 240  NA
> >  11264402000 0 1988 238 246 249  NA 244 213 212 224 232 238 232 230
> >  11264402000 1 1988 238 246 249 246 244 213 212 224 232  NA  NA 230
> >  11264402000 3 1988 238 246 249 246 244 213 212 224 232  NA  NA 230
> >  11264402000 0 1989 232 233 238 239 231  NA 215  NA  NA  NA  NA 238
> >  11264402000 1 1989 232 233 238 239 231  NA  NA  NA  NA  NA  NA 238
> >  11264402000 3 1989 232 233 238 239 231  NA  NA  NA  NA  NA  NA 238
> >
> > and the result should be a dataframe of column means by year, with the
> > variable D dropped (or kept, doesn't matter)
> >
> >  11264402000    1 1980  NA  NA  NA  NA  NA 212 203 209 228 237  NA  NA
> >  11264402000   .5 1981  NA  NA 243 244  NA  NA  NA  NA 225  NA 231  NA
> >  11264402000   .5 1982 236 237 242 240 242 205 199  NA  NA  NA  NA  NA
> >  11264402000   .5 1983  NA 247  NA  NA  NA  NA  NA 205  NA 225  NA  NA
> >  11264402000    1 1986  NA  NA  NA 240  NA  NA  NA 213  NA  NA  NA  NA
> >  11264402000    2 1987 241  NA  NA  NA  NA 218  NA  NA 235 243 240  NA
> >  11264402000 1.33 1988 238 246 249 246 244 213 212 224 232 238 232 230
> >  11264402000 1.33 1989 232 233 238 239 231  NA 215  NA  NA  NA  NA 238
> >
> >  It would seem that Tapply should work
> >  result<-tapply( DF[,1:15], DF$Year, colMeans,na.rm=T)

> Why colMeans? It is a function used instead of apply(...,.. ,mean).
> 
> Maybe you want
> 
> result<-tapply( DF[,1:15], DF$Year, mean,na.rm=T)
> 
> Regards
> Petr
> 
> >
> >  but i get errors about the length of arguments, which
> >


