[R] tapply

Marc Schwartz MSchwartz at mn.rr.com
Tue Jun 21 01:46:58 CEST 2005


On Mon, 2005-06-20 at 18:15 -0500, Weiwei Shi wrote:
> hi,
> i have another question on tapply:
> i have a dataset z like this:
> 5540 389100307391      2600
> 5541 389100307391      2600
> 5542 389100307391      2600
> 5543 389100307391      2600
> 5544 389100307391      2600
> 5546 381300302513        NA
> 5547 387000307470        NA
> 5548 387000307470        NA
> 5549 387000307470        NA
> 5550 387000307470        NA
> 5551 387000307470        NA
> 5552 387000307470        NA
> 
> I want to sum the column 3 by column 2.
> I removed NA by calling:
> tapply(z[[3]], z[[2]], sum, na.rm=T)
> but it does not work.
> 
> then, i used
> z1<-z[!is.na(z[[3]]),]
> and repeat
> still doesn't work.
> 
> please help.


The INDEX argument in tapply() is documented as a "list" of one or more
factors, so wrap the index vector(s) in list(). See the description of
the INDEX argument in ?tapply:

> tapply(z[[3]],list(z[[2]]), sum, na.rm = TRUE)
381300302513 387000307470 389100307391 
           0            0        13000 


Note that using na.rm = TRUE here yields misleading values of 0 for the
other two groups. Their values are all NA's, so sum() is left with
nothing to add and returns 0, which is not self-evident unless you know
the data.

You may be better off with:

> tapply(z[[3]],list(z[[2]]), sum)
381300302513 387000307470 389100307391 
          NA           NA        13000 

unless your real data is a mix of NA's and measured values.
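
If the real data does mix NA's and measured values, one possible middle
ground is a small wrapper that returns NA only when a group has no
observed values at all. The following is just a sketch: z is rebuilt by
hand from the rows quoted above, the column names (id, acct, amt) are
made up for illustration, and sum_or_na() is a hypothetical helper, not
a base R function.

```r
# Rebuild the example data; the column names are assumptions.
z <- data.frame(
  id   = c(5540:5544, 5546:5552),
  acct = c(rep(389100307391, 5), 381300302513, rep(387000307470, 6)),
  amt  = c(rep(2600, 5), rep(NA, 7))
)

# sum() over an empty set is 0, which is why all-NA groups
# show up as 0 when na.rm = TRUE is used.
sum(c(NA, NA), na.rm = TRUE)

# Hypothetical helper: NA if every value in the group is missing,
# otherwise the sum of the observed values.
sum_or_na <- function(x) {
  if (all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)
}

res <- tapply(z[[3]], list(z[[2]]), sum_or_na)
print(res)
```

With this approach, a group with some measured values is summed over
those values, while a group with none stays NA instead of turning into
a 0.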

Also see ?complete.cases and ?na.omit for further approaches to dealing
with such data sets.
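
As a rough illustration of those two functions, the sketch below drops
the incomplete rows first and then sums by group. Again, z is rebuilt
by hand from the rows quoted above and the column names (id, acct, amt)
are assumptions, not part of the original data.

```r
# Rebuild the example data; the column names are assumptions.
z <- data.frame(
  id   = c(5540:5544, 5546:5552),
  acct = c(rep(389100307391, 5), 381300302513, rep(387000307470, 6)),
  amt  = c(rep(2600, 5), rep(NA, 7))
)

# Keep only the rows that have no missing value in any column.
z_cc <- z[complete.cases(z), ]

# na.omit() drops the same rows and records them in an attribute.
z_no <- na.omit(z)

# Groups that were entirely NA no longer appear in the result.
res <- tapply(z_cc[[3]], list(z_cc[[2]]), sum)
print(res)
```

Note that the all-NA groups disappear here because the grouping column
is numeric. If it were a factor, its now-empty levels would still be
listed in the tapply() result.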

HTH,

Marc Schwartz



