[R] aggregate, by, tapply bug or not?

Prof Brian D Ripley ripley at stats.ox.ac.uk
Thu Jan 24 16:16:00 CET 2002


Ah, now it is clearer what you want to do.  You want to subset the weights
vector for each group.
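
To see why, note that extra arguments given to tapply are handed on
unchanged for every group, so each group's values meet the full-length
weights vector. A minimal sketch with made-up data (the names x, g, w and
len_seen are purely illustrative and not from your post):

```r
# Hypothetical data: six values in two groups, one weight per value.
x <- c(1, 2, 3, 10, 20, 30)
g <- c("a", "a", "a", "b", "b", "b")
w <- c(1, 1, 2, 1, 3, 1)

# Report the lengths the function actually receives for each group:
# xi is the group's slice of x, wi is whatever was passed as an extra argument.
len_seen <- tapply(x, g,
                   function(xi, wi) paste(length(xi), length(wi)),
                   wi = w)
len_seen
```

Each group sees three values but all six weights, which is the length
mismatch behind the warnings and the complete.cases error.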

There's no problem in lapply, it's just not clear what you wanted. Try,
if I understand your meaningless notation,

tapply(seq(along=lll), rrr, function(i, x, w) weighted.mean(x[i], w[i]),
       x=lll, w=ttt)

If you want to subset more than one thing, subset the index vector.
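
As a rough illustration of that idea on made-up data (vals, grp and wts
are hypothetical stand-ins for lll, rrr and ttt; seq_along is the current
equivalent of seq(along=)), something like this should do:

```r
# Hypothetical data standing in for lll (values), rrr (groups), ttt (weights).
vals <- c(1, 2, 3, 10, 20, 30)
grp  <- c(1997, 1997, 1997, 1998, 1998, 1998)
wts  <- c(1, 1, 2, 1, 3, 1)

# Group the positions 1..n rather than the values, then use each group's
# positions to subset both vectors in the same way.
by_index <- tapply(seq_along(vals), grp,
                   function(i) weighted.mean(vals[i], wts[i]))

# Cross-check one group against the direct calculation.
direct_1997 <- weighted.mean(vals[grp == 1997], wts[grp == 1997])
```

Because only the index vector is split, any number of parallel vectors
can be subset inside the function without passing them through tapply.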


On Thu, 24 Jan 2002, Petr Pikal wrote:

> On 24 Jan 2002 at 11:54, Agustin Lobo wrote:
>
> >
> > In the case of *apply functions, the parameters follow
> > the name of the function. I.e., if you want to compute a mean
> > with na.rm=T (which for one single vector would be
> > mean(mivector,na.rm=T)), then
> >
> > apply(mat,1,mean,na.rm=T)
> >
> > Agus
> >
>
>
> Thanks to all.
>
> Actually this works for mean, sum, var, sd with na.rm=T. My problem is with
> weighted.mean. It works as a standalone function, but inside any aggregation
> function it causes a warning and it ***does not compute correctly***.
>
> > weighted.mean(lll[rrr==2001],ttt[rrr==2001])
> [1] -0.9257375
>
> > tapply(lll,rrr,weighted.mean,ttt)
>       1997       1998       1999       2000       2001
> -0.4495764 -0.4956762 -0.4920173 -0.9416626 -0.9455542
> Warning messages:
> 1: longer object length
>         is not a multiple of shorter object length in: x * w
> <snip>
> 5: longer object length
>         is not a multiple of shorter object length in: x * w
>
> I traced the problem to ***lapply*** (probably the workhorse for all aggregate
> functions - see the enclosed code)
>
> > lapply(split(lll,rrr),weighted.mean,ttt)
>
> $"1997"
> [1] -0.4495764
>
> <snip>
> $"2001"
> [1] -0.9455542
>
> Warning messages:
> 1: longer object length
>         is not a multiple of shorter object length in: x * w
> <snip>
> 5: longer object length
>         is not a multiple of shorter object length in: x * w
>
>
>
> I used a modified version of weighted.mean which works alone
>
> > weighted.mean.modif(lll[rrr==2001],ttt[rrr==2001])
> [1] -0.9257375
>
> weighted.mean.modif <- function (x, w)
> {
>     if (missing(w))
>         w <- rep(1, length(x))
>     # keep only the cases where both x and w are non-missing
>     i <- complete.cases(x, w)
>     w <- w[i]
>     x <- x[i]
>     sw <- sum(w)
>     sum(x * w)/sw
> }
>
> but using it in any aggregate function causes an error, and debugging does
> not give me any hints.
>
> > tapply(lll,rrr,weighted.mean.modif,ttt)
> Error in complete.cases(...) : not all arguments have the same length
>
> debug: rval <- .Internal(lapply(X, FUN))
> Browse[1]>
> Error in complete.cases(...) : not all arguments have the same length
>
> and solving this is completely beyond my ability.
>
> I use R 1.4.0, Windows version.
>
> lll is some property of a product
> rrr are years
> ttt is the tonnage of the product
>
> They are all the same length (226), but the group size varies from year to year:
>
> > tapply(lll,rrr,length)
> 1997 1998 1999 2000 2001
>   48   51   40   42   45
>
> Please, can anybody tell me where the mistake is?
>
>
>
> > Dr. Agustin Lobo
> > Instituto de Ciencias de la Tierra (CSIC)
> > Lluis Sole Sabaris s/n
> > 08028 Barcelona SPAIN
> > tel 34 93409 5410
> > fax 34 93411 0012
> > alobo at ija.csic.es
> >
> >
> > On Thu, 24 Jan 2002, Petr Pikal wrote:
> >
> > > Dear R users
> > >
> > > I searched some sources but I did not find an answer. Please give me
> > > a hint for the following problem.
> > >
> > > I would like to compute a summary statistic for some vector for
> > > different factor levels. I know I can use tapply or aggregate, but I
> > > do not know if there is a way to use a function with several (two)
> > > input variables (like weighted.mean).
> > >
> > > I wrote a simple function for a factor-weighted mean
> > > fff<-function(x,fact,w)
> > > {
> > > ws<-tapply(w,fact,sum)
> > > newx<-x*w
> > > tapply(newx,fact,sum)/ws
> > > }
> > >
> > > which can handle this particular case, but is there a more general
> > > way to use FUN(X1,X2) in aggregation procedures (tapply,
> > > aggregate, by) directly?
> > >
> > > Thank you
> > > Petr Pikal
> > > petr.pikal at precheza.cz
> > > p.pik at volny.cz
> > >
> > >
> Petr Pikal
> petr.pikal at precheza.cz
> p.pik at volny.cz
>
>
> -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> Send "info", "help", or "[un]subscribe"
> (in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
> _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

