[R] Calculating group means using self-written function

Lauri Nikkinen lauri.nikkinen at iki.fi
Tue Oct 2 14:14:41 CEST 2007


Thanks, Petr, for your kind answer. I understand it now, but it seems that
the argument y is not split by "list(vsid$month, vsid$year)" in the
aggregate function. With "length(unique(y))" I should get the number of
days in each month in the denominator, but instead I get the total number
of days over all months. So I do not get the correct answers. Should I
modify my fun in some way?

Best regards,
Lauri
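
A minimal sketch of the behaviour described above and one possible way
around it. The data frame below is a made-up stand-in for vsid (three
readings per day over June and July 2007, with lev fixed at 1 so the
result is easy to check); the real vsid is not shown in this thread.
Extra arguments given to aggregate are handed to the function unchanged
for every group, so only the first argument is split. Splitting the row
indices instead, and subsetting both columns inside the function, is one
option, not necessarily the only or best one.

```r
# Hypothetical stand-in for vsid: 3 readings per day, June and July 2007
dates <- rep(seq(as.Date("2007-06-01"), as.Date("2007-07-31"), by = "day"),
             each = 3)
vsid <- data.frame(
  date  = dates,
  month = as.integer(format(dates, "%m")),
  year  = as.integer(format(dates, "%Y")),
  lev   = 1  # constant, so each group mean per day should be 3
)

fun <- function(x, y) sum(x) / length(unique(y))

# y is passed through `...` as a whole: every group sees all of vsid$date,
# so the denominator is the number of distinct days in the entire data set.
whole <- aggregate(vsid$lev, list(month = vsid$month, year = vsid$year),
                   fun, y = vsid$date)

# Possible workaround: split the row indices, then subset both columns
# inside the function so that x and y cover the same rows.
idx <- seq_len(nrow(vsid))
per_group <- aggregate(idx, list(month = vsid$month, year = vsid$year),
                       function(i) fun(vsid$lev[i], vsid$date[i]))

print(whole)
print(per_group)
```

In this sketch the column x of per_group holds the sum of lev divided by
the number of distinct dates within each month, whereas whole divides by
the number of distinct dates over both months.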

2007/10/2, Petr PIKAL <petr.pikal at precheza.cz>:
> Hi
>
> lauri.nikkinen at gmail.com wrote on 02.10.2007 13:19:09:
>
> > Thanks Petr,
> >
> > Yes, your code seems to work. But when I try to reproduce it with my
> > original data set
> >
> > fun <- function(x, y) sum(x)/length(unique(y))
> > aggregate(vsid$lev, list(vsid$month, vsid$yeari), fun, vsid$lev=vsid$date)
>
> It should be
>
> aggregate(vsid$lev, list(vsid$month, vsid$yeari), fun, y=vsid$date)
>
> From the help page:
>
> ## S3 method for class 'data.frame':
> aggregate(x, by, FUN, ...)
>
> ...
> further arguments passed to or used by methods.
>
> Your function has two arguments: x, which receives vsid$lev, and y, to
> which you want to pass vsid$date. You can imagine that aggregate splits
> your "x" according to the levels given in "by" and applies the function
> "fun" to each split, together with any other arguments, in your case "y".
> So you need to supply that argument under the name your function expects;
> otherwise R does not know what to do with it.
>
> Regards
> Petr
>
> >
> > I get
> >
> > Error: syntax error, unexpected EQ_ASSIGN, expecting ',' in
> > "aggregate(vsid$lev, list(vsid$month, vsid$year), fun, vsid$lev="
> >
> > Can you interpret what is wrong? vsid$date is
> >
> > $ date       :Class 'Date'  num [1:637] 13695 13695 13695 13695 13695 ...
> >
> > Cheers,
> > Lauri
> >
> > 2007/10/2, Petr PIKAL <petr.pikal at precheza.cz>:
> > > Hi
> > >
> > > r-help-bounces at r-project.org wrote on 02.10.2007 10:44:20:
> > >
> > > > Hi R-users,
> > > >
> > > > Suppose I have the following data set.
> > > >
> > > > y1 <- rnorm(20) + 6.8
> > > > y2 <- rnorm(20) + (1:20*1.7 + 1)
> > > > y3 <- rnorm(20) + (1:20*6.7 + 3.7)
> > > > y <- c(y1,y2,y3)
> > > > var1 <- rep(1:5,12)
> > > > z <- rep(1:6,10)
> > > > f <- gl(3,20, labels=paste("lev", 1:3, sep=""))
> > > > d <- data.frame(var1=var1, z=z,y=y, f=f)
> > > >
> > > > Using the following code, I can calculate group means
> > > >
> > > > library(doBy)
> > > > summaryBy(y ~ f + var1, data=d, FUN=mean)
> > > >
> > > > How do I have to modify the FUN argument if I want to calculate the
> > > > mean using the number of unique values as the denominator,
> > > >
> > > > for instance
> > > >
> > > > fun <- function(x, y) sum(x)/length(unique(y))
> > > > summaryBy(y ~ f + var1, data=d, FUN=fun(y, z))
> > > >
> > > > Error in get(x, envir, mode, inherits) : variable "currFUN" of mode
> > > > "function" was not found
> > >
> > > I am not sure how to do it in doBy, but using aggregate,
> > >
> > > aggregate(d$y, list(d$var1,d$f), fun, y=z)
> > >
> > > will probably do what you want
> > >
> > > Regards
> > > Petr
> > >
> > > >
> > > > Best regards
> > > > LN