[R] need help on computing double summation

Liaw, Andy andy_liaw at merck.com
Thu Jun 16 14:22:53 CEST 2005


If I understood correctly, the following might be simpler (dat is the data
frame holding the data):

> sum(ave(dat$x, dat$id, FUN=function(v) v - mean(v)) * 
+     ave(dat$y, dat$id, FUN=function(v) v - mean(v)))
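A self-contained sketch of the ave() approach, run on a small hypothetical data frame `dat` (the poster's actual data is not included in the thread). One caveat: ave() treats every argument other than x and FUN as a grouping variable, so an option such as scale=FALSE is not passed on to FUN. The centering is therefore written into FUN directly; the helper name `center` is illustrative only.

```r
# Hypothetical toy data: three subjects; subject 3 has a single observation.
dat <- data.frame(id = c(1, 1, 1, 2, 2, 3),
                  x  = c(1, 2, 3, 4, 6, 5),
                  y  = c(2, 4, 7, 1, 3, 9))

# Illustrative helper: subtract the group mean from each value.
center <- function(v) v - mean(v)

xc <- ave(dat$x, dat$id, FUN = center)   # x_ij - mean(x_i)
yc <- ave(dat$y, dat$id, FUN = center)   # y_ij - mean(y_i)
total <- sum(xc * yc)                    # the double summation
total
```

For this toy data the per-subject inner sums are 5, 2 and 0, so the total is 7. The single-observation subject contributes 0 automatically, because its centered values are all zero.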

Andy


> From: Huntsinger, Reid
> 
> You could do something like
> 
> ids <- unique(mydata$id)
> ans <- vector(length=length(ids), mode="list")
> for (i in seq_along(ids)) {
>   g <- which(mydata$id == ids[i])   # rows belonging to the i-th subject
>   ans[[i]] <- (length(g) - 1)*cov(mydata$x[g], mydata$y[g])
> }
> ans
> 
> but cov() returns NA for length-1 vectors, so you'd want an
> if (length(g) == 1) ans[i] <- 0 else ans[i] <- ... construction.
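A minimal sketch of the loop with the single-observation guard folded in, on hypothetical toy data. It stores one inner sum per subject in a numeric vector indexed by position (seq_along), so it makes no assumption about how the ids themselves are coded.

```r
# Hypothetical toy data with the same columns as in the question.
mydata <- data.frame(id = c(1, 1, 1, 2, 2, 3),
                     x  = c(1, 2, 3, 4, 6, 5),
                     y  = c(2, 4, 7, 1, 3, 9))

ids <- unique(mydata$id)
ans <- numeric(length(ids))              # one inner sum per subject
for (k in seq_along(ids)) {
  g <- which(mydata$id == ids[k])        # rows of the k-th subject
  if (length(g) == 1) {
    ans[k] <- 0                          # cov() would give NA here
  } else {
    # (n_i - 1) * cov() equals sum_j (x_ij - mean(x_i)) * (y_ij - mean(y_i))
    ans[k] <- (length(g) - 1) * cov(mydata$x[g], mydata$y[g])
  }
}
sum(ans)
```

Here ans holds 5, 2 and 0, and sum(ans) is 7.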
> 
> This is almost brute force; you could also use tapply, as follows:
> 
> sx <- tapply(mydata$x,INDEX=mydata$id,FUN=sum)
> sy <- tapply(mydata$y,INDEX=mydata$id,FUN=sum)
> sxy <- tapply(mydata$x*mydata$y, INDEX=mydata$id, FUN=sum)
> n <- tapply(mydata$id,INDEX=mydata$id,FUN=length) # or use table()!
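With these per-subject sums (s_x = sum_j x_j, s_y = sum_j y_j, s_{xy} = sum_j x_j y_j, dropping the subject index), the subject means are s_x/n and s_y/n, and expanding the product term by term shows where the simplification comes from:

```latex
\sum_{j=1}^{n}\Bigl(x_j - \frac{s_x}{n}\Bigr)\Bigl(y_j - \frac{s_y}{n}\Bigr)
  = s_{xy} - \frac{s_y}{n}\,s_x - \frac{s_x}{n}\,s_y + n\,\frac{s_x}{n}\,\frac{s_y}{n}
  = s_{xy} - 2\,\frac{s_x s_y}{n} + \frac{s_x s_y}{n}
  = s_{xy} - \frac{s_x s_y}{n}
```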
> 
> and now your inner sum is
> 
> sxy - 2*sx*(sy/n) + n*(sx/n)*(sy/n) = sxy - sx*sy/n
> 
> so 
> 
> sum(sxy - sx*sy/n) should do.
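A runnable sketch of the tapply() route on hypothetical toy data. No special case is needed for subjects with a single observation, since sxy - sx*sy/1 is then exactly 0. One design caveat: this one-pass formula subtracts two potentially large numbers, so it can lose precision when x and y are large relative to their within-subject spread; the approaches that center first avoid that.

```r
# Hypothetical toy data with the same columns as in the question.
mydata <- data.frame(id = c(1, 1, 1, 2, 2, 3),
                     x  = c(1, 2, 3, 4, 6, 5),
                     y  = c(2, 4, 7, 1, 3, 9))

sx  <- tapply(mydata$x, INDEX = mydata$id, FUN = sum)             # sum of x per id
sy  <- tapply(mydata$y, INDEX = mydata$id, FUN = sum)             # sum of y per id
sxy <- tapply(mydata$x * mydata$y, INDEX = mydata$id, FUN = sum)  # sum of x*y per id
n   <- tapply(mydata$id, INDEX = mydata$id, FUN = length)         # observations per id

inner <- sxy - sx * sy / n   # per-subject inner sums
total <- sum(inner)          # outer sum over subjects
total
```

For this data the inner sums are 5, 2 and 0, giving a total of 7.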
> 
> One more approach is to make your dataset into a list of data frames,
> one for each id, then use lapply(). The list can be created by split().
> In one line,
> 
> lapply(split(mydata, f=mydata$id),
>        function(z) (length(z$x) - 1)*cov(z$x, z$y))
> 
> and take sum(unlist(...), na.rm=TRUE) (sum() will not accept the list
> directly) to remove the NAs due to ids with a single observation, which
> you want to be zeros.
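A sketch of the split() approach on hypothetical toy data, using sapply() instead of lapply() so the per-subject results come back as a named numeric vector that sum() accepts directly.

```r
# Hypothetical toy data with the same columns as in the question.
mydata <- data.frame(id = c(1, 1, 1, 2, 2, 3),
                     x  = c(1, 2, 3, 4, 6, 5),
                     y  = c(2, 4, 7, 1, 3, 9))

# One data frame per id; each yields (n_i - 1) * cov(x, y) for that subject.
per_id <- sapply(split(mydata, f = mydata$id),
                 function(z) (length(z$x) - 1) * cov(z$x, z$y))
per_id                      # NA for the id with a single observation
sum(per_id, na.rm = TRUE)   # drop that NA, i.e. treat it as 0
```

Here per_id is 5, 2 and NA for ids 1, 2 and 3, and the na.rm sum is 7.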
> 
> Reid Huntsinger
> 
> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Kerry Bush
> Sent: Wednesday, June 15, 2005 11:41 AM
> To: r-help at stat.math.ethz.ch
> Subject: [R] need help on computing double summation
> 
> 
> Dear helpers in this forum,
> 
> This is a clarified version of my previous questions in this
> forum. I really need your generous help on this issue.
> 
> > Suppose I have the following data set:
> > 
> > 
> > ......
> > 
> 
> Now I want to compute the following double summation:
> 
> sum_{i=1}^k sum_{j=1}^{n_i} (x_{ij}-mean(x_i))*(y_{ij}-mean(y_i))
> 
> Here i runs from 1 to k and indexes the ith subject id; j runs from 1
> to n_i and indexes the jth observation for the ith subject.
> 
> In the above expression, mean(x_i) is the mean of the x values for the
> ith subject, and mean(y_i) is the mean of the y values for the ith
> subject.
> 
> Is there a simple way to do this in R?
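For checking any of the shortcuts, a literal transcription of the double sum might look like the following. It is a slow reference version, not meant for large data; `double_sum` is a hypothetical helper name and the toy data is made up.

```r
# Hypothetical toy data with the same columns as in the question.
mydata <- data.frame(id = c(1, 1, 1, 2, 2, 3),
                     x  = c(1, 2, 3, 4, 6, 5),
                     y  = c(2, 4, 7, 1, 3, 9))

double_sum <- function(id, x, y) {
  total <- 0
  for (i in unique(id)) {               # outer sum over subjects
    xi <- x[id == i]                    # x values of subject i
    yi <- y[id == i]                    # y values of subject i
    for (j in seq_along(xi)) {          # inner sum over observations
      total <- total + (xi[j] - mean(xi)) * (yi[j] - mean(yi))
    }
  }
  total
}

double_sum(mydata$id, mydata$x, mydata$y)
```

On this data it returns 7, matching the per-subject sums 5, 2 and 0.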
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
> 
> 



