[R] need help on computing double summation

Huntsinger, Reid reid_huntsinger at merck.com
Wed Jun 15 22:27:29 CEST 2005


You could do something like

ids <- unique(mydata$id)
ans <- vector(length=length(ids), mode="list")
for (i in ids) {
  g <- which(mydata$id == i)
  ans[[i]] <- (length(g) - 1)*cov(mydata$x[g], mydata$y[g])
}
ans

but cov() returns NA for length 1 vectors, so you'd want an if (length(g) ==
1) ans[i] <- 0 else ans[i] <- ... construction.

This is almost brute force; you could also use tapply, as follows:

sx <- tapply(mydata$x,INDEX=mydata$id,FUN=sum)
sy <- tapply(mydata$y,INDEX=mydata$id,FUN=sum)
sxy <- tapply(mydata$x*mydata$y, INDEX=mydata$id, FUN=sum)
n <- tapply(mydata$id,INDEX=mydata$id,FUN=length) # or use table()!

and now your inner sum is

sxy - 2*sx*(sy/n) + n*(sx/n)*(sy/n) = sxy - sx*sy/n

so 

sum(sxy - sx*sy/n) should do.

One more approach is to make your dataset into a list of data frames, one
for each id, then use lapply(). The list can be created by split(). In one
line,

lapply(split(mydata,f=mydata$id),function(z) (length(z$x) - 1)*cov(z$x,z$y))

and take sum(,na.rm=TRUE) to remove the NAs due to single ids that you want
to be zeros.

Reid Huntsinger




Reid Huntsinger

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Kerry Bush
Sent: Wednesday, June 15, 2005 11:41 AM
To: r-help at stat.math.ethz.ch
Subject: [R] need help on computing double summation


Dear helpers in this forum,

   This is a clarified version of my previous
questions  in this forum. I really need your generous
help on this issue.

> Suppose I have the following data set:
> 
> 
> ......
> 

Now I want to compute the following double summation:

sum_{i=1}^k
sum_{j=1}^{n_i}(x_{ij}-mean(x_i))*(y_{ij}-mean(y_i))

i is from 1 to k,
indexing the ith subject id; and j is from 1 to n_i,
indexing the jth observation for the ith subject.

in the above expression, mean(x_i) is the mean of x
values for the ith
subject, mean(y_i) is the mean of y values for the ith
subject. 

Is there a simple way to do this in R?

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html




More information about the R-help mailing list