# [R] Fast ave for sorted data?

Charles C. Berry cberry at tajo.ucsd.edu
Sun Feb 15 20:08:37 CET 2009

```
On Sun, 15 Feb 2009, Zhou Fang wrote:

> Hi,
>
> This is probably really obvious, but I can't seem to find anything on it.
>
> Is there a fast version of ave for when the data is already sorted in terms
> of the factor, or if the breaks are already known?
>

If all you want are means, you can use rle() and colMeans() to good
effect:

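foo2 <-
function (x,y)
{
    ## x is assumed sorted, so runs of equal x define the groups
    reps <- rle(x)$lengths
    lens <- rep(reps,reps)
    uniqLens <- unique(lens)
    ## handle all groups of the same size i in one colMeans() call
    for (i in uniqLens[ uniqLens != 1]){
        y[ lens == i] <-
            rep( colMeans(matrix(y[ lens == i], nrow=i)), each=i)
    }
    y
}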

> x <- sort( round( runif(100000, 0 , 1 ), 5) )
> y <- sample(1000000,100000)
> all.equal(ave(y,x),foo2(x,y))
[1] TRUE
> system.time(foo2(x,y))
user  system elapsed
0.087   0.029   0.117
> system.time(ave(y,x))
user  system elapsed
1.933   0.030   1.980
>

If, as in your example, a substantial fraction of the X's are unique, and
if you want to generalize to more than means, then you can still gain a
lot by treating the unique and non-unique values separately like this:

foo <-
function (x,y)
{
    ## x is assumed sorted; only call ave() on the repeated x values
    reps <- rle(x)$lengths
    len.not.1 <- rep(reps,reps) != 1
    y[ len.not.1] <- ave( y[ len.not.1], x[ len.not.1 ])
    y
}

> y <- sample(1000000,100000)
> x <- sort( round( runif(100000, 0 , 2 ), 5) )
> system.time(foo(x,y))
user  system elapsed
0.577   0.027   0.628
> system.time(ave(y,x))
user  system elapsed
2.513   0.038   2.545
> table(table(x))

    1     2     3     4     5     6
60526 15161  2578   318    28     1

And if neither of these is quite good enough, a line or two of C code
should do the trick. See package 'inline'.

HTH,

Chuck

> Basically, I have:
> X = 0.1, 0.2, 0.32, 0.32, 0.4, 0.56, 0.56, 0.7...
> Y = 223, 434, 343, 544, 231.... etc
> of the same, admittedly large length.
>
> Now note that some of the values of X are repeated. What I want to do is, for
> those X that are repeated, take the corresponding values of Y and change them
> to the average for that particular X.
>
> So, ave(Y,X) will work. But it's very slow, and certainly not suited to my
> problem, where Y changes and X stays the same and I need to repeatedly
> recalculate the averaging of Y. Ave also does not take advantage of the
> sorting of the data.
>
> So, is there an alternative? (Presumably avoiding loops.)
>
> Thanks,
>
> Zhou Fang

Charles C. Berry                            (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

```
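The question mentions that X stays fixed while Y changes repeatedly. One possible refinement of the rle() idea, under that assumption, is to compute the run structure once and reuse it for every new Y. The following is only a minimal sketch and not part of the original reply: `make_sorted_ave` is a hypothetical helper name, x is assumed to be already sorted, and only group means are supported. It uses base R's rowsum() for the per-group sums.

```r
## Hypothetical helper (not from the original post): precompute the
## grouping for a sorted x, and return a function that averages any y
## of the same length within runs of equal x.
make_sorted_ave <- function(x) {
    reps <- rle(x)$lengths               # length of each run of equal x
    grp  <- rep(seq_along(reps), reps)   # integer group id per element
    function(y) {
        ## per-group sums, in order of first appearance (= run order)
        sums <- rowsum(as.numeric(y), grp, reorder = FALSE)[, 1]
        ## expand each group mean back to the original positions
        unname(rep(sums / reps, reps))
    }
}

## Small example in the spirit of the question (y values are made up)
x <- c(0.1, 0.2, 0.32, 0.32, 0.4, 0.56, 0.56, 0.7)
y <- c(223, 434, 343, 544, 231, 100, 300, 50)
ave_x <- make_sorted_ave(x)   # done once
res <- ave_x(y)               # repeated for each new y
```

The closure keeps `reps` and `grp` around, so each later call only pays for one rowsum() and one rep(). Whether this beats foo2() on a given data set would need to be timed.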