[R] Subtraction of group means using AGGREGATE and MERGE

Joris Meys jorismeys at gmail.com
Thu Jun 17 11:15:18 CEST 2010


Funny, I couldn't run your code in R 2.10.1 (aggregate() requires a
list for its "by" argument). That said, take a look at the function ave():

> X <- rep(1:4)

> Y <- rep(letters[1:2],each=2)

> Z <- data.frame(X,Y)

> system.time(replicate(1000,{
+   A <- aggregate(Z$X, by=list(Y=Z$Y), FUN=mean)
+   M <- merge(Z,A,by="Y")[,3]
+   Result <- X - M
+ }))
   user  system elapsed
   3.57    0.01    3.58

> system.time(replicate(1000,{
+   Result <- Z$X - ave(Z$X,Z$Y)
+   }))
   user  system elapsed
   0.25    0.00    0.25
>
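
For reference, here is a minimal sketch of what ave() does in the line
above. The shuffled toy data and the names centered, group_means and
centered_check are made up for illustration. With no FUN supplied, ave()
uses mean and returns one group mean per row, in the original row order.
Note that merge() sorts on the key by default (sort = TRUE), so the
merge-based version lines up with X only when Z is already ordered by Y.

```r
# Toy data as in the thread, but shuffled so that Y is not sorted
# (illustrative assumption, not the original poster's data)
Z <- data.frame(X = c(1, 3, 2, 4), Y = c("a", "b", "a", "b"))

# ave() returns the group mean of X for every row, keeping row order
centered <- Z$X - ave(Z$X, Z$Y)

# The same computation spelled out: one mean per group, then a lookup
# of each row's group mean by its label
group_means <- tapply(Z$X, Z$Y, mean)
centered_check <- Z$X - as.vector(group_means[as.character(Z$Y)])

print(centered)
stopifnot(isTRUE(all.equal(centered, centered_check)))
```

Another summary function can be passed by name, for example
ave(Z$X, Z$Y, FUN = median); FUN has to be named because unnamed
arguments are treated as grouping variables.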

Cheers
Joris


On Thu, Jun 17, 2010 at 9:22 AM, Ben Cocker <b.cocker at ucl.ac.uk> wrote:
> Hi all,
>
> This is my first ever post, so forgive me and let me know if my
> etiquette is less than that required.
>
> I am searching for a faster way of subtracting group means within a
> data frame than the solution I've found so far, using AGGREGATE and
> MERGE.
>
> I'll flesh my question out using a trivial example: I have a data
> frame Z with two columns - one X of values and one Y of labels:
>
>> Z
>    X    Y
> 1    1    4
> 2    2    4
> 3    3    5
> 4    4    5
>
> I want to take the group means (for the two groups Y=4 and Y=5) and
> subtract them from X resulting in the vector Result = t(-0.5  0.5 -0.5
>  0.5). I have found a (slow) way of achieving this, using the
> AGGREGATE function to get the group means and then MERGE to construct
> an appropriate vector of these values, M:
>
>> A <- aggregate(Z$X, by=Z$Y, FUN=mean)
>> A
>   Y     X
> 1   4   1.5
> 2   5   3.5
>
>> M <- merge(Z,A,by="Y")[,3]
>> M
> [1] 1.5   1.5   3.5   3.5
>
>> Result <- X - M
>> Result
>    X
> 1 -0.5
> 2  0.5
> 3 -0.5
> 4  0.5
>
> My problem: for lots of records, while AGGREGATE is very fast, MERGE
> is very slow - in real life I need to call this routine many times
> over a very large dataset. Could anyone help me find a faster way of
> achieving the same goal?
>
> Many thanks,
>
> Ben Cocker
> MSc Statistics at UCL, London, UK



-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
Joris.Meys at Ugent.be


