[R] averaging between rows with repeated data
R. Michael Weylandt
michael.weylandt at gmail.com
Tue Nov 15 13:28:48 CET 2011
Oh sorry -- my mistake with ave() -- I only checked the first row....
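For anyone following along, here is a rough sketch of one way the ave() idea could be made to work. ave() replaces each value of a single numeric vector by the mean of its group, so the sketch applies it to each numb column in turn and then drops the repeated ids. The data frame is retyped from the values printed in the original post, so it is only a stand-in for the real example object, and the names num_cols, averaged and result are purely illustrative.

```r
# Stand-in for the poster's `example`, retyped from the printed table
example <- data.frame(
  Letters = letters[1:10],
  numb1 = c(0.8139130, 0.8268700, 0.2384088, 0.8215922, 0.7865918,
            2.2385522, 0.7403965, 0.2526634, 1.9333444, 1.6996701),
  numb2 = c(-0.9775570, 0.4980661, 1.0249684, 0.5686534, 0.5411476,
            1.2668070, -0.6224205, 1.0282978, 1.6667486, 0.5964623),
  numb3 = c(-0.002996244, 1.647717998, 0.120663273, 1.591208307, 0.838300185,
            1.268005020, 1.374641549, -0.110449844, 2.937252363, 1.967870617),
  id = c("CG234", "CG232", "CG441", "CG128", "CG125",
         "CG182", "CG232", "CG441", "CG232", "CG125"),
  stringsAsFactors = FALSE
)

num_cols <- c("numb1", "numb2", "numb3")   # illustrative name
averaged <- example[, c(num_cols, "id")]

# ave() on ONE numeric vector: each value becomes the mean of its id group
averaged[num_cols] <- lapply(example[num_cols],
                             function(col) ave(col, example$id))

# Rows sharing an id are now identical, so keep the first of each
result <- averaged[!duplicated(averaged$id), ]
rownames(result) <- NULL
result
```

Unlike aggregate(), this keeps the ids in their order of first appearance rather than sorting them.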
drop = F (short for drop = FALSE) is an optional argument to the function "[" which tells it to
return the same kind of object it began with (matrix or data frame), rather than simplifying to a vector.
X = matrix(1:9, 3)
is.matrix(X[,3])            # FALSE: just a regular vector
is.matrix(X[,3,drop = F])   # TRUE: still a (3 x 1) matrix
aggregate() wants a list as its second argument (by), and data frames are
secretly also lists, so keeping it as a data frame gives the desired result.
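A small sketch of that point, using the values printed in the original post (retyped here, so the object is only a stand-in for the real example data frame). It compares the two ways of pulling out the id column and then runs aggregate() both with the one-column data frame and with an explicitly built list. The names by_vec, by_df, agg and agg2 are just for illustration.

```r
# Stand-in for the poster's `example`, retyped from the printed table
example <- data.frame(
  Letters = letters[1:10],
  numb1 = c(0.8139130, 0.8268700, 0.2384088, 0.8215922, 0.7865918,
            2.2385522, 0.7403965, 0.2526634, 1.9333444, 1.6996701),
  numb2 = c(-0.9775570, 0.4980661, 1.0249684, 0.5686534, 0.5411476,
            1.2668070, -0.6224205, 1.0282978, 1.6667486, 0.5964623),
  numb3 = c(-0.002996244, 1.647717998, 0.120663273, 1.591208307, 0.838300185,
            1.268005020, 1.374641549, -0.110449844, 2.937252363, 1.967870617),
  id = c("CG234", "CG232", "CG441", "CG128", "CG125",
         "CG182", "CG232", "CG441", "CG232", "CG125"),
  stringsAsFactors = FALSE
)

by_vec <- example[, 5]                 # simplified to a plain vector
by_df  <- example[, 5, drop = FALSE]   # still a one-column data frame

is.list(by_vec)   # FALSE, so aggregate() would reject it as 'by'
is.list(by_df)    # TRUE, and the column name "id" is kept

# Group means of the numb columns, one row per id
agg <- aggregate(example[, 2:4], by = by_df, FUN = mean)

# Same grouping, with the list built by hand instead of via drop = FALSE
agg2 <- aggregate(example[, 2:4], by = list(id = example$id), FUN = mean)

# The CG125 row should agree with the 1.2431 / 0.5688 / 1.403 in the post
agg[agg$id == "CG125", ]
```

Passing list(id = ...) is a little more typing but makes the name of the grouping column explicit.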
On Tue, Nov 15, 2011 at 7:07 AM, Rob Griffin <robgriffin247 at hotmail.com> wrote:
> Thanks Michael,
> That second (aggregate) option worked perfectly - the first (cbind)
> generated averages for each row between the columns. (rather than between
> rows for each column).
> I came so close with aggregate yesterday - it is only slightly different to
> one of my attempts (of admittedly very many attempts) to solve it, so it feels
> good that I was going along the right lines at some point!
> Could you possibly explain what this drop=F term is doing?
> (A very grateful and relieved phd student).
> (also if anyone fancies helping me with another problem I posted yesterday:
> -----Original Message----- From: R. Michael Weylandt
> Sent: Tuesday, November 15, 2011 12:46 PM
> To: robgriffin247
> Cc: r-help at r-project.org
> Subject: Re: [R] averaging between rows with repeated data
> Good morning Rob,
> First off, thank you for providing a reproducible example. This is one
> of those little tasks that R is pretty great at, but there exist
> >> \infty ways to do so and it can be a little overwhelming for the
> beginner: here's one with the base function ave():
> cbind(ave(example[,2:4], example[,5]), id = example[,5])
> This splits example according to the fifth column (id) and averages
> the other values: we then stick another copy of the id back on the end
> and are good to go.
> The base function aggregate can do something similar:
> aggregate(example[,2:4], by = example[,5, drop = F], mean)
> Note that you need the little-publicized but super useful drop = F
> argument to make this one work.
> There are other ways to do this with the plyr or doBy packages as
> well, but this should get you started.
> Hope it helps,
> On Tue, Nov 15, 2011 at 5:52 AM, robgriffin247
> <robgriffin247 at hotmail.com> wrote:
>> *The situation (or an example at least!)*
>> *this produces something like this:*
>>    Letters     numb1      numb2        numb3    id
>>  1       a 0.8139130 -0.9775570 -0.002996244 CG234
>>  2       b 0.8268700  0.4980661  1.647717998 CG232
>>  3       c 0.2384088  1.0249684  0.120663273 CG441
>>  4       d 0.8215922  0.5686534  1.591208307 CG128
>>  5       e 0.7865918  0.5411476  0.838300185 CG125
>>  6       f 2.2385522  1.2668070  1.268005020 CG182
>>  7       g 0.7403965 -0.6224205  1.374641549 CG232
>>  8       h 0.2526634  1.0282978 -0.110449844 CG441
>>  9       i 1.9333444  1.6667486  2.937252363 CG232
>> 10       j 1.6996701  0.5964623  1.967870617 CG125
>> *The Problem:*
>> Some of these ids are repeated. I want to average the values for those
>> rows within each column, but obviously they have different numbers in the
>> numb columns, and they also have different letters in the Letters column.
>> The letters are not necessary for my analysis; only the duplicated ids and
>> numb columns are important.
>> I also need to keep the existing dataframe, so I would like to build a new
>> dataframe that averages the repeated values and keeps their id. My actual
>> dataset is much more complex (271*13890), but the solution to this can be
>> expanded out to my main data set because there are just more columns of
>> numbers and still only one alphanumeric id to keep. In my example data, id
>> CG232 occurs 3 times, CG441 & CG125 occur twice, and everything else once,
>> so in the new dataframe (from this example) there would be 3 number columns
>> (numb1, numb2, numb3) and an id. The numb column values would be the
>> averages of rows which had the same id.
>> so for example the new dataframe would contain an entry for CG125 which
>> would be something like this:
>> numb1 numb2 numb3 id
>> 1.2431 0.5688 1.403 CG125
>> Just as a thought, all of the IDs start with CG, so could I then use grep
>> to delete CG and replace it with 0? That way duplicated ids could be
>> averaged as a number (they would be the same), but I still don’t know how
>> to produce the new dataframe with the averaged rows in it...
>> I hope this is clear enough! email me if you need further detail or even
>> better, if you have a solution!!
>> also sorry to be posting my second question in under 24hours but I seem to
>> have become more than a little stuck – I was making such good progress
>> (also I'm sorry if this appears more than once on the mailing list - I'm
>> having some network & windows live issues so I'm not convinced previous
>> attempts to send this have worked, but have no way of telling if they are
>> just milling around in the internet somewhere as we speak and will decide
>> to come out of hiding later!)
>> Sent from the R help mailing list archive at Nabble.com.