[R] identify duplicate entries in data frame and calculate mean

Matthew mccormack at molbio.mgh.harvard.edu
Tue May 24 22:17:01 CEST 2016


Thank you very much, Tom.
This gets me thinking in the right direction.
One thing I should have mentioned: the data frame will have a little
over 40,000 rows.
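
With that many rows, a base R route may also be worth trying. The following is a minimal sketch, assuming the two columns are named Length and Identifier as in the small example quoted below (the real column names will differ). It uses only the base functions aggregate() and duplicated(), and it is untimed on data of the full size:

```r
# Toy data frame standing in for the real one; the column names
# Length and Identifier are assumptions taken from the example.
x <- data.frame(Length     = c(321, 350, 340, 180, 198),
                Identifier = c("A234", "A234", "A234", "B123", "B225"),
                stringsAsFactors = FALSE)

# Mean Length per Identifier; each multiplicate collapses to one row.
res <- aggregate(Length ~ Identifier, data = x, FUN = mean)
res <- res[, c("Length", "Identifier")]   # same column order as the example
print(res)

# Identifiers that occur more than once (the multiplicates).
multi <- unique(x$Identifier[duplicated(x$Identifier)])
print(multi)
```

If the columns are easier to address by position, something like aggregate(x[[7]], by = list(Identifier = x[[10]]), FUN = mean) should give the same grouping, although the result columns are then named Identifier and x.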

On 5/24/2016 4:08 PM, Tom Wright wrote:
> Using dplyr
>
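> library(dplyr)
> # Example data: Length values with a repeated ID
> x <- data.frame(Length = c(321, 350, 340, 180, 198),
>                 ID = c(rep('A234', 3), 'B123', 'B225'))
> # Mean Length per ID; duplicated IDs collapse to one row
> x %>% group_by(ID) %>% summarise(m = mean(Length))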
>
>
>
> On Tue, May 24, 2016 at 3:46 PM, Matthew
> <mccormack at molbio.mgh.harvard.edu> wrote:
>
>     I have a data frame with 10 columns.
>     The last column holds an alphanumeric identifier.
>     For most rows this identifier is unique within the file, but some
>     identifiers occur in duplicate, triplicate or more (let's call
>     them multiplicates). When an identifier occurs more than once,
>     its rows are consecutive.
>
>     Column 7 holds an integer (it may or may not be unique; that
>     does not matter).
>
>     I want to identify each multiplicate occurring in column 10 and
>     then, for each multiplicate, calculate the mean of the integers
>     in column 7.
>
>     As an example, I will show just two columns:
>     Length  Identifier
>     321     A234
>     350     A234
>     340     A234
>     180     B123
>     198     B225
>
>     What I want to do (in the above example) is collapse all the
>     A234's and report the mean to get this:
>     Length  Identifier
>     337     A234
>     180     B123
>     198     B225
>
>
>     Matthew
>

