[R] identify duplicate entries in data frame and calculate mean

Tom Wright tom at maladmin.com
Tue May 24 22:08:29 CEST 2016


Using dplyr

library(dplyr)

# Example data from the question
x <- data.frame(Length = c(321, 350, 340, 180, 198),
                ID = c(rep('A234', 3), 'B123', 'B225'))

# One row per ID, holding the mean Length for that ID
x %>% group_by(ID) %>% summarise(m = mean(Length))
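
If you would rather not load a package, something along these lines should also work in base R. This is only a sketch on the same example data: the names x, Length and ID are just the illustrative ones used here, so for the real 10-column data frame you would substitute the actual names of columns 7 and 10. The names res, counts and multi are made up for the example.

```r
# Same example data as in the question; column names are illustrative
x <- data.frame(Length = c(321, 350, 340, 180, 198),
                ID = c(rep("A234", 3), "B123", "B225"))

# One row per ID, holding the mean of Length within that ID
res <- aggregate(Length ~ ID, data = x, FUN = mean)
print(res)

# Identifiers that occur more than once (the "multiplicates")
counts <- table(x$ID)
multi <- names(counts[counts > 1])
print(multi)
```

Neither approach relies on the duplicated identifiers being in consecutive rows; grouping is by value, so row order does not matter.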



On Tue, May 24, 2016 at 3:46 PM, Matthew <mccormack at molbio.mgh.harvard.edu>
wrote:

> I have a data frame with 10 columns.
> The last column holds an alphanumeric identifier.
> For most rows this identifier is unique within the file,
> but some identifiers occur in duplicate, triplicate or more
> (let's call them multiplicates). When an identifier occurs more
> than once, its rows are always consecutive.
>
> Column 7 holds an integer (it may or may not be unique; that does not
> matter).
>
> I want to identify each multiplicate occurring in column 10 and then,
> for each multiplicate, calculate the mean of the integers in column 7.
>
> As an example, I will show just two columns:
> Length  Identifier
> 321     A234
> 350     A234
> 340     A234
> 180     B123
> 198     B225
>
> What I want to do (in the above example) is collapse all the A234's and
> report the mean to get this:
> Length  Identifier
> 337     A234
> 180     B123
> 198     B225
>
>
> Matthew
>
