[R] identify duplicate entries in data frame and calculate mean

Nordlund, Dan (DSHS/RDA) NordlDJ at dshs.wa.gov
Tue May 24 22:15:27 CEST 2016


You have several options.

1.  You could use the aggregate function.  If your data frame is called DF, you could do something like

with(DF, aggregate(Length, list(Identifier), mean))

2.  You could use the dplyr package like this

library(dplyr)
summarize(group_by(DF, Identifier), mean(Length))
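
As a rough sketch only, here is the first option applied to the example data from the original question. The data frame below is retyped from that example, and I have assumed the columns are really named Length and Identifier (in the real file they are columns 7 and 10). This variant uses the formula interface of aggregate, which keeps the original column names in the result instead of Group.1 and x.

```r
# Example data retyped from the original question (assumed column names)
DF <- data.frame(
  Length     = c(321, 350, 340, 180, 198),
  Identifier = c("A234", "A234", "A234", "B123", "B225"),
  stringsAsFactors = FALSE
)

# One row per Identifier, with the mean of Length within each group.
# Identifiers that occur only once keep their single value.
res <- aggregate(Length ~ Identifier, data = DF, FUN = mean)
print(res)
```

For A234 this should give (321 + 350 + 340) / 3 = 337, matching the desired output in the question. With dplyr, the summary column can be named in the same way by writing summarize(group_by(DF, Identifier), Length = mean(Length)).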


Hope this is helpful,

Dan

Daniel Nordlund, PhD
Research and Data Analysis Division
Services & Enterprise Support Administration
Washington State Department of Social and Health Services


> -----Original Message-----
> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Matthew
> Sent: Tuesday, May 24, 2016 12:47 PM
> To: r-help at r-project.org
> Subject: [R] identify duplicate entries in data frame and calculate mean
> 
> I have a data frame with 10 columns.
> The last column holds an alphanumeric identifier.
> For most rows, this alphanumeric identifier is unique to the file; however,
> some of these alphanumeric identifiers occur in duplicate, triplicate or more.
> When an identifier occurs more than once (let's call these multiplicates),
> the rows that share it are consecutive.
> 
> Column 7 holds an integer (which may or may not be unique; it does not
> matter).
> 
> I want to identify each set of multiple entries (multiplicates) occurring in column 10
> and then, for each multiplicate, calculate the mean of the integers in column 7.
> 
> As an example, I will show just two columns:
> Length  Identifier
> 321     A234
> 350     A234
> 340     A234
> 180     B123
> 198     B225
> 
> What I want to do (in the above example) is collapse all the A234's and report
> the mean to get this:
> Length  Identifier
> 337     A234
> 180     B123
> 198     B225
> 
> 
> Matthew
> 


