[R] average duplicated rows?

PIKAL Petr petr.pikal at precheza.cz
Mon Oct 15 08:03:06 CEST 2012


Hi

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
> project.org] On Behalf Of Pieter Schoonees
> Sent: Friday, October 12, 2012 6:19 PM
> To: Vining, Kelly; r-help at r-project.org
> Subject: Re: [R] average duplicated rows?
> 
> You will have to split() the data and unsplit() it after making the
> alterations. Have a look at the plyr package for such functions.
> 
> > -----Original Message-----
> > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
> > project.org] On Behalf Of Vining, Kelly
> > Sent: Friday 12 October 2012 5:42
> > To: r-help at r-project.org
> > Subject: [R] average duplicated rows?
> >
> > Dear useRs,
> >
> > I have a slightly complicated data structure and am stuck trying to
> > extract what I need. I'm pasting an example of this data below. In
> > some cases, there are duplicates in the "gene_id" column because
> there
> > are two different "sample 1" values for a given "sample 2" value.
> > Where these duplicates exist, I need to average the corresponding
> "FL_EARLY"
> > values and retain the "FL_LATE" value and replace those two rows with
> > a row containing the "FL_EARLY" average so that I no longer have any
> > "gene_id" duplicates.
> >
> > Seems like this is a job for some version of the apply function, but
> > searching and puzzling over this has not gotten me anywhere. Any help
> > will be much appreciated!

Aggregate is designed for that

ex.ag<-aggregate(ex[,c("FL_EARLY", "FL_LATE")], list(ex$gene_id), mean)

you will lose sample1 and 2 but you did not mention you want to retain them and how.

Regards
Petr


> >
> > Example data:
> >
> >
> >              gene_id sample_1 sample_2   FL_EARLY  FL_LATE
> > 763938  Eucgr.A00054   fl_S1E   fl_S1L  13.170800  22.2605
> > 763979  Eucgr.A00101   fl_S1E   fl_S1L   0.367960  14.1202
> > 1273243 Eucgr.A00101    fl_S2   fl_S1L   0.356625  14.1202
> > 764169  Eucgr.A00350   fl_S1E   fl_S1L   7.381070  43.9275
> > 1273433 Eucgr.A00350    fl_S2   fl_S1L  10.674500  43.9275
> > 1273669 Eucgr.A00650    fl_S2   fl_S1L  33.669100  50.0169
> > 764480  Eucgr.A00744   fl_S1E   fl_S1L 132.429000 747.2770
> > 1273744 Eucgr.A00744    fl_S2   fl_S1L 142.659000 747.2770
> > 764595  Eucgr.A00890   fl_S1E   fl_S1L   2.937760  14.9647
> > 764683  Eucgr.A00990   fl_S1E   fl_S1L   8.681250  48.5492
> > 1273947 Eucgr.A00990    fl_S2   fl_S1L  10.553300  48.5492
> > 764710  Eucgr.A01020   fl_S1E   fl_S1L   0.000000  57.9273
> > 1273974 Eucgr.A01020    fl_S2   fl_S1L   0.000000  57.9273
> > 764756  Eucgr.A01073   fl_S1E   fl_S1L   8.504710 101.1870
> > 1274020 Eucgr.A01073    fl_S2   fl_S1L   5.400010 101.1870
> > 764773  Eucgr.A01091   fl_S1E   fl_S1L   3.448910  15.7756
> > 764826  Eucgr.A01152   fl_S1E   fl_S1L  69.565700 198.2320
> > 764831  Eucgr.A01158   fl_S1E   fl_S1L   7.265640  30.9565
> > 764845  Eucgr.A01172   fl_S1E   fl_S1L   3.248020  16.9127
> > 764927  Eucgr.A01269   fl_S1E   fl_S1L  18.710200  76.6918
> >
> >
> >
> > --Kelly V.
> >
> >
> > 	[[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-
> > guide.html and provide commented, minimal, self-contained,
> > reproducible code.
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.




More information about the R-help mailing list