[BioC] finding and deleting repeated observations

mervi.alanne at wri.fi
Fri May 28 19:27:03 CEST 2010


Dear all,

I'm a novice with R and could use some help. How can I find observations
that are repeated in one column and choose which one to keep based on
another column?

In more detail, this is what I want to achieve:
- The data.frame has 4 columns: GeneSymbol, A, B and pvalue.
- Values in the GeneSymbol column may be repeated 1-6 times.
- The data also contain unique observations.
- Of the repeated observations, keep the one with the lowest pvalue.
- Do not discard the values in columns A and B of the kept observation.
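A minimal sketch of the rule above, written in Python rather than R purely to illustrate the logic (the function name keep_lowest_pvalue and the tuple layout (GeneSymbol, A, B, pvalue) are assumptions, not from the post): walk the rows once and remember, for each GeneSymbol, the whole row with the smallest pvalue seen so far. Because whole rows are stored, A and B stay attached to the winning pvalue.

```python
# Illustrative Python sketch (not R). Each row is assumed to be a
# (GeneSymbol, A, B, pvalue) tuple; keep_lowest_pvalue is a hypothetical name.
from typing import Dict, List, Tuple

Row = Tuple[str, int, int, float]


def keep_lowest_pvalue(rows: List[Row]) -> List[Row]:
    """Return one row per GeneSymbol: the row with the smallest pvalue.

    Whole rows are kept, so columns A and B are never separated from
    their pvalue. Genes come out in order of first appearance, because
    Python dicts preserve insertion order and updating a value does not
    move its key.
    """
    best: Dict[str, Row] = {}
    for row in rows:
        gene, pvalue = row[0], row[3]
        # Store the row if the gene is new or this pvalue is strictly lower.
        if gene not in best or pvalue < best[gene][3]:
            best[gene] = row
    return list(best.values())
```

With a strict `<`, ties keep the earlier row; use `<=` if the later row should win instead.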

Example input data:
GeneSymbol A B pvalue
ABC1 12 44 0.01
ABC1 2 32 0.05
AB 4 55 0.2
ABCD1 15 25 0.005
ABCD1 11 27 0.002
ABCD1 9 18 0.0001

I'd like the output to look like this:
GeneSymbol A B pvalue
ABC1 12 44 0.01
AB 4 55 0.2
ABCD1 9 18 0.0001
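Another common approach is to sort by pvalue and then keep only the first row for each gene. In base R this is usually done with order() followed by duplicated(). The sketch below shows the same idea in Python for illustration only; the names dedupe_by_pvalue and example are hypothetical, and the example data are the input rows from the question.

```python
# Illustrative sort-then-deduplicate sketch in Python (not R).
# dedupe_by_pvalue and example are hypothetical names.
from typing import List, Set, Tuple

Row = Tuple[str, int, int, float]


def dedupe_by_pvalue(rows: List[Row]) -> List[Row]:
    """Sort rows by ascending pvalue, then keep the first row seen per gene."""
    # sorted() is stable, so rows with tied pvalues keep their input order.
    ordered = sorted(rows, key=lambda row: row[3])
    seen: Set[str] = set()
    kept: List[Row] = []
    for row in ordered:
        if row[0] not in seen:  # first, i.e. lowest-pvalue, row for this gene
            seen.add(row[0])
            kept.append(row)
    return kept


# The example input from the question.
example: List[Row] = [
    ("ABC1", 12, 44, 0.01),
    ("ABC1", 2, 32, 0.05),
    ("AB", 4, 55, 0.2),
    ("ABCD1", 15, 25, 0.005),
    ("ABCD1", 11, 27, 0.002),
    ("ABCD1", 9, 18, 0.0001),
]

for gene, a, b, pvalue in dedupe_by_pvalue(example):
    print(gene, a, b, pvalue)
```

The kept rows are the same as in the desired output, but they come out ordered by pvalue (ABCD1, ABC1, AB) rather than in their original order. Re-sort them afterwards if the original order matters.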

Any suggestions? 

-Mervi
Wihuri Research Institute


