[BioC] Selecting Unique rows in multiple column data frames

Jenny Drnevich drnevich at uiuc.edu
Mon Nov 6 16:48:47 CET 2006


Hi Matjaz,

For option 2, if your data frame is called 'mydata' just do:

mydata.unique <- mydata[ !duplicated(mydata$ID), ]

This will pull out the first instance of each ID, along with its M value.
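
As a rough sketch of both options on toy data: the values below are made up, and the names mydata.unique and mydata.mean are only for illustration. Option 1 uses aggregate(), the function Alex points to below, and assumes a plain mean is the summary you want.

```r
# Toy data mirroring the ID / M layout in the question (values are invented).
mydata <- data.frame(
  ID = c("ID1", "ID4", "ID4", "ID4", "ID4", "ID5"),
  M  = c(-4.6, 6.4, 6.6, 6.2, 6.8, 4.4),
  stringsAsFactors = FALSE
)

# Option 2: keep only the first row seen for each ID.
# duplicated() flags every repeat of an ID, so negating it keeps first occurrences.
mydata.unique <- mydata[!duplicated(mydata$ID), ]

# Option 1: average the M values within each ID.
# The result has one row per ID, with columns ID and M.
mydata.mean <- aggregate(mydata["M"], by = list(ID = mydata$ID), FUN = mean)
```

Note that option 2 keeps whichever replicate happens to come first in the data frame, so the result depends on row order, while option 1 uses all replicates.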

Cheers,
Jenny

At 04:32 AM 11/6/2006, alex lam (RI) wrote:
>Hi Matjaz,
>For option 1, have a look at the help page of the function "aggregate".
>
>I don't understand your option 2. Perhaps I am misreading what you are 
>saying.
>If you want to select unique rows according to column 1 and 2, you can 
>create a third column by joining col1 and 2
>
>Col3 <- paste(YourData$ID, YourData$M, sep="_")
>Index <- !duplicated(Col3)   # TRUE for the first occurrence of each ID/M pair
>YourData[Index,]
>
>But I can't see why any replicates would have identical M values.
>
>Cheers,
>Alex
>
>------------------------------------
>Alex Lam
>PhD student
>Department of Genetics and Genomics
>Roslin Institute (Edinburgh)
>Roslin
>Midlothian EH25 9PS
>Great Britain
>
>Phone +44 131 5274471
>Web   http://www.roslin.ac.uk
>
>
>-----Original Message-----
>From: bioconductor-bounces at stat.math.ethz.ch 
>[mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of Matjaž Hren
>Sent: 06 November 2006 09:06
>To: Bioconductor
>Subject: [BioC] Selecting Unique rows in multiple column data frames
>
>Dear list!
>
>
>
>I have data frames with 2 columns of normalised microarray data (more than 
>10k rows, custom-made array) with the following layout (not real data):
>
>
>
>ID      M
>ID1     -4.60138
>ID2     -3.28832
>ID3     4.83560
>ID4     6.45286
>ID4     6.65235
>ID4     6.38745
>ID4     6.74514
>ID5     4.43995
>ID6     -1.78943
>ID7     -4.00257
>ID8     -4.46327
>ID9     -3.13956
>ID10    2.52233
>ID11    -1.81214
>ID11    -1.78625
>ID11    -1.61214
>ID11    -1.52354
>
>
>
>ID is the oligo ID (spot-ID), M is the corresponding M-value.
>
>
>
>Only one spot per block is present in replicates (4). Therefore I would 
>like to use one of the following 2 options:
>
>
>
>1. Average the M-values in rows that have the same ID and extract the data 
>table with both columns.
>
>2. or if the first option does not work: Extract the rows with unique ID 
>(both columns) and remove the replicates. I tried using "unique" on the ID 
>column but I couldn't extend its use to more than one column in the data frame.
>
>
>
>I used R 2.4.0 and the limma package for normalisation.
>
>
>
>
>
>Thank you in advance,
>
>
>
>
>
>Matjaz
>
>
>
>----------------------------------------------------------------------------
>
>Matjaz Hren
>
>
>
>National Institute of Biology
>
>Department of Plant Physiology and Biotechnology
>
>SLOVENIA
>
>----------------------------------------------------------------------------
>
>_______________________________________________
>Bioconductor mailing list
>Bioconductor at stat.math.ethz.ch
>https://stat.ethz.ch/mailman/listinfo/bioconductor
>Search the archives: 
>http://news.gmane.org/gmane.science.biology.informatics.conductor

Jenny Drnevich, Ph.D.

Functional Genomics Bioinformatics Specialist
W.M. Keck Center for Comparative and Functional Genomics
Roy J. Carver Biotechnology Center
University of Illinois, Urbana-Champaign

330 ERML
1201 W. Gregory Dr.
Urbana, IL 61801
USA

ph: 217-244-7355
fax: 217-265-5066
e-mail: drnevich at uiuc.edu


