[BioC] Help with subsetting data frame of DE genes

Ochsner, Scott A sochsner at bcm.tmc.edu
Fri Apr 4 17:31:34 CEST 2008


Dear list,

I have a data frame with three columns. The first column holds probe set IDs, the second holds the associated gene symbol, and the third holds a combined p-value:

hgu133a ID	Gene Symbol	Combined p-value
217757_at	A2M	0.787923912
214440_at	NAT1	0.240689023
206797_at	NAT2	0.497092074
202376_at	SERPINA3	3.88E-13
Etc....

I would like to end up with a data frame in which each row has a unique Gene Symbol.  When a gene symbol appears in more than one row, I want to keep only the row with the lowest Combined p-value.  The above case would transform into:

hgu133a ID	Gene Symbol	Combined p-value
217757_at	A2M	0.787923912
214440_at	NAT1	0.240689023
202376_at	SERPINA3	3.88E-13
Etc....

Could someone point me to a function which would help me in this regard?  If this question belongs on the R mailing list instead, I apologize and will post it there.
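
To make the goal concrete, the transformation described above could be sketched in base R roughly as follows. This is only a sketch under assumptions: the data frame name `de`, the column names `probe_id`, `symbol` and `p_value`, and all of the toy values are made up for illustration and are not the real data. It relies only on `order()` and `duplicated()`; there may well be a more direct function for this.

```r
# Hypothetical toy data: GENE_B appears twice with different p-values.
de <- data.frame(
  probe_id = c("p1_at", "p2_at", "p3_at", "p4_at"),
  symbol   = c("GENE_A", "GENE_B", "GENE_B", "GENE_C"),
  p_value  = c(0.79, 0.50, 0.24, 3.9e-13),
  stringsAsFactors = FALSE
)

# Sort by p-value so that, for each symbol, the row with the
# smallest p-value comes first.
de_sorted <- de[order(de$p_value), ]

# duplicated() flags every occurrence of a symbol after the first,
# so dropping the flagged rows keeps the lowest p-value per symbol.
de_unique <- de_sorted[!duplicated(de_sorted$symbol), ]

# Optionally put the result back in alphabetical order by symbol.
de_unique <- de_unique[order(de_unique$symbol), ]
```

With the toy data, `de_unique` has one row per symbol, and the GENE_B row that survives is the one with the smaller p-value. If several rows for a symbol share the same lowest p-value, this sketch keeps only one of them.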

Thanks,

> sessionInfo()
R version 2.6.0 (2007-10-03) 
i386-pc-mingw32 

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] splines   tools     stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] lumi_1.4.0           mgcv_1.3-29          affycoretools_1.10.0
 [4] annaffy_1.10.0       KEGG_2.0.0           GO_2.0.0            
 [7] gcrma_2.10.0         matchprobes_1.10.0   biomaRt_1.12.0      
[10] RCurl_0.8-1          GOstats_2.4.0        Category_2.4.0      
[13] genefilter_1.16.0    survival_2.32        RBGL_1.14.0         
[16] annotate_1.16.0      xtable_1.5-1         GO.db_2.0.0         
[19] AnnotationDbi_1.0.4  RSQLite_0.6-3        DBI_0.2-3           
[22] graph_1.16.1         affy_1.16.0          preprocessCore_1.0.0
[25] affyio_1.6.0         Biobase_1.16.0       limma_2.12.0        

loaded via a namespace (and not attached):
[1] cluster_1.11.10 XML_1.93-2.2

Scott A. Ochsner, Ph.D.
NURSA Bioinformatics
Molecular and Cellular Biology
Baylor College of Medicine
Houston, TX. 77030
phone: 713-798-6227 


