[BioC] Help with subsetting a data frame of DE genes

Joern Toedling toedling at ebi.ac.uk
Fri Apr 4 18:41:28 CEST 2008


Hi Scott,

Setting aside the question of whether this is the ideal way of combining
multiple probe sets per gene, I do not think you need a special function
for this purpose. Basic R functions will suffice.

Let A be your data.frame; then:

# first reorder the rows of your data.frame by p-value
A <- A[order(A$"Combined p-value"),]
# and remove any rows containing a gene symbol mentioned in a previous row
B <- A[!duplicated(A$"Gene Symbol"),]
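For example, here is a minimal, self-contained sketch on a toy table. It uses the rows from your message plus one invented extra A2M probe set ("hypothetical_at", p = 0.01), so that a duplicated gene symbol actually occurs. A[["..."]] is equivalent to A$"..." here. One assumption to watch: if you read your file with read.delim(), pass check.names = FALSE. Otherwise R turns the column names into e.g. "Gene.Symbol", and A$"Gene Symbol" will not find the column.

```r
# Toy data: the probe sets from the question plus one HYPOTHETICAL
# extra A2M probe set, invented only to create a duplicated symbol
A <- data.frame(
  "hgu133a ID"       = c("217757_at", "214440_at", "206797_at",
                         "202376_at", "hypothetical_at"),
  "Gene Symbol"      = c("A2M", "NAT1", "NAT2", "SERPINA3", "A2M"),
  "Combined p-value" = c(0.787923912, 0.240689023, 0.497092074,
                         3.88e-13, 0.01),
  check.names = FALSE,        # keep the spaces in the column names
  stringsAsFactors = FALSE
)

# sort rows by increasing p-value ...
A <- A[order(A[["Combined p-value"]]), ]
# ... then keep only the first (i.e. lowest p-value) row per gene symbol
B <- A[!duplicated(A[["Gene Symbol"]]), ]
print(B)
```

B has one row per gene symbol. For A2M, it keeps the hypothetical probe set with p = 0.01 rather than 217757_at.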

Regards,
Joern

Ochsner, Scott A wrote:
> Dear list,
>
> I have a data frame with three columns: the first contains probe set IDs, the second the associated gene symbol, and the third a p-value statistic:
>
> hgu133a ID	Gene Symbol	Combined p-value
> 217757_at	A2M	0.787923912
> 214440_at	NAT1	0.240689023
> 206797_at	NAT2	0.497092074
> 202376_at	SERPINA3	3.88E-13
> Etc....
>
> I would like to end up with a data frame in which each row has a unique Gene Symbol. Where a gene symbol appears in more than one row, I want to keep the row with the lowest Combined p-value. The example above would become:
>
> hgu133a ID	Gene Symbol	Combined p-value
> 217757_at	A2M	0.787923912
> 214440_at	NAT1	0.240689023
> 202376_at	SERPINA3	3.88E-13
> Etc....
>
> Could someone point me to a function that would help with this? If this belongs on the R mailing list instead, I apologize and will post there.
>
> Thanks,
>
>
> > sessionInfo()
>
> R version 2.6.0 (2007-10-03) 
> i386-pc-mingw32 
>
> locale:
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>
> attached base packages:
> [1] splines   tools     stats     graphics  grDevices utils     datasets 
> [8] methods   base     
>
> other attached packages:
>  [1] lumi_1.4.0           mgcv_1.3-29          affycoretools_1.10.0
>  [4] annaffy_1.10.0       KEGG_2.0.0           GO_2.0.0            
>  [7] gcrma_2.10.0         matchprobes_1.10.0   biomaRt_1.12.0      
> [10] RCurl_0.8-1          GOstats_2.4.0        Category_2.4.0      
> [13] genefilter_1.16.0    survival_2.32        RBGL_1.14.0         
> [16] annotate_1.16.0      xtable_1.5-1         GO.db_2.0.0         
> [19] AnnotationDbi_1.0.4  RSQLite_0.6-3        DBI_0.2-3           
> [22] graph_1.16.1         affy_1.16.0          preprocessCore_1.0.0
> [25] affyio_1.6.0         Biobase_1.16.0       limma_2.12.0        
>
> loaded via a namespace (and not attached):
> [1] cluster_1.11.10 XML_1.93-2.2
>
> Scott A. Ochsner, Ph.D.
> NURSA Bioinformatics
> Molecular and Cellular Biology
> Baylor College of Medicine
> Houston, TX. 77030
> phone: 713-798-6227 
>
