[BioC] Extract microarray data for genes identified by GO analysis

Mark [guest] guest at bioconductor.org
Mon Feb 18 21:13:34 CET 2013


Dear Gurus,

I am analyzing an Illumina microarray experiment with a 2x2 design (i.e. two conditions, each with two levels). As part of the analysis I'm doing a GO analysis. A few GO categories are of special interest, so I want to extract the data for the probes in those categories and cluster them.

The problem is that after performing the GO analysis, I essentially cannot figure out how to extract the data for these probes. I have done lots of googling and have figured out that "geneIdsByCategory" (e.g. geneIdsByCategory(mfOver1)[["GO:0001077"]]) will tell me the EntrezIDs for the genes, but I cannot figure out how to map those back to the probeIDs.

I also came across "probeSetSummary," which maps between EntrezID and ProbeID, but its output does not seem to match that of "geneIdsByCategory." Specifically, the number of unique EntrezIDs in each GO category differs between the two. Here is some example output (only showing results from one GO category):

>head(probeSetSummary(mfOver1,.05,sigProbesets=sigLL1))
$`GO:0001077`
  EntrezID         ProbeSetID selected
1    16600 0khLe85Huv0juQw.sQ        0
2    16600 35LRC1Xd1PNCJ05Ras        0
3    16600 rpUHFdf15SFI5LRC1U        0
4    18124 BteTYfS5fYo.qi6dh0        0
5    18124 TnIofrF1F97TYQnfX4        0
6    18124 rQIi6KJzkUI0QknwKE        0
7    21420 NVwtViinW54gHvi7Eg        0
8    21420 NZWVEWR3oXld_i3_4c        0
9    21420 xvEFZWVEWR3oXld_i0        0

>geneIdsByCategory(mfOver1)[["GO:0001077"]]
[1] "13653" "16600" "18124" "21420" "22038"
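
To pin down the mismatch, the two ID sets can be compared directly. This is only a minimal sketch in base R using the values printed above; the names summary_ids and category_ids are made up for illustration, and in a real session they would presumably come from the EntrezID column of the probeSetSummary data frame and from geneIdsByCategory.

```r
# Entrez IDs from the probeSetSummary output above (one row per probe).
summary_ids <- c(16600, 16600, 16600, 18124, 18124, 18124, 21420, 21420, 21420)

# Entrez IDs returned by geneIdsByCategory for the same GO term.
category_ids <- c("13653", "16600", "18124", "21420", "22038")

# Coerce to character so the comparison does not depend on storage type.
summary_unique <- unique(as.character(summary_ids))

# IDs that are in the category but have no row in the probe summary.
missing_from_summary <- setdiff(category_ids, summary_unique)
print(missing_from_summary)
```

For this category, the two IDs without a row in the probe summary are 13653 and 22038.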


Can anyone give me guidance on how to get from the GO analysis to clustering? I know how to cluster, but getting from EntrezIDs back to probeIDs is my problem. Well, I think that's my problem anyway. If you know of a better way to do it, I'd love to hear it!
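
To make the goal concrete, here is a rough sketch of the kind of thing I am after, written with toy data in base R. The probe names, the probe2entrez lookup, and the expression values are all hypothetical. I assume a real lookup could be built from the annotation package with something like unlist(mget(probe_ids, lumiMouseAllENTREZID, ifnotfound = NA)), but I have not confirmed that this gives one Entrez ID per probe.

```r
# Toy probe-to-Entrez lookup (hypothetical probe names and IDs).
# Assumption: a real version would come from the annotation package.
probe2entrez <- c(probeA = "16600", probeB = "16600", probeC = "18124",
                  probeD = "99999", probeE = NA)

# Toy expression matrix: rows are probes, columns are the 2x2 samples.
set.seed(1)
expr <- matrix(rnorm(5 * 4), nrow = 5,
               dimnames = list(names(probe2entrez), c("a1", "a2", "b1", "b2")))

# Entrez IDs for the GO category of interest (from geneIdsByCategory).
go_entrez <- c("13653", "16600", "18124", "21420", "22038")

# Keep every probe whose Entrez ID is in the category; NA never matches.
go_probes <- names(probe2entrez)[probe2entrez %in% go_entrez]
go_expr <- expr[go_probes, , drop = FALSE]

# Hierarchical clustering of the selected probes.
hc <- hclust(dist(go_expr))
print(go_probes)
```

Note that a gene with several probes contributes several rows here, so they would either be clustered separately or need to be summarized per gene first.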

Thanks in advance!

Mark

 -- output of sessionInfo(): 

R version 2.15.1 (2012-06-22)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] GO.db_2.8.0            GOstats_2.24.0         graph_1.36.1           Category_2.24.0        limma_3.14.1           annotate_1.36.0        lumiMouseAll.db_1.18.0 org.Mm.eg.db_2.8.0    
 [9] RSQLite_0.11.2         DBI_0.2-5              AnnotationDbi_1.20.3   xtable_1.7-0           lumi_2.10.0            nleqslv_1.9.4          Biobase_2.18.0         BiocGenerics_0.4.0    
[17] vimcom_0.9-5           setwidth_1.0-2         lattice_0.20-10       

loaded via a namespace (and not attached):
 [1] affy_1.36.0           affyio_1.26.0         AnnotationForge_1.0.2 BiocInstaller_1.8.3   colorspace_1.2-0      genefilter_1.40.0     grid_2.15.1           GSEABase_1.20.0      
 [9] IRanges_1.16.4        KernSmooth_2.23-8     MASS_7.3-22           Matrix_1.0-10         methylumi_2.4.0       mgcv_1.7-22           nlme_3.1-105          parallel_2.15.1      
[17] preprocessCore_1.20.0 RBGL_1.34.0           splines_2.15.1        stats4_2.15.1         survival_2.36-14      tcltk_2.15.1          tools_2.15.1          XML_3.95-0.1         
[25] zlibbioc_1.4.0       

--
Sent via the guest posting facility at bioconductor.org.


