[BioC] Extract microarray data for genes identified by GO analysis

James W. MacDonald jmacdon at uw.edu
Tue Feb 19 15:31:54 CET 2013


Hi Mark,

On 2/18/2013 3:13 PM, Mark [guest] wrote:
> Dear Gurus,
>
> I am doing an Illumina microarray analysis. The study design is a 2x2 (i.e. varying on two different conditions). As part of the analysis I'm doing a GO analysis. There are a few GO categories of special interest, so I want to extract data for the probes identified in these categories and cluster the data.
>
> The problem is that after performing the GO analysis, I essentially cannot figure out how to extract the data for these probes. I have done lots of googling and have figured out that "geneIdsByCategory" (e.g. geneIdsByCategory(mfOver1)[["GO:0001077"]]) will tell me the EntrezIDs for the genes, but I cannot figure out how to map those back to the probeIDs.
>
> I also came across "probeSetSummary", which maps between EntrezID and ProbeID, but the data from this method do not seem to match those from "geneIdsByCategory". Specifically, the number of unique EntrezIDs in each GO category is different. Here is some example output (only showing results from one GO category):
>
>> head(probeSetSummary(mfOver1,.05,sigProbesets=sigLL1))
> $`GO:0001077`
>    EntrezID         ProbeSetID selected
> 1    16600 0khLe85Huv0juQw.sQ        0
> 2    16600 35LRC1Xd1PNCJ05Ras        0
> 3    16600 rpUHFdf15SFI5LRC1U        0
> 4    18124 BteTYfS5fYo.qi6dh0        0
> 5    18124 TnIofrF1F97TYQnfX4        0
> 6    18124 rQIi6KJzkUI0QknwKE        0
> 7    21420 NVwtViinW54gHvi7Eg        0
> 8    21420 NZWVEWR3oXld_i3_4c        0
> 9    21420 xvEFZWVEWR3oXld_i0        0

probeSetSummary is what you want, but you didn't read the help page 
carefully. From ?probeSetSummary:

sigProbesets: Optional vector of probeset IDs. See details for more
           information.

It appears you passed in the vector of unique Entrez Gene IDs (the 
geneIds), which is why the selected column is all zeros. If you 
pass in the probeset (or more correctly in your case, probe) IDs, you 
will get zeros and ones, and the ones indicate the probes that are 
significant. You may still want to keep only a single probe per Entrez 
Gene ID, as there is likely to be some duplication of information 
between the probes that are supposed to interrogate the same transcript.
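
As a rough sketch of the steps from the summary table to clustering, 
something like the following should work. It uses only base R so that it 
runs on its own: goTab is a toy stand-in for one element of the 
probeSetSummary result (the probe IDs are copied from your output, but 
the selected values are invented), and exprMat is a made-up expression 
matrix standing in for exprs() of your normalized data. The object names 
are assumptions for illustration only.

```r
# Toy stand-in for
#   probeSetSummary(mfOver1, 0.05, sigProbesets = <significant probe IDs>)[["GO:0001077"]]
# Probe IDs come from the output above; the 'selected' values are invented.
goTab <- data.frame(
  EntrezID   = c("16600", "16600", "16600", "18124", "18124", "18124"),
  ProbeSetID = c("0khLe85Huv0juQw.sQ", "35LRC1Xd1PNCJ05Ras", "rpUHFdf15SFI5LRC1U",
                 "BteTYfS5fYo.qi6dh0", "TnIofrF1F97TYQnfX4", "rQIi6KJzkUI0QknwKE"),
  selected   = c(1, 0, 1, 0, 1, 0),
  stringsAsFactors = FALSE
)

# Hypothetical expression matrix (probes in rows, samples in columns);
# with real data this would be the matrix of normalized intensities.
set.seed(1)
exprMat <- matrix(rnorm(6 * 8), nrow = 6,
                  dimnames = list(goTab$ProbeSetID, paste0("S", 1:8)))

# Keep only the probes flagged as significant
sigTab <- goTab[goTab$selected == 1, ]

# Keep one probe per Entrez Gene ID (here simply the first one listed;
# any other rule, e.g. the most variable probe, could be used instead)
sigTab <- sigTab[!duplicated(sigTab$EntrezID), ]

# Pull the matching rows out of the expression matrix
goExpr <- exprMat[sigTab$ProbeSetID, , drop = FALSE]

# Cluster the samples on the selected probes
hc <- hclust(dist(t(goExpr)))
```

Keeping the first probe per gene is an arbitrary choice made for the 
example, not a recommendation.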



>
>> geneIdsByCategory(mfOver1)[["GO:0001077"]]
> [1] "13653" "16600" "18124" "21420" "22038"

This just gives you the Entrez Gene IDs that map to that particular 
category, AND are represented on your array.
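
If you do want to go from those Entrez Gene IDs back to probes by hand, 
the idea is just a lookup against a probe-to-gene map. Here is a minimal 
base R sketch; probe2eg is a made-up named vector (probe ID as name, 
Entrez Gene ID as value) that in practice you would build from the 
annotation package for your array, and "madeUpProbeA" and "99999" are 
invented to show a probe that falls outside the category.

```r
# Entrez Gene IDs returned by geneIdsByCategory() for GO:0001077 (from above)
catIds <- c("13653", "16600", "18124", "21420", "22038")

# Hypothetical probe -> Entrez Gene ID map; the last entry is invented
probe2eg <- c("0khLe85Huv0juQw.sQ" = "16600",
              "BteTYfS5fYo.qi6dh0" = "18124",
              "NVwtViinW54gHvi7Eg" = "21420",
              "madeUpProbeA"       = "99999")

# Probes whose gene belongs to the category
catProbes <- names(probe2eg)[probe2eg %in% catIds]

# Group the probes by gene, to see which genes have several probes
byGene <- split(catProbes, probe2eg[catProbes])
```

The resulting probe IDs can then be used to index the rows of the 
expression matrix.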

Best,

Jim


>
>
> Can anyone give me guidance on how to get from the GO analysis to clustering? I know how to cluster, but getting from EntrezIDs back to probeIDs is my problem. Well, I think that's my problem anyway. If you know of a better way to do it, I'd love to hear it!
>
> Thanks in advance!
>
> Mark
>
>   -- output of sessionInfo():
>
> R version 2.15.1 (2012-06-22)
> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
>   [1] GO.db_2.8.0            GOstats_2.24.0         graph_1.36.1           Category_2.24.0        limma_3.14.1           annotate_1.36.0        lumiMouseAll.db_1.18.0 org.Mm.eg.db_2.8.0
>   [9] RSQLite_0.11.2         DBI_0.2-5              AnnotationDbi_1.20.3   xtable_1.7-0           lumi_2.10.0            nleqslv_1.9.4          Biobase_2.18.0         BiocGenerics_0.4.0
> [17] vimcom_0.9-5           setwidth_1.0-2         lattice_0.20-10
>
> loaded via a namespace (and not attached):
>   [1] affy_1.36.0           affyio_1.26.0         AnnotationForge_1.0.2 BiocInstaller_1.8.3   colorspace_1.2-0      genefilter_1.40.0     grid_2.15.1           GSEABase_1.20.0
>   [9] IRanges_1.16.4        KernSmooth_2.23-8     MASS_7.3-22           Matrix_1.0-10         methylumi_2.4.0       mgcv_1.7-22           nlme_3.1-105          parallel_2.15.1
> [17] preprocessCore_1.20.0 RBGL_1.34.0           splines_2.15.1        stats4_2.15.1         survival_2.36-14      tcltk_2.15.1          tools_2.15.1          XML_3.95-0.1
> [25] zlibbioc_1.4.0
>
> --
> Sent via the guest posting facility at bioconductor.org.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099


