[BioC] Can my problem be addressed with Bioconductor?

Leonardi, Michela m.leonardi at ucl.ac.uk
Wed Jun 18 13:05:13 CEST 2014

Dear Jim,
thanks a lot for your quick and very useful answer.
In the end I used the following code:

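library(org.Hs.eg.db)
library(GOstats)

# all the genes in my dataset (conversion from SNP to gene done with WebGestalt)
allGenes <- <some code to read the file with all the genes in my dataset>
univ <- select(org.Hs.eg.db, allGenes, "ENTREZID", "ALIAS")

# the "interesting" genes
subSet <- <some code to read the file with the "interesting" genes>
set <- select(org.Hs.eg.db, subSet, "ENTREZID", "ALIAS")

# select() returns a data frame, so pass only the ENTREZID column as the universe
p <- new("GOHyperGParams",
         geneIds = unique(as.character(set$ENTREZID)),
         universeGeneIds = unique(as.character(univ$ENTREZID)),
         ontology = "BP",
         annotation = "org.Hs.eg.db")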

I built the universe this way because I want to test the interesting genes against all the genes in my dataset, not the genes in my dataset against all human genes.
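
To illustrate why the choice of universe matters, here is a minimal base-R sketch of the kind of hypergeometric calculation that this sort of enrichment test rests on. The helper name enrich_p and all the counts are hypothetical and chosen only for illustration; they do not come from this thread, and this is not GOstats' internal code.

```r
# P(observing at least `overlap` annotated genes) when `selected` genes are
# drawn from a universe in which `in_category` genes carry the GO term.
# enrich_p is a hypothetical helper, not a GOstats function.
enrich_p <- function(overlap, in_category, universe_size, selected) {
  phyper(overlap - 1,                    # P(X > overlap - 1) = P(X >= overlap)
         in_category,                    # annotated genes in the universe
         universe_size - in_category,    # non-annotated genes in the universe
         selected,                       # number of "interesting" genes
         lower.tail = FALSE)
}

# Invented counts for a single GO term, with two different universes.
p_dataset <- enrich_p(overlap = 8, in_category = 100,
                      universe_size = 2000, selected = 50)
p_genome  <- enrich_p(overlap = 8, in_category = 400,
                      universe_size = 20000, selected = 50)

print(c(dataset_universe = p_dataset, genome_universe = p_genome))
```

The same overlap gives a different p-value under each universe, which is why the universe should contain only the genes that could have ended up in the interesting set.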

Thanks a lot again


On 17 Jun 2014, at 20:39, James W. MacDonald <jmacdon at uw.edu> wrote:

> Hi Michela,
> On 6/17/2014 2:14 PM, Michela Leonardi [guest] wrote:
>> Dear All, I am very new to Bioconductor and to Gene Ontology
>> analyses, so please forgive me if my question is trivial. I have as
>> "universe" a list of SNPs (not all of them) from the Affymetrix 6.0
>> SNPchip. After some population genetics analyses I defined a subset
>> of particular interest to me (i.e. showing signal of selection). I
>> would like to  analyze the subset of SNPs (or, better, associated
>> genes) in order to test for gene enrichment for gene ontology
>> categories.
>> My first question is: are GOstats and topGO the right tools to
>> perform this analysis on the kind of data I have (lists of genes as
>> text files)?
>> And if yes... I started "playing around" with Bioconductor and I got
>> stuck with the association: I could not find a way to tell the
>> program that I used the Affymetrix 6.0 SNPchip. Could you point me
>> towards a link or document that would help me go through all the steps
>> needed to do the analyses I need?
> You are doing something unconventional, so you are unlikely to find anything that shows what to do.
> But note that GOstats (at least) is based on Gene IDs, so you need to map your SNPs to their 'associated' genes, and then get the Gene IDs (what used to be known as Entrez Gene IDs).
> Your universe will be the set of Gene IDs with which your universe of SNPs is associated. I have no idea how you are associating SNPs with genes, but the org.Hs.eg.db package is your friend. Say you have gene symbols (you shouldn't be relying on such things, but bear with me).
> symbols <- <some code to get symbols goes here>
> library(org.Hs.eg.db)
> univ <- keys(org.Hs.eg.db, keytype = "ENTREZID")
> egs <- select(org.Hs.eg.db, symbols, "ENTREZID","ALIAS")
> You may get a warning that you have one or more one-to-many mappings, which you may or may not decide to resolve.
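
One possible way to inspect and resolve such one-to-many mappings is sketched below in base R. The data frame only mimics the shape of a select() result: the symbols and Gene IDs are invented, and the sketch assumes that unmapped symbols show up as NA in the ENTREZID column.

```r
# Mock table shaped like the output of select(); symbols and IDs are invented.
egs <- data.frame(
  ALIAS    = c("GENE_A", "GENE_A", "GENE_B", "GENE_C"),
  ENTREZID = c("1001", "1002", "1003", NA),
  stringsAsFactors = FALSE
)

# Drop symbols that did not map to any Gene ID (assumed to appear as NA).
egs <- egs[!is.na(egs$ENTREZID), ]

# Count how many Gene IDs each symbol maps to and flag the ambiguous ones.
n_ids <- table(egs$ALIAS)
ambiguous <- names(n_ids)[n_ids > 1]

# Option 1: keep every mapping, as unique(egs$ENTREZID) does downstream.
ids_all <- unique(egs$ENTREZID)

# Option 2: keep only symbols that map to a single Gene ID.
ids_strict <- egs$ENTREZID[!egs$ALIAS %in% ambiguous]

print(ambiguous)
print(ids_all)
print(ids_strict)
```

Whichever option is chosen, it seems sensible to apply the same rule to both the universe and the interesting subset, so that the two gene lists stay comparable.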
> Then you just do the 'usual';
> p <- new("GOHyperGParams", geneIds = unique(as.character(egs$ENTREZID)), universeGeneIds = univ, ontology = "BP", annotation = "org.Hs.eg.db")
> hyp <- hyperGTest(p)
> Best,
> Jim
>> Thanks a lot for your help
>> Michela Leonardi
>> -- output of sessionInfo():
>> R version 3.1.0 (2014-04-10) Platform: x86_64-apple-darwin13.1.0
>> (64-bit)
>> locale: [1]
>> it_IT.UTF-8/it_IT.UTF-8/it_IT.UTF-8/C/it_IT.UTF-8/it_IT.UTF-8
>> attached base packages: [1] stats     graphics  grDevices utils
>> datasets  methods   base
>> loaded via a namespace (and not attached): [1] tools_3.1.0
>> -- Sent via the guest posting facility at bioconductor.org.
>> _______________________________________________ Bioconductor mailing
>> list Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor Search the
>> archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
> -- 
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
