[BioC] conversion of geneset species ID

Iain Gallagher iaingallagher at btopenworld.com
Mon Sep 5 16:57:05 CEST 2011


Dear List

I wonder if someone could help me re-annotate the Broad c2 genesets from human to bovine IDs. Here's what I have so far:

rm(list=ls())
library(biomaRt)
library(GSEABase)

setwd('/home/iain/Documents/Work/Results/bovineMacRNAData/deAnalysis/GSEAData/')

cowGenes <- read.table('cowGenesENID.csv', header=FALSE, sep='\t')

cow = useMart("ensembl",dataset="btaurus_gene_ensembl")
orth = getBM(c("ensembl_gene_id","human_ensembl_gene"), filters="ensembl_gene_id",values = cowGenes[,1], mart = cow)
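orth2 <- orth[orth[,2] != '', ] # drop those with no human ortholog

# keep one row per cow ID (the first human ortholog listed for it);
# !duplicated() is used rather than -which(duplicated()), because the latter returns zero rows when nothing is duplicated
orth3 <- orth2[!duplicated(orth2[,1]), ]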

head(orth3)


This gets me a data frame of bovine ENSEMBL gene IDs, each paired with its human ortholog (again as an ENSEMBL gene ID).
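
To make the two filtering steps concrete, here is a minimal sketch on a made-up table. The IDs (COW1, HUM1, ...) are invented for illustration and are not real biomaRt output. It also shows that a `-which(duplicated(...))` index returns zero rows when nothing is duplicated, whereas `!duplicated(...)` does not.

```r
# Toy stand-in for a getBM() result; all IDs are invented for illustration
orth <- data.frame(
  ensembl_gene_id    = c("COW1", "COW2", "COW2", "COW3"),
  human_ensembl_gene = c("HUM1", "HUM2", "HUM3", ""),
  stringsAsFactors   = FALSE
)

# Drop rows with no human ortholog (empty string in column 2)
orth2 <- orth[orth$human_ensembl_gene != "", ]

# Keep the first row seen for each cow ID
orth3 <- orth2[!duplicated(orth2$ensembl_gene_id), ]

# Pitfall: orth3 has no duplicates left, so which() returns integer(0),
# and indexing rows with -integer(0) selects nothing at all
noDups <- orth3[-which(duplicated(orth3$ensembl_gene_id)), ]
nrow(noDups)
```

Note that this keeps one row per cow ID only; the same human ID can still appear against more than one cow ID.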

broadSets <- getGmt('/home/iain/Documents/Work/Results/bovineMacRNAData/deAnalysis/GSEAData/c2.all.v3.0.entrez.gmt', geneIdType = EntrezIdentifier('org.Hs.eg.db'))

broadSetsENS <- mapIdentifiers(broadSets, ENSEMBLIdentifier())

I now have the c2 Broad genesets with gene IDs converted to human ENSEMBL IDs. I would like to look up the position of each of these human ENSEMBL IDs in my data frame (orth3), substitute in the corresponding bovine ID, and then clean up any NAs.
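
On plain character vectors, the lookup described here could be sketched roughly as follows. This is only an illustration under stated assumptions: the IDs are invented, `humanToCow` is a hypothetical helper name, and the gene sets are held as a named list of character vectors, not as a GeneSetCollection.

```r
# Toy lookup table standing in for orth3; all IDs are invented
orth3 <- data.frame(
  ensembl_gene_id    = c("COW1", "COW2", "COW4"),
  human_ensembl_gene = c("HUM1", "HUM2", "HUM4"),
  stringsAsFactors   = FALSE
)

# Hypothetical helper: takes human IDs, returns the matching bovine IDs
humanToCow <- function(humanIds, lookup) {
  # position of each human ID in the lookup table, NA where absent
  pos <- match(humanIds, lookup$human_ensembl_gene)
  cowIds <- lookup$ensembl_gene_id[pos]
  # drop unmapped IDs (NA) and any repeated bovine IDs
  unique(cowIds[!is.na(cowIds)])
}

# Toy gene sets as a named list of human IDs
humanSets <- list(setA = c("HUM1", "HUM9", "HUM2"),
                  setB = c("HUM4", "HUM7"))

# Apply the lookup to every set
cowSets <- lapply(humanSets, humanToCow, lookup = orth3)
```

`match()` returns only the first hit, so a human ID with several bovine orthologs in the table would contribute just one of them. How best to get a list like `humanSets` out of the GSEABase objects, and back in again, is not covered by this sketch.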

I am rather at a loss as to how to do this and wondered if someone more familiar with GSEABase would be able to help (or perhaps suggest a different strategy)?

Thanks

Iain

> sessionInfo()
R version 2.13.1 (2011-07-08)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_GB.utf8       LC_NUMERIC=C             
 [3] LC_TIME=en_GB.utf8        LC_COLLATE=en_GB.utf8    
 [5] LC_MONETARY=C             LC_MESSAGES=en_GB.utf8   
 [7] LC_PAPER=en_GB.utf8       LC_NAME=C                
 [9] LC_ADDRESS=C              LC_TELEPHONE=C           
[11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C      

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] GSEABase_1.14.0      graph_1.30.0         annotate_1.30.0     
 [4] org.Hs.eg.db_2.5.0   org.Bt.eg.db_2.5.0   RSQLite_0.9-4       
 [7] DBI_0.2-5            AnnotationDbi_1.14.1 Biobase_2.12.2      
[10] biomaRt_2.8.1       

loaded via a namespace (and not attached):
[1] RCurl_1.6-9  tools_2.13.1 XML_3.4-2    xtable_1.5-6
> 



