[BioC] Quick start to linking GO terms and microarray data

Sean Davis sdavis2 at mail.nih.gov
Wed Mar 1 14:28:37 CET 2006

> michael watson (IAH-C) wrote:
>> Hi Steffen, Wolfgang
>> Thanks a lot, the biomaRt package looks wonderful for the species that
>> are in ensembl... Are there any functions within it to annotate other
>> species? (Eg bacteria, plants etc)


This is a quick-and-dirty solution that will get you whatever NCBI has
available for Gene Ontology, including Arabidopsis, for example.  Hope this
gets you another few species.  The NCBI taxonomy IDs included are:

> unique(gene2go$taxID)
 [1]   3702   4932   6239   7227   7955   9031   9606  10090  10116  36329
[11]  39947  83333 185431 195099 198094 211586 214684 223283 243164 243231
[21] 243233 246200 265669 284812
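
To pull out a single species, the table can be subset on taxID. Here is a minimal sketch, assuming a small stand-in for gene2go built only from rows printed in this post (the real table comes from the NCBI download); 3702 is the NCBI taxonomy ID for A. thaliana and 9606 is human.

```r
# Toy stand-in for the gene2go table, built from rows printed in this post.
gene2go <- data.frame(
  taxID    = c(9606, 9606, 9606, 9606, 3702),
  geneID   = c(1, 2, 9, 10, 819280),
  goID     = c("GO:0000004", "GO:0004867", "GO:0004060", "GO:0004060",
               "GO:0003700"),
  evidence = c("ND", "IEA", "TAS", "TAS", "ISS"),
  stringsAsFactors = FALSE
)

# Number of annotation rows per taxonomy ID
rows_per_taxon <- table(gene2go$taxID)

# Keep only the A. thaliana rows (NCBI taxonomy ID 3702)
ath <- gene2go[gene2go$taxID == 3702, ]
```

On the full table the same two lines should work unchanged; only the toy data frame is specific to this sketch.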

Hope this helps.


> download.file('ftp://ftp.ncbi.nih.gov/gene/DATA/gene2go.gz', destfile='gene2go.gz')
trying URL 'ftp://ftp.ncbi.nih.gov/gene/DATA/gene2go.gz'
ftp data connection made, file length 5541317 bytes
opened URL
downloaded 5411Kb
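
The downloaded file is gzip-compressed, tab-delimited text, which R can read directly through a gzfile() connection. Here is a rough sketch of that read step that runs without network access. It assumes a two-row temporary file (rows copied from this post, with the qualifier field left empty as in the printed output) in place of the real download; the temporary file is purely a stand-in.

```r
# Write a tiny gzip-compressed, tab-delimited stand-in file so the sketch
# runs offline. The two rows are copied from the output in this post.
tmp <- tempfile(fileext = ".gz")
con <- gzfile(tmp, "w")
writeLines(c(
  "9606\t9\tGO:0004060\tTAS\t\tarylamine N-acetyltransferase activity\t10908296",
  "9606\t10\tGO:0004060\tTAS\t\tarylamine N-acetyltransferase activity\t2340091"
), con)
close(con)

# Read it back: tab-separated, no header row, quoting disabled so that
# apostrophes inside GO term names are not treated as quote characters.
gene2go <- read.table(gzfile(tmp), sep = "\t", header = FALSE, quote = "",
                      stringsAsFactors = FALSE)
colnames(gene2go) <- c("taxID", "geneID", "goID", "evidence", "qualifier",
                       "goTerm", "pubmedlist")
unlink(tmp)
```

Setting stringsAsFactors = FALSE keeps the GO IDs and terms as plain character vectors, which tends to be easier to work with when matching IDs later.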

> gene2go <- read.table(gzfile('gene2go.gz'),sep="\t",header=FALSE,quote="")
> colnames(gene2go) <- c('taxID', 'geneID', 'goID', 'evidence', 'qualifier',
'goTerm', 'pubmedlist')
> gene2go[match(1:10,gene2go$geneID),]
       taxID geneID       goID evidence qualifier
272227  9606      1 GO:0000004       ND
272230  9606      2 GO:0004867      IEA
NA        NA     NA       <NA>     <NA>      <NA>
NA.1      NA     NA       <NA>     <NA>      <NA>
NA.2      NA     NA       <NA>     <NA>      <NA>
NA.3      NA     NA       <NA>     <NA>      <NA>
NA.4      NA     NA       <NA>     <NA>      <NA>
NA.5      NA     NA       <NA>     <NA>      <NA>
272240  9606      9 GO:0004060      TAS
272244  9606     10 GO:0004060      TAS
                                             goTerm pubmedlist
272227                   biological process unknown          -
272230 serine-type endopeptidase inhibitor activity          -
NA                                             <NA>       <NA>
NA.1                                           <NA>       <NA>
NA.2                                           <NA>       <NA>
NA.3                                           <NA>       <NA>
NA.4                                           <NA>       <NA>
NA.5                                           <NA>       <NA>
272240       arylamine N-acetyltransferase activity   10908296
272244       arylamine N-acetyltransferase activity    2340091
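
Note that match() returns only the first matching row for each ID and gives NA for IDs that are not in the table, which is where the NA rows above come from. If you want every annotation row, %in% or split() may be a better fit. Here is a small sketch, again on a toy table built from rows printed in this post rather than the full download:

```r
# Toy stand-in for the gene2go table, built from rows printed in this post.
gene2go <- data.frame(
  taxID    = c(9606, 9606, 9606, 9606, 3702),
  geneID   = c(1, 2, 9, 10, 819280),
  goID     = c("GO:0000004", "GO:0004867", "GO:0004060", "GO:0004060",
               "GO:0003700"),
  evidence = c("ND", "IEA", "TAS", "TAS", "ISS"),
  stringsAsFactors = FALSE
)

# match() gives the index of the first hit only, even though two rows
# carry this GO ID.
first_hit <- match("GO:0004060", gene2go$goID)

# %in% keeps every matching row.
all_hits <- gene2go[gene2go$goID %in% "GO:0004060", ]

# Selecting by gene ID with %in% drops IDs that have no annotation,
# instead of producing NA rows.
found <- gene2go[gene2go$geneID %in% 1:10, ]

# GO ID -> vector of gene IDs annotated with it
genes_by_go <- split(gene2go$geneID, gene2go$goID)
```

The split() result is a named list, so genes_by_go[["GO:0004060"]] gives the genes for that term.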

# and an example from A. thaliana
# the GO for A. thaliana is from TAIR
> gene2go[match(819280,gene2go$geneID),]
      taxID geneID       goID evidence qualifier
12430  3702 819280 GO:0003700      ISS
                    goTerm pubmedlist
12430 transcription factor    7948864
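
To link this back to microarray data, one option is to merge the table with whatever probe-to-gene mapping you have for your array. Here is a minimal sketch, assuming a hypothetical probes data frame (the probe IDs are invented for illustration, and the gene2go rows are a toy subset copied from this post):

```r
# Toy stand-in for the gene2go table, built from rows printed in this post.
gene2go <- data.frame(
  taxID    = c(9606, 9606, 9606, 9606, 3702),
  geneID   = c(1, 2, 9, 10, 819280),
  goID     = c("GO:0000004", "GO:0004867", "GO:0004060", "GO:0004060",
               "GO:0003700"),
  evidence = c("ND", "IEA", "TAS", "TAS", "ISS"),
  stringsAsFactors = FALSE
)

# Hypothetical probe-to-gene mapping; the probe IDs are made up.
probes <- data.frame(
  probeID = c("probe_A", "probe_B", "probe_C"),
  geneID  = c(9, 10, 3),
  stringsAsFactors = FALSE
)

# Left join: keep every probe, with NA in the GO columns where the gene
# has no annotation row (gene 3 here).
annotated <- merge(probes, gene2go, by = "geneID", all.x = TRUE)
```

With all.x = TRUE, unannotated probes stay in the result, and a gene with several GO terms yields one row per term.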
