[BioC] Quick start to linking GO terms and microarray data

Steffen Durinck sdurinck at ebi.ac.uk
Wed Mar 1 15:02:26 CET 2006


Hi Mick,

The biomaRt package can retrieve data from BioMart data management 
systems (see: http://www.biomart.org).  Any database that provides such 
a BioMart implementation can thus be queried.  Ensembl and Wormbase, for 
example, provide this and are queried in real time through biomaRt.
For species that are not in these systems, biomaRt cannot help unless 
you set up a local BioMart for that species or convince the database of 
interest to provide a BioMart system.  I expect plants and fly to be 
included soon, but I have no information on other species.
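
To make the above concrete, here is a rough sketch of checking which
BioMart databases are reachable and pulling GO annotation through the
generic query function.  It assumes a later biomaRt release in which
getBM() is the general query function.  The attribute and filter names
(entrezgene_id, go_id, name_1006, go_linkage_type) are assumptions based
on recent Ensembl marts and will differ for other databases, and the two
helper names are made up for illustration.

```r
# Sketch only: needs the biomaRt package and a network connection to run.
# Helper names (available_marts, go_for_entrez) are hypothetical.

# List the BioMart databases biomaRt can reach from this machine.
available_marts <- function() {
  biomaRt::listMarts()
}

# Fetch GO annotation for Entrez Gene IDs from an already-connected mart.
# The attribute/filter names are assumptions; check listAttributes(mart)
# and listFilters(mart) for the names your mart actually uses.
go_for_entrez <- function(entrez_ids, mart) {
  biomaRt::getBM(
    attributes = c("entrezgene_id", "go_id", "name_1006", "go_linkage_type"),
    filters    = "entrezgene_id",
    values     = entrez_ids,
    mart       = mart
  )
}

# Usage (not run here):
# mart <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# go_for_entrez(c(100, 620), mart)
```

If a species does not show up in any mart or dataset listing, the query
has nothing to connect to, which is the limitation described above.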

Best,
Steffen

michael watson (IAH-C) wrote:

>Hi Steffen
>
>Sorry if I am confused, but getGO() seems to require a connection to an
>ensembl database.  If I have identifiers for a species that is not in
>ensembl, can I still use biomaRt to retrieve GO (and other) annotations?
>
>If so, it is a little unclear how to do this from the vignettes :-S
>
>Thank you for the help
>
>Mick
>
>-----Original Message-----
>From: Steffen Durinck [mailto:sdurinck at ebi.ac.uk] 
>Sent: 01 March 2006 13:43
>To: michael watson (IAH-C)
>Cc: Bioconductor
>Subject: Re: [BioC] Quick start to linking GO terms and microarray data
>
>Hi,
>
>Besides Ensembl, biomaRt currently includes Wormbase, VEGA, Uniprot and
>msd.
>Soon I expect plants to be represented as well via the Gramene database 
>(http://www.gramene.org).
>
>Best,
>Steffen
>
>
>michael watson (IAH-C) wrote:
>
>>Hi Steffen, Wolfgang
>>
>>Thanks a lot, the biomaRt package looks wonderful for the species that
>>are in ensembl... Are there any functions within it to annotate other
>>species? (Eg bacteria, plants etc)
>>
>>Many thanks
>>Mick
>>
>>-----Original Message-----
>>From: Steffen Durinck [mailto:sdurinck at ebi.ac.uk] 
>>Sent: 01 March 2006 13:24
>>To: michael watson (IAH-C)
>>Cc: Sean Davis; Bioconductor
>>Subject: Re: [BioC] Quick start to linking GO terms and microarray data
>>
>>Hi Mike,
>>
>>As Wolfgang already suggested you can do this with the biomaRt package.
>>Here is how you should do this:
>>
>> > library(biomaRt)
>>Loading required package: XML
>>Loading required package: RCurl
>> > mart = useMart("ensembl",dataset="hsapiens_gene_ensembl")
>>Checking attributes and filters ... ok
>> > getGO(id=c(100,620),type="entrezgene",mart=mart)
>>        go_id                                    go_description evidence_code
>>1  GO:0004000                      adenosine deaminase activity           TAS
>>2  GO:0016787                                hydrolase activity           IEA
>>3  GO:0009117                             nucleotide metabolism           IEA
>>4  GO:0009168  purine ribonucleoside monophosphate biosynthesis           IEA
>>5  GO:0019735 antimicrobial humoral response (sensu Vertebrata)           TAS
>>6  GO:0006955                                   immune response           IMP
>>7  GO:0006955                                   immune response           IEA
>>8  GO:0006163                      purine nucleotide metabolism           IMP
>>9  GO:0006163                      purine nucleotide metabolism           IEA
>>10 GO:0005737                                         cytoplasm           IDA
>>11 GO:0005737                                         cytoplasm           IEA
>>  ensembl_gene_id ensembl_transcript_id
>>1  ENSG00000196839       ENST00000359372
>>2  ENSG00000196839       ENST00000359372
>>3  ENSG00000196839       ENST00000359372
>>4  ENSG00000196839       ENST00000359372
>>5  ENSG00000196839       ENST00000359372
>>6  ENSG00000196839       ENST00000359372
>>7  ENSG00000196839       ENST00000359372
>>8  ENSG00000196839       ENST00000359372
>>9  ENSG00000196839       ENST00000359372
>>10 ENSG00000196839       ENST00000359372
>>11 ENSG00000196839       ENST00000359372
>>
>>
>>best,
>>Steffen
>>
>>michael watson (IAH-C) wrote:
>>
>>>Thanks Sean, but I really wanted to demonstrate this in Bioconductor :-S
>>>
>>>I tried running the vignettes in goTools, the first time it froze up my
>>>PC for about 30 minutes and then gave out a cryptic message about
>>>coercing x to a list, the second time it froze up my PC and then R
>>>crashed with no warning :-S
>>>
>>>As far as I can tell, GOStats doesn't have any clear examples of simple
>>>mapping of microarray data to GO terms.
>>>
>>>Given that one of the major, fundamental tasks biologists want to do is
>>>find out functional information for significantly differentially
>>>expressed genes, shouldn't this be a little easier, and a little more
>>>transparent, in bioconductor?
>>>
>>>Again, I ask, does anyone have any simple examples of going from a list
>>>of LocusLink IDs to a list of GO Terms?  (i.e. GO identifiers and the
>>>biological function/term associated with those identifiers)
>>>
>>>Many thanks
>>>Mick
>>>
>>>-----Original Message-----
>>>From: Sean Davis [mailto:sdavis2 at mail.nih.gov] 
>>>Sent: 01 March 2006 11:44
>>>To: michael watson (IAH-C); Bioconductor
>>>Subject: Re: [BioC] Quick start to linking GO terms and microarray data
>>>
>>>On 3/1/06 6:20 AM, "michael watson (IAH-C)" <michael.watson at bbsrc.ac.uk>
>>>wrote:
>>>
>>>>Hi
>>>>
>>>>I want to investigate the GO terms associated with my microarray data
>>>>(normally, a list of genes from topTable() in limma)
>>>>
>>>>I have read the vignettes for goTools and GOStats, and to be honest, I
>>>>am still a little unclear what the overall process is, particularly if I
>>>>am working with a custom array and not with affy or operon.
>>>>
>>>>Let's say, for example, I have my array data in a data.frame containing
>>>>gene names.  In a separate data frame I have a link between my gene
>>>>names and LocusLink IDs.  How do I:
>>>>
>>>>1) Find the GO terms associated with subsets of my genes? (I realise I
>>>>can use merge() to link my array data to the LocusLink ids, but what do
>>>>I do then?)
>>>>
>>>>2) Find out if a particular GO term is statistically over-represented in
>>>>a particular group
>>>
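For question 1, the merge() route can be sketched in base R as below.
This is only an illustration: every value is made up, the object names
(array_data, gene2ll, ll2go) are hypothetical, and the GO labels are
placeholders rather than real GO identifiers.  In practice the
LocusLink-to-GO table would come from an annotation source such as the
getGO() call shown earlier in the thread.

```r
# Toy data: every value below is made up for illustration.
array_data <- data.frame(
  gene  = c("geneA", "geneB", "geneC", "geneD"),
  logFC = c(2.1, -1.7, 0.2, 1.4),
  stringsAsFactors = FALSE
)
gene2ll <- data.frame(
  gene      = c("geneA", "geneB", "geneC", "geneD"),
  locuslink = c(101, 102, 103, 104),
  stringsAsFactors = FALSE
)
# One row per (LocusLink ID, GO term) pair; a gene can have several terms.
ll2go <- data.frame(
  locuslink = c(101, 101, 102, 104),
  go_id     = c("GO:A", "GO:B", "GO:A", "GO:C"),
  stringsAsFactors = FALSE
)

# Step 1: attach LocusLink IDs to the array data.
annotated <- merge(array_data, gene2ll, by = "gene")
# Step 2: attach GO terms; all.x = TRUE keeps genes without any GO term.
annotated <- merge(annotated, ll2go, by = "locuslink", all.x = TRUE)

# GO terms for a subset of genes, here those with |logFC| > 1.
selected    <- subset(annotated, abs(logFC) > 1)
go_per_gene <- split(selected$go_id, selected$gene)
```

The second merge() produces one row per gene and GO term, so a gene with
several terms appears several times; split() then collects the terms
back into one vector per gene.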
>>>Hi, Mick.
>>>
>>>I would take your locuslink IDs for your genes and dump out two lists to a
>>>text file:
>>>
>>>1)  All LocusIDs on your array.
>>>2)  All LocusIDs in your genelist.
>>>
>>>Then use an external program or web tool such as DAVID/EASE to do the
>>>analysis.
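
The two lists, plus a direct check of whether one GO term is
over-represented, can be sketched in base R as below.  The IDs are made
up and the file names are hypothetical.  The test is a plain
hypergeometric test, which is not necessarily what DAVID/EASE or
GOHyperG compute.

```r
# Toy inputs: made-up IDs, not real data.
array_ids    <- 1:100            # all LocusLink IDs on the array
genelist_ids <- 1:10             # IDs of the differentially expressed genes
term_ids     <- c(1:6, 50:57)    # IDs annotated with the GO term of interest

# The two lists an external tool would take as input, one ID per line.
out_dir <- tempdir()
writeLines(as.character(array_ids),    file.path(out_dir, "array_ids.txt"))
writeLines(as.character(genelist_ids), file.path(out_dir, "genelist_ids.txt"))

# Counts for the hypergeometric test.
N <- length(array_ids)                          # genes on the array
K <- length(intersect(term_ids, array_ids))     # array genes with the term
n <- length(genelist_ids)                       # genes in the list
k <- length(intersect(term_ids, genelist_ids))  # listed genes with the term

# P(X >= k) when drawing n genes without replacement from N, K of them
# carrying the term.
p_value <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)
```

Using the whole array as the background, rather than the whole genome,
matches the first list above.  With many GO terms tested, the p-values
would also need a multiple-testing correction, for example p.adjust().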
>>>
>>>That said, there was some discussion on using straight locusIDs (rather
>>>than requiring a metadata package) in GOHyperG.  I don't know where that
>>>conversion stands.
>>>
>>>As to your question about linking genes to GO, that is actually done at
>>>the transcript/protein level.  Merging to entrez gene (locuslink) happens
>>>after the fact.  Using various data sources, you can link by refseq,
>>>locuslink, ensembl ids, ucsc knowngenes, human invitational ids (human),
>>>and probably several others in species other than human.
>>>
>>>Sean


