[BioC] Quick start to linking GO terms and microarray data

Giovanni Coppola gcoppola at ucla.edu
Wed Mar 1 18:11:26 CET 2006


Hello Steffen,
How do I connect to Wormbase? It does not appear in the listMarts() output below.
Thanks,
Giovanni


 > listMarts()
[1] "ensembl_mart_37" "vega_mart_37"    "snp_mart_37"     "msd_mart_4"
[5] "uniprot_mart_17"

 > sessionInfo()
R version 2.2.1, 2005-12-20, powerpc-apple-darwin7.9.0
other attached packages:
biomaRt      XML   RMySQL      DBI
"1.4.0" "0.99-6"  "0.5-7" "0.1-10"




On Mar 1, 2006, at 5:42 AM, Steffen Durinck wrote:

> Hi,
>
> Next to Ensembl, biomaRt currently includes Wormbase, VEGA, Uniprot and msd.
> Soon I expect plants to be represented as well via the Gramene database
> (http://www.gramene.org).
>
> Best,
> Steffen
>
>
> michael watson (IAH-C) wrote:
>
>> Hi Steffen, Wolfgang
>>
>> Thanks a lot, the biomaRt package looks wonderful for the species that
>> are in ensembl... Are there any functions within it to annotate other
>> species? (Eg bacteria, plants etc)
>>
>> Many thanks
>> Mick
>>
>> -----Original Message-----
>> From: Steffen Durinck [mailto:sdurinck at ebi.ac.uk]
>> Sent: 01 March 2006 13:24
>> To: michael watson (IAH-C)
>> Cc: Sean Davis; Bioconductor
>> Subject: Re: [BioC] Quick start to linking GO terms and microarray data
>>
>> Hi Mike,
>>
>> As Wolfgang already suggested, you can do this with the biomaRt package.
>> Here is how you should do this:
>>
>> > library(biomaRt)
>> Loading required package: XML
>> Loading required package: RCurl
>> > mart = useMart("ensembl",dataset="hsapiens_gene_ensembl")
>> Checking attributes and filters ... ok
>> > getGO(id=c(100,620),type="entrezgene",mart=mart)
>>
>>         go_id                                    go_description evidence_code
>> 1  GO:0004000                      adenosine deaminase activity           TAS
>> 2  GO:0016787                                hydrolase activity           IEA
>> 3  GO:0009117                             nucleotide metabolism           IEA
>> 4  GO:0009168  purine ribonucleoside monophosphate biosynthesis           IEA
>> 5  GO:0019735 antimicrobial humoral response (sensu Vertebrata)           TAS
>> 6  GO:0006955                                   immune response           IMP
>> 7  GO:0006955                                   immune response           IEA
>> 8  GO:0006163                      purine nucleotide metabolism           IMP
>> 9  GO:0006163                      purine nucleotide metabolism           IEA
>> 10 GO:0005737                                         cytoplasm           IDA
>> 11 GO:0005737                                         cytoplasm           IEA
>>   ensembl_gene_id ensembl_transcript_id
>> 1  ENSG00000196839       ENST00000359372
>> 2  ENSG00000196839       ENST00000359372
>> 3  ENSG00000196839       ENST00000359372
>> 4  ENSG00000196839       ENST00000359372
>> 5  ENSG00000196839       ENST00000359372
>> 6  ENSG00000196839       ENST00000359372
>> 7  ENSG00000196839       ENST00000359372
>> 8  ENSG00000196839       ENST00000359372
>> 9  ENSG00000196839       ENST00000359372
>> 10 ENSG00000196839       ENST00000359372
>> 11 ENSG00000196839       ENST00000359372
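
The output above repeats a GO term once per evidence code (for example
GO:0006955 with both IMP and IEA). As a rough sketch, assuming the result is
an ordinary data frame with the column names shown, base R can collapse it to
one row per term. The data frame below is a hand-typed toy copy of a few of
the rows above, not a live query, and the name go_unique is only illustrative:

```r
# Toy copy of a few rows from the output above (values taken from the thread).
go <- data.frame(
  go_id = c("GO:0004000", "GO:0006955", "GO:0006955", "GO:0005737", "GO:0005737"),
  go_description = c("adenosine deaminase activity", "immune response",
                     "immune response", "cytoplasm", "cytoplasm"),
  evidence_code = c("TAS", "IMP", "IEA", "IDA", "IEA"),
  stringsAsFactors = FALSE
)

# One row per GO term, with its evidence codes sorted and pasted together.
go_unique <- aggregate(
  evidence_code ~ go_id + go_description,
  data = go,
  FUN = function(x) paste(sort(unique(x)), collapse = ",")
)
print(go_unique)
```
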
>>
>>
>> best,
>> Steffen
>>
>> michael watson (IAH-C) wrote:
>>
>>> Thanks Sean, but I really wanted to demonstrate this in Bioconductor :-S
>>>
>>> I tried running the vignettes in goTools. The first time it froze up my
>>> PC for about 30 minutes and then gave out a cryptic message about
>>> coercing x to a list; the second time it froze up my PC and then R
>>> crashed with no warning :-S
>>>
>>> As far as I can tell, GOstats doesn't have any clear examples of simple
>>> mapping of microarray data to GO terms.
>>>
>>> Given that one of the major, fundamental tasks biologists want to do is
>>> find out functional information for significantly differentially
>>> expressed genes, shouldn't this be a little easier, and a little more
>>> transparent, in Bioconductor?
>>>
>>> Again, I ask, does anyone have any simple examples of going from a list
>>> of LocusLink IDs to a list of GO Terms?  (i.e. GO identifiers and the
>>> biological function/term associated with those identifiers)
>>>
>>> Many thanks
>>> Mick
>>>
>>> -----Original Message-----
>>> From: Sean Davis [mailto:sdavis2 at mail.nih.gov]
>>> Sent: 01 March 2006 11:44
>>> To: michael watson (IAH-C); Bioconductor
>>> Subject: Re: [BioC] Quick start to linking GO terms and microarray data
>>>
>>> On 3/1/06 6:20 AM, "michael watson (IAH-C)" <michael.watson at bbsrc.ac.uk> wrote:
>>>
>>>> Hi
>>>>
>>>> I want to investigate the GO terms associated with my microarray data
>>>> (normally, a list of genes from topTable() in limma).
>>>>
>>>> I have read the vignettes for goTools and GOstats, and to be honest, I
>>>> am still a little unclear what the overall process is, particularly if I
>>>> am working with a custom array and not with affy or operon.
>>>>
>>>> Let's say, for example, I have my array data in a data.frame containing
>>>> gene names.  In a separate data frame I have a link between my gene
>>>> names and LocusLink IDs.  How do I:
>>>>
>>>> 1) Find the GO terms associated with subsets of my genes? (I realise I
>>>> can use merge() to link my array data to the LocusLink ids, but what do
>>>> I do then?)
>>>>
>>>> 2) Find out if a particular GO term is statistically over-represented in
>>>> a particular group?
>>>>
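
For question 1, the merge() route can be sketched in base R as two joins:
gene names to LocusLink IDs, then LocusLink IDs to GO terms. This is only a
minimal sketch under stated assumptions. All data frames, column names, IDs
and GO labels below are made-up placeholders, and the LocusLink-to-GO table
is assumed to come from some annotation source (for instance a biomaRt query
like the getGO() call earlier in this thread):

```r
# Hypothetical array data keyed by gene name (placeholder values).
array_data <- data.frame(
  gene  = c("geneA", "geneB", "geneC"),
  logFC = c(2.1, -1.4, 0.3),
  stringsAsFactors = FALSE
)

# Hypothetical link between gene names and LocusLink IDs.
gene2ll <- data.frame(
  gene      = c("geneA", "geneB", "geneC"),
  locuslink = c(11, 22, 33),
  stringsAsFactors = FALSE
)

# Hypothetical LocusLink-to-GO table; "GO_A" etc. are placeholders,
# not real GO identifiers.
ll2go <- data.frame(
  locuslink = c(11, 11, 22),
  go_id     = c("GO_A", "GO_B", "GO_C"),
  stringsAsFactors = FALSE
)

# Step 1: attach LocusLink IDs to the array data.
annotated <- merge(array_data, gene2ll, by = "gene")

# Step 2: attach GO terms. A gene with several terms yields several rows;
# all.x = TRUE keeps genes that have no GO annotation (go_id is NA).
annotated <- merge(annotated, ll2go, by = "locuslink", all.x = TRUE)
print(annotated)
```

Subsetting annotated by gene (or by a logFC cutoff) then gives the GO terms
for any subset of genes.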
>>> Hi, Mick.
>>>
>>> I would take your locuslink IDs for your genes and dump out two lists to a
>>> text file:
>>>
>>> 1)  All LocusIDs on your array.
>>> 2)  All LocusIDs in your genelist.
>>>
>>> Then use an external program or web tool such as DAVID/EASE to do the
>>> analysis.
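
The same two lists are also enough for a quick check inside R. The sketch
below uses a hypergeometric test, which is one common choice for this kind of
over-representation question; it is not necessarily what any particular
external tool computes. All IDs and the GO membership vector are made-up
placeholders, and the variable names are only illustrative:

```r
# Hypothetical toy input (placeholder IDs, not real LocusLink IDs).
universe  <- paste0("LL", 1:20)            # list 1: all LocusIDs on the array
gene_list <- paste0("LL", 1:5)             # list 2: LocusIDs in the gene list
in_term   <- paste0("LL", c(1, 2, 3, 10))  # IDs annotated to one GO term

N <- length(universe)                       # genes on the array
K <- length(intersect(in_term, universe))   # array genes carrying the term
n <- length(gene_list)                      # genes in the gene list
k <- length(intersect(in_term, gene_list))  # gene-list genes carrying the term

# P(X >= k): chance of seeing at least k annotated genes when drawing
# n genes at random from N, of which K carry the term.
p_value <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)
print(p_value)
```

When many GO terms are tested this way, the resulting p-values would need a
multiple-testing correction, for example with p.adjust().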
>>>
>>> That said, there was some discussion on using straight locusIDs (rather
>>> than requiring a metadata package) in GOHyperG.  I don't know where that
>>> conversion stands.
>>>
>>> As to your question about linking genes to GO, that is actually done at
>>> the transcript/protein level.  Merging to entrez gene (locuslink) happens
>>> after the fact.  Using various data sources, you can link by refseq,
>>> locuslink, ensembl ids, ucsc knowngenes, human invitational ids (human), and
>>> probably several others in species other than human.
>>>
>>> Sean
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
