[BioC] Using GOstats for a non-model organism

Maureen J. Donlin donlinmj at slu.edu
Tue Feb 15 17:48:41 CET 2011


James,

Thanks ever so much. Following your advice, I was able to get this to
work quite nicely.

Regards,
Maureen

On 2/15/11 9:51 AM, James W. MacDonald wrote:
> Hi Maureen,
>
> On 2/14/2011 5:50 PM, Maureen J. Donlin wrote:
>> James,
>>
>> Thanks for the reply. I figured out how to get the data into a data frame.
>> I was doing 2 things wrong, but here is the code that worked.
>>
>> > CneoGO <- read.table("Cneo_GOannot.txt", header=TRUE)
>> > head(CneoGO)
>> Goterm Evidence GeneID
>> 1 GO:0015893 IEA CNAG_00003
>> 2 GO:0043231 IEA CNAG_00003
>> 3 GO:0015203 IEA CNAG_00003
>> 4 GO:0044425 IEA CNAG_00003
>> 5 GO:0044444 IEA CNAG_00003
>> 6 GO:0015846 IEA CNAG_00003
>>
>> > goframeData = data.frame(CneoGO$Goterm, CneoGO$Evidence, CneoGO$GeneID)
>> > head(goframeData)
>> CneoGO.Goterm CneoGO.Evidence CneoGO.GeneID
>> 1 GO:0015893 IEA CNAG_00003
>> 2 GO:0043231 IEA CNAG_00003
>> 3 GO:0015203 IEA CNAG_00003
>> 4 GO:0044425 IEA CNAG_00003
>> 5 GO:0044444 IEA CNAG_00003
>> 6 GO:0015846 IEA CNAG_00003
>
> This step is unnecessary. The result of read.table() *is* a 
> data.frame, so you are just creating another data.frame here.
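A minimal sketch of this point. It assumes an inline copy of the first rows shown above (passed through the `text` argument so that no file is needed) and checks that read.table() already returns a data.frame; the rebuilt copy differs only in its column names.

```r
# Inline stand-in for Cneo_GOannot.txt (first rows shown above).
txt <- "Goterm Evidence GeneID
GO:0015893 IEA CNAG_00003
GO:0043231 IEA CNAG_00003
GO:0015203 IEA CNAG_00003"

CneoGO <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)

# read.table() already returns a data.frame.
is_df <- is.data.frame(CneoGO)

# Rebuilding it column by column keeps the values and only renames columns.
goframeData <- data.frame(CneoGO$Goterm, CneoGO$Evidence, CneoGO$GeneID,
                          stringsAsFactors = FALSE)
same_values <- identical(unname(as.list(CneoGO)),
                         unname(as.list(goframeData)))

print(is_df)
print(same_values)
print(names(goframeData))
```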
>
>>
>> So continuing with the tutorial guide, I executed the following:
>>
>> > library("GSEABase")
>> Loading required package: annotate
>>
>> > goFrame = GOFrame(goframeData, organism = "Cryptococcus neoformans")
>> Loading required package: GO.db
>>
>> > goFrame
>> An object of class "GOFrame"
>> Slot "data":
>> CneoGO.Goterm CneoGO.Evidence CneoGO.GeneID
>> 1 GO:0015893 IEA CNAG_00003
>> 2 GO:0043231 IEA CNAG_00003
>> ...
>> Slot "organism":
>> [1] "Cryptococcus neoformans"
>>
>> > goAllFrame = GOAllFrame(goFrame)
>>
>> > goAllFrame
>> An object of class "GOAllFrame"
>> Slot "data":
>> go_id evidence gene_id
>> 1 GO:0000001 IEA CNAG_00006
>> 2 GO:0000001 IEA CNAG_00088
>> ...
>> Slot "organism":
>> [1] "Cryptococcus neoformans"
>>
>>
>> > gsc <- GeneSetCollection(goAllFrame, setType = GOCollection())
>> > gsc
>> GeneSetCollection
>> names: GO:0000001, GO:0000002, ..., GO:2000045 (6658 total)
>> unique identifiers: CNAG_00006, CNAG_00088, ..., CNAG_06995 (4822 total)
>> types in collection:
>> geneIdType: GOAllFrameIdentifier (1 total)
>> collectionType: GOCollection (1 total)
>>
>> > universe = Lkeys(CneoGO)
>> Error in function (classes, fdef, mtable) :
>> unable to find an inherited method for function "Lkeys", for signature
>> "data.frame"
>
> So here you are getting mixed up with what Marc had to do to get his 
> example to run, and what you need to do. The 'universe' is just the 
> complete set of gene IDs from which your significant set was chosen.
>
> If you had an org.Cn.eg.db package, then you would do something 
> similar. However, you don't, which is the point of this exercise. The 
> corresponding set of gene IDs that you do have is the third column of 
> the data.frame you created above (goframeData or CneoGO).
>
> Note here that you want to make sure that the gene IDs you use are 
> character values, not factors. The default for R when reading in a 
> data.frame is to convert a vector of strings to factor, so you either 
> want to use
>
> CneoGO <- read.table("Cneo_GOannot.txt", header=TRUE, stringsAsFactors = FALSE)
>
> and then
>
> universe <- CneoGO[,3]
>
> or proceed as you already have, but then
>
> universe <- as.character(CneoGO[,3])
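The two routes to a character `universe` can be sketched as follows. The inline rows are toy data for illustration, not the real file, and the final unique() call is an assumption added here because each gene appears once per GO term. Note that in R 4.0.0 and later the read.table() default is already stringsAsFactors = FALSE, so passing it explicitly keeps the behaviour the same across versions.

```r
# Toy stand-in for Cneo_GOannot.txt (illustrative rows only).
txt <- "Goterm Evidence GeneID
GO:0015893 IEA CNAG_00003
GO:0043231 IEA CNAG_00003
GO:0000001 IEA CNAG_00006
GO:0000001 IEA CNAG_00088"

# Route 1: keep strings as character when reading.
CneoGO <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)
universe <- CneoGO[, 3]

# Route 2: the column was read as a factor, so convert it afterwards.
CneoGO_f <- read.table(text = txt, header = TRUE, stringsAsFactors = TRUE)
universe_f <- as.character(CneoGO_f[, 3])

# Assumption (not part of the advice above): drop repeated gene IDs.
universe <- unique(universe)
universe_f <- unique(universe_f)

print(class(universe))
print(universe)
```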
>
> In addition, note that you will need to construct your 'genes' vector 
> differently from what is shown on p.3 of the vignette, instead 
> selecting the set of significant genes from the results of your 
> analysis (again, using the CNAG gene IDs).
>
> From that point on, you continue as Marc shows in the vignette.
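One way the `genes` vector might be built from a cluster's gene list is sketched below. The cluster membership and the ID `CNAG_99999` are hypothetical, and restricting `genes` to the universe is an assumption made here for tidiness rather than a step from the vignette.

```r
# Toy universe standing in for as.character(CneoGO[, 3]).
universe <- c("CNAG_00003", "CNAG_00006", "CNAG_00088", "CNAG_06995")

# Hypothetical cluster membership from the time-course analysis;
# CNAG_99999 is a made-up ID with no GO annotation.
cluster1 <- c("CNAG_00006", "CNAG_06995", "CNAG_99999")

# Keep IDs as character and note any that are missing from the universe.
genes <- unique(as.character(cluster1))
not_annotated <- setdiff(genes, universe)
genes <- intersect(genes, universe)

print(genes)
print(not_annotated)
```

`genes` and `universe` would then be supplied as the `geneIds` and `universeGeneIds` arguments of GSEAGOHyperGParams(), together with `geneSetCollection = gsc`, before calling hyperGTest() as in the vignette.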
>
> Best,
>
> Jim
>
>
>
>>
>> Am I missing some data that is found in library("org.Hs.eg.db")? I
>> can do the same commands with it and the structure of the goFrame,
>> goAllFrame and gsc seem to be the same.
>>
>> Here's what I am trying to do. I have a microarray data set from a time
>> course experiment done with a fungal genome, C. neoformans. I have
>> clusters of genes which are associated based on how their expression
>> changed in relation to the other genes on the array. So what I have are
>> gene lists, with no expression data or fold changes. For each list of
>> genes, I want to know what GO terms are over-represented.
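For a single GO term and a single gene list, over-representation of this kind is commonly assessed with a hypergeometric test. The base-R sketch below illustrates the idea only and does not reproduce the GOstats internals. The universe size is the 4822 identifiers reported for `gsc` above; the other counts are hypothetical.

```r
# Counts for one GO term and one gene cluster.
universe_size <- 4822  # unique identifiers reported for gsc above
term_genes    <- 40    # hypothetical: universe genes annotated to the term
cluster_size  <- 100   # hypothetical: genes in the cluster
overlap       <- 5     # hypothetical: cluster genes annotated to the term

# P(X >= overlap) when drawing cluster_size genes without replacement.
p_over <- phyper(overlap - 1,
                 term_genes,
                 universe_size - term_genes,
                 cluster_size,
                 lower.tail = FALSE)

print(p_over)
```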
>>
>> I apologize if these questions are too basic. It's just that most of the
>> software out there for gene enrichment analysis is designed for model
>> organisms.
>>
>> Again, any help is greatly appreciated.
>>
>> Regards,
>> Maureen
>>
>>
>>
>>
>>
>> On 2/14/11 3:23 PM, James W. MacDonald wrote:
>>> Hi Maureen,
>>>
>>> On 2/14/2011 3:27 PM, Maureen J. Donlin wrote:
>>>> Hi all,
>>>>
>>>> I'm new to R and have some very basic questions about using GOstats
>>>> with a non-model organism.
>>>> I'm trying to use the tutorial by Marc Carlson "How to Use GOstats
>>>> and...with unsupported model organisms."
>>>>
>>>> I've created a GO to gene mapping file with the following 3 columns of
>>>> data:
>>>> Goterm Evidence GeneID
>>>> GO:0015893 IEA CNAG_00003
>>>> GO:0043231 IEA CNAG_00003
>>>> GO:0015203 IEA CNAG_00003
>>>> GO:0044425 IEA CNAG_00003
>>>> ...
>>>>
>>>> I can import it using read.table, but I don't seem to be able to
>>>> invoke the data frame correctly.
>>>
>>> When you read it in using read.table(), you automatically have a
>>> data.frame.
>>>
>>>>
>>>> The tutorial reads:
>>>> library("org.Hs.eg.db")
>>>> frame = toTable(org.Hs.egGO)
>>>> goFrameData = data.frame(frame$go_id, frame$Evidence, frame$gene_id)
>>>
>>> Yep, this is just some code that Marc uses to create a data.frame so
>>> he can give an example.
>>>
>>>>
>>>> I imported the data into an object using read.table
>>>> >CneoGOanno <- read.table("Cneo_GOannot.txt")
>>>>
>>>> I tried to create a frame using:
>>>> > frame = toTable(CneoGOanno)
>>>> Error in function (classes, fdef, mtable) :
>>>> unable to find an inherited method for function "toTable", for
>>>> signature "data.frame"
>>>>
>>>> Do I have to create some sort of database for this organism first? If
>>>> so, what is its format?
>>>>
>>>> Any suggestions would be most appreciated.
>>>
>>> Just go to the next step, which will be something like
>>>
>>> goFrame <- GOFrame(CneoGOanno, organism = "Cryptococcus neoformans")
>>> goAllFrame <- GOAllFrame(goFrame)
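Before handing the data.frame to GOFrame(), a quick sanity check can catch a header line that was read as data, which is what happens when read.table() is called without header = TRUE on this file. The helper `check_go_annotation()` below is hypothetical (not part of any package) and assumes the column order used in the tutorial: GO ID, evidence code, gene ID.

```r
# Hypothetical helper: basic checks on a GO annotation data.frame.
check_go_annotation <- function(df) {
  stopifnot(is.data.frame(df), ncol(df) == 3)
  go_ids <- as.character(df[[1]])
  go_ok <- grepl("^GO:[0-9]{7}$", go_ids)
  if (!all(go_ok)) {
    stop("column 1 has non-GO values, e.g. '", go_ids[!go_ok][1],
         "'; was the header row read as data?")
  }
  invisible(TRUE)
}

# Inline stand-in for Cneo_GOannot.txt (rows shown above).
txt <- "Goterm Evidence GeneID
GO:0015893 IEA CNAG_00003
GO:0043231 IEA CNAG_00003"

no_header   <- read.table(text = txt)                 # header line becomes row 1
with_header <- read.table(text = txt, header = TRUE)  # header line becomes names

ok  <- check_go_annotation(with_header)
bad <- tryCatch(check_go_annotation(no_header),
                error = function(e) conditionMessage(e))

print(ok)
print(bad)
```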
>>>
>>>
>>> Best,
>>>
>>> Jim
>>>
>>>
>>>
>>>>
>>>> Regards,
>>>> Maureen Donlin
>>>>
>>>> At the risk of too long of an email, here's the session info:
>>>> > sessionInfo()
>>>> R version 2.12.1 (2010-12-16)
>>>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>>>>
>>>> locale:
>>>> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>>>>
>>>> attached base packages:
>>>> [1] stats graphics grDevices utils datasets methods base
>>>>
>>>> other attached packages:
>>>> [1] org.Hs.eg.db_2.4.6 GOstats_2.16.0 RSQLite_0.9-4 DBI_0.2-5
>>>> graph_1.28.0 Category_2.16.0 AnnotationDbi_1.12.0
>>>> [8] Biobase_2.10.0
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] annotate_1.28.0 genefilter_1.32.0 GO.db_2.4.5 GSEABase_1.12.2
>>>> RBGL_1.26.0 splines_2.12.1 survival_2.36-2 tools_2.12.1
>>>> [9] XML_3.2-0 xtable_1.5-6
>>>>
>>>>
>>>
>>
>

-- 
Maureen J. Donlin, Ph.D.
Research Associate Professor

Dept. of Molecular Microbiology & Immunology
Dept. of Biochemistry & Molecular Biology
Saint Louis University School of Medicine
507 Doisy Research Center
1100 S. Grand
St. Louis, MO  63104
Phone: 314-977-8858


