[BioC] GOSeq with unsupported organism (Arabidopsis) and retrieving gene IDs from enriched GO categories

Dale Richardson drichardson at igc.gulbenkian.pt
Wed Mar 12 11:46:59 CET 2014

Hi All,

I'm currently working on a differential gene expression analysis and 
I've used GOSeq to find enriched GO categories, just like what is 
mentioned here 

), except I am using a non-supported organism (Arabidopsis). I've come 
to the exact point in the analysis as Fernando has in the above link, 
where I would like to extract all gene IDs associated with the enriched 
GO terms in my DE analysis.

My question is, how can I do this with a non-supported organism?

For a supported organism, the process looks to be straightforward, but 
for an unsupported genome and for a newbie in R, the process isn't so 
clear.

This is some of the code that got me to where I am now.

# calculate the probability weighting function (pwf)
pwf <- nullp(genes, bias.data = overlapLengths)

# read in the GO categories file
tairgo <- read.table("ATH_GO_GOSLIM.txt", header = FALSE, sep = "\t", fill = TRUE)

# get the ID and GO columns only from tairgo
GO.wall <- goseq(pwf, gene2cat = tairgo[, c(1, 6)])
GO.samp <- goseq(pwf, gene2cat=tairgo[,c(1,6)], 

# GO terms significant at 5% after Benjamini-Hochberg correction
enriched.GO <- GO.wall$category[p.adjust(GO.wall$over_represented_pvalue,
                                         method = "BH") < 0.05]
enriched.sampgo <- GO.samp$category[p.adjust(GO.samp$over_represented_pvalue,
                                             method = "BH") < 0.05]

What I've been thinking of doing is looping through my enriched GO 
terms vector and finding all gene IDs that have matching GO terms in 
"tairgo". However, is there a better way to do this using one of the 
functions built into GOSeq?
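
For reference, the lookup described above does not need an explicit loop. 
The following is only a minimal base-R sketch, not a goseq function. The 
toy "gene2cat" data frame stands in for tairgo[,c(1,6)], and its column 
names, the gene IDs, the GO terms and the "de.genes" vector are all made 
up for illustration. In the real analysis, de.genes could presumably be 
taken as names(genes)[genes == 1], assuming "genes" is the named 0/1 
vector that was passed to nullp.

```r
# Toy stand-in for tairgo[, c(1, 6)]: one row per gene/GO annotation.
# All IDs and terms below are hypothetical example values.
gene2cat <- data.frame(
  gene     = c("AT1G01010", "AT1G01020", "AT1G01030",
               "AT1G01010", "AT1G01040", "AT1G01010"),
  category = c("GO:0003700", "GO:0016020", "GO:0003700",
               "GO:0005634", "GO:0005634", "GO:0003700"),
  stringsAsFactors = FALSE
)

# Stand-ins for the enriched terms and the differentially expressed genes.
enriched.GO <- c("GO:0003700", "GO:0005634")
de.genes    <- c("AT1G01010", "AT1G01030")

# Keep only annotation rows whose GO term is enriched, dropping repeated
# gene/term pairs (the same pair can appear on several rows).
hits <- unique(gene2cat[gene2cat$category %in% enriched.GO, ])

# Named list: one character vector of gene IDs per enriched GO term.
genes.by.go <- split(hits$gene, hits$category)

# Optionally restrict each term to the genes that are actually DE.
de.by.go <- lapply(genes.by.go, intersect, de.genes)

print(genes.by.go)
print(de.by.go)
```

split() returns a list named by GO term, so genes.by.go[["GO:0003700"]] 
gives the genes annotated to that term, and de.by.go holds the subset of 
those that are also in the DE list.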

Thanks so much for your valuable input!!

Dale Richardson, Ph.D.
Laboratory of Plant Molecular Biology
Instituto Gulbenkian de Ciência
Rua da Quinta Grande, 6
2780-156 Oeiras
Tel: +351 214 464 647
