[BioC] "topGOdata" object: How to supply gene scores with a predefined list of genes

Thu Aug 27 22:01:43 CEST 2009

Hi Adrian,

Thanks for the response. You have confirmed for me that, at the moment, it is not possible to create a geneSel function that uses more than one argument. Unfortunately, I want to apply a fold-change cutoff in addition to a p-value cutoff. The only way I can do this is to provide a predefined list with the structure below, where the factor levels determine the genes of interest and the universe. Unfortunately, it does not appear possible to also attach a gene score (p-value) to that structure. I guess that in situations where one wishes to use more than one selection criterion, it will not be possible to use the KS test. Don't get me wrong: I still like the added value of being able to compare the classic, elim, and weight algorithms.

> str(geneList)
  Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
  - attr(*, "names")= chr [1:34760] "10338001" "10338003" "10338004" "10338017" ... 
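
One possible workaround, sketched here under assumptions and not confirmed anywhere in this thread, is to fold the fold-change cutoff into the score itself: genes that fail the cutoff get a score of 1, so a one-argument selection function with a single threshold reproduces the two-criteria selection. The values below are invented, and `combinedScore` and `selFUN` are hypothetical names. Note that this alters the score distribution, so any score-based test run on it would need careful interpretation.

```r
# Toy stand-in for cbind(FC = fit$coefficients[, 1], pval = adjusted p-values);
# all numbers are invented for illustration.
input <- cbind(FC   = c(2.1, 0.3, -1.5, 1.2),
               pval = c(0.001, 0.002, 0.03, 0.2))
rownames(input) <- c("10338001", "10338003", "10338004", "10338017")

# Genes failing the |FC| >= 1 cutoff receive the worst possible score (1).
combinedScore <- ifelse(abs(input[, "FC"]) >= 1, input[, "pval"], 1)
names(combinedScore) <- rownames(input)

# One-argument selection function using a single threshold on the score.
selFUN <- function(allScore) {
  allScore < 0.05
}

# Logical vector marking the genes that pass both criteria.
selFUN(combinedScore)
```

On this toy input, the selection matches `abs(FC) >= 1 & pval < 0.05` applied to the original two columns.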

Thanks,

Scott

Scott A. Ochsner, PhD
One Baylor Plaza BCM130, Houston, TX 77030
Voice: (713) 798-6227  Fax: (713) 790-1275 
-----Original Message-----
From: Adrian Alexa [mailto:adrian.alexa at gmail.com] 
Sent: Thursday, August 27, 2009 11:56 AM
To: Ochsner, Scott A
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] "topGOdata" object: How to supply gene scores with a predefined list of genes

Hi Scott,

I'm not sure I totally understand your question, but if you want to build a "topGOdata" object from a list of genes for which you have scores (quantifying differential expression), there is a simple way to do it.

The first thing you need is a named numeric vector, where the gene identifiers are stored in the names attribute of the vector and the numeric values are the respective gene scores. The set of genes found in the names attribute defines the gene universe. For example, the following should work for you:

geneList <- p.adjust(fit$p.value[,1], method="BH")
names(geneList) <- geneNames
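
As a minimal sketch of what such a vector looks like, here is the same construction on made-up data. The raw p-values and probe identifiers are invented and merely stand in for fit$p.value[,1] and geneNames; `rawP` is a hypothetical name.

```r
# Toy raw p-values standing in for fit$p.value[, 1] (values are invented).
rawP <- c(0.0001, 0.02, 0.3, 0.8)

# Hypothetical probe identifiers standing in for geneNames.
geneNames <- c("10338001", "10338003", "10338004", "10338017")

# Benjamini-Hochberg adjustment; the names attribute carries the gene universe.
geneList <- p.adjust(rawP, method = "BH")
names(geneList) <- geneNames

# Inspect the structure: a named numeric vector.
str(geneList)
```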

Then you will need to define a function for specifying the list of interesting genes based on the scores (in your case the adjusted p-values). The function must return a logical vector specifying which gene is selected and which not. The function must have one argument, named allScore and must not depend on any attributes of this object.
If for example you want to select all genes with an adjusted p-value lower than 0.01, then the function should look like:

topDiffGenes <- function(allScore) {
       return(allScore < 0.01)
}
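
To check that a selection function behaves as expected before handing it to topGO, it can be applied to a small score vector by hand. This is only a sketch: the gene names and scores are invented, and `scores` and `selected` are hypothetical names.

```r
# Selection function taking a single argument, allScore.
topDiffGenes <- function(allScore) {
  return(allScore < 0.01)
}

# Invented adjusted p-values for three hypothetical genes.
scores <- c(geneA = 0.0004, geneB = 0.04, geneC = 0.4)

# The result is a logical vector with one entry per gene.
selected <- topDiffGenes(scores)
selected

# Number of genes that would be treated as interesting.
sum(selected)
```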

Now you can build a "topGOdata" object as follows (in the code below I assume you are using a Bioconductor annotation package, for example "hgu133a"):

## build the topGOdata class
GOdata <- new("topGOdata",
                   ontology = "BP",
                   allGenes = geneList,
                   geneSel = topDiffGenes,
                   annot = annFUN.db,
                   affyLib = "hgu133a")

## display the GOdata object
GOdata

I hope this answers your question. Please let me know if you have further problems.

Regards,
Adrian

On Tue, Aug 25, 2009 at 10:04 PM, Ochsner, Scott A<sochsner at bcm.tmc.edu> wrote:
> Hi,
>
> I would like to attach gene "score" info to a predefined list of 
> interesting genes to generate a topGOdata object.  The predefined list 
> of genes was obtained by:
>> library(limma)
>> library(topGO)
>> input<-cbind(FC=fit$coefficients[,1],pval=p.adjust(fit$p.value[,1],method="BH"))
>> selectFUN<-function(x){return(abs(x[,1]) >=1 & x[,2] < 0.05)}
>> diffgenes<-selectFUN(input)
>> myInterestedGenes<-names(which(diffgenes==T))
>> geneNames<-rownames(input)
>> geneList<-factor(as.integer(geneNames %in% myInterestedGenes)) 
>> names(geneList)<-geneNames
>> str(geneList)
>  Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
>  - attr(*, "names")= chr [1:34760] "10338001" "10338003" "10338004" "10338017" ...
>
> Unfortunately, the predefined list does not contain any DE "score"
> information.
> I would greatly appreciate any help in attaching the score information 
> to a predefined list or incorporating p.value as well as fold change 
> cutoffs into a geneSel function when creating a topGOdata object,
>
> Thanks for any help,
>
> Scott
>
> Scott A. Ochsner, PhD
> One Baylor Plaza BCM130, Houston, TX 77030
> Voice: (713) 798-6227  Fax: (713) 790-1275
>
>> sessionInfo()
> R version 2.9.0 (2009-04-17)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
> States.1252;LC_MONETARY=English_United
> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
>
> other attached packages:
> [1] topGO_1.12.0        SparseM_0.80        GO.db_2.2.11
> RSQLite_0.7-1       DBI_0.2-4           AnnotationDbi_1.6.1
> Biobase_2.4.1       graph_1.22.2        limma_2.18.2
>
> loaded via a namespace (and not attached):
> [1] grid_2.9.0      lattice_0.17-25 tools_2.9.0
>