[BioC] "topGOdata" object: How to supply gene scores with a predefined list of genes

Thu Aug 27 18:56:07 CEST 2009

Hi Scott,

I'm not sure I totally understand your question, but if you want to
build a "topGOdata" object from a list of genes for which you have
scores (quantifying differential expression), there is a simple way to
do it.

The first thing you need is a named numeric vector, where the gene
identifiers are stored in the names attribute of the vector and the
numeric values are the respective gene scores. The set of genes found
in the names attribute defines the gene universe. For example, the
following should work for you:

geneList <- p.adjust(fit$p.value[,1], method="BH")
names(geneList) <- geneNames
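
To make the expected structure concrete, here is a minimal sketch using
made-up data. The gene identifiers and p-values below are purely
hypothetical stand-ins for rownames(input) and fit$p.value[,1]:

```r
# Hypothetical raw p-values and gene identifiers (illustration only)
rawP      <- c(0.0001, 0.02, 0.5, 0.003)
geneNames <- c("geneA", "geneB", "geneC", "geneD")

# Adjust for multiple testing, then store the identifiers as names
geneList <- p.adjust(rawP, method = "BH")
names(geneList) <- geneNames

# Should show a named numeric vector, not a factor
str(geneList)
```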

Then you will need to define a function that specifies the list of
interesting genes based on the scores (in your case the adjusted
p-values). The function must return a logical vector indicating which
genes are selected. It must take one argument, named allScore, and
must not depend on any attributes of this object. If, for example,
you want to select all genes with an adjusted p-value lower than 0.01,
the function should look like:

topDiffGenes <- function(allScore) {
       return(allScore < 0.01)
}
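
As a quick sanity check, you can apply the selection function to a small
score vector before building the object. This is only a sketch: the
scores and gene identifiers are invented for illustration.

```r
# Selection function: TRUE for genes with a score below 0.01
topDiffGenes <- function(allScore) {
  allScore < 0.01
}

# Hypothetical adjusted p-values, named by gene identifier
toyScores <- c(geneA = 0.0004, geneB = 0.04, geneC = 0.5, geneD = 0.006)

# Logical vector, one entry per gene in the universe
selected <- topDiffGenes(toyScores)

# Identifiers of the genes flagged as interesting
names(toyScores)[selected]
```

The result has the same length as the score vector, one TRUE or FALSE
per gene, which is the form described above.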

Now you can build a "topGOdata" object as follows (in the code
below I assume you are using a Bioconductor annotation package, for
example "hgu133a"):

## build the topGOdata object
GOdata <- new("topGOdata",
                   ontology = "BP",
                   allGenes = geneList,
                   geneSel = topDiffGenes,
                   annot = annFUN.db,
                   affyLib = "hgu133a")

## display the GOdata object
GOdata

I hope this answers your question. Please let me know if you have
further problems.

Regards,
Adrian

On Tue, Aug 25, 2009 at 10:04 PM, Ochsner, Scott A<sochsner at bcm.tmc.edu> wrote:
> Hi,
>
> I would like to attach gene "score" info to a predefined list of
> interesting genes to generate a topGOdata object.  The predefined list
> of genes was obtained by:
>> library(limma)
>> library(topGO)
>> input<-cbind(FC=fit$coefficients[,1],pval=p.adjust(fit$p.value[,1],method="BH"))
>> selectFUN<-function(x){return(abs(x[,1]) >=1 & x[,2] < 0.05)}
>> diffgenes<-selectFUN(input)
>> myInterestedGenes<-names(which(diffgenes==T))
>> geneNames<-rownames(input)
>> geneList<-factor(as.integer(geneNames %in% myInterestedGenes))
>> names(geneList)<-geneNames
>> str(geneList)
>  Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
>  - attr(*, "names")= chr [1:34760] "10338001" "10338003" "10338004" "10338017" ...
>
> Unfortunately, the predefined list does not contain any DE "score"
> information.
> I would greatly appreciate any help in attaching the score information
> to a predefined list or incorporating p.value as well as fold change
> cutoffs into a geneSel function when creating a topGOdata object,
>
> Thanks for any help,
>
> Scott
>
> Scott A. Ochsner, PhD
> One Baylor Plaza BCM130, Houston, TX 77030
> Voice: (713) 798-6227  Fax: (713) 790-1275
>
>> sessionInfo()
> R version 2.9.0 (2009-04-17)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
> States.1252;LC_MONETARY=English_United
> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
>
> other attached packages:
> [1] topGO_1.12.0        SparseM_0.80        GO.db_2.2.11
> RSQLite_0.7-1       DBI_0.2-4           AnnotationDbi_1.6.1
> Biobase_2.4.1       graph_1.22.2        limma_2.18.2
>
> loaded via a namespace (and not attached):
> [1] grid_2.9.0      lattice_0.17-25 tools_2.9.0