[BioC] topGO using de novo assembled transcriptome

Wed Nov 9 05:33:52 CET 2011

Hi all,

> gene.table <- read.table("/Users/oystercow/Desktop/11:07:2011workfolder/p-value_for_topGO_5d_1d_all", header = TRUE, row.names=1)
> genelist_topGO_5d_1d_all <- as.numeric(gene.table$p.value)
> names(genelist_topGO_5d_1d_all) <- as.character(row.names(gene.table))

#My geneList looks good, just like the example, e.g.:

> head(genelist_topGO_5d_1d_all)
 comp0_c0_seq1 comp0_c0_seq10  comp0_c0_seq2  comp0_c0_seq3  comp0_c0_seq4  comp0_c0_seq5 
  1.742075e-03  3.160000e-159   1.453968e-02   9.230000e-06   3.300000e-14   1.710000e-65 

#Yet when I try to define and use the topDiffGenes function, the results are unexpected

> topDiffGenes <- function(allScore) {
+    return(allScore < 0.01)
+ }

> sum(topDiffGenes(genelist_topGO_5d_1d_all))
[1] NA

#this should be <58819, and certainly not 'NA'

> length(topDiffGenes(genelist_topGO_5d_1d_all))
[1] 58819

#this is the total number of IDs, contigs in my case

> head(topDiffGenes(genelist_topGO_5d_1d_all))
 comp0_c0_seq1 comp0_c0_seq10  comp0_c0_seq2  comp0_c0_seq3  comp0_c0_seq4  comp0_c0_seq5 
          TRUE           TRUE          FALSE           TRUE           TRUE           TRUE 
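
A possible explanation for the NA, offered as a guess rather than a confirmed diagnosis: sum() returns NA as soon as a single element of its input is NA, so one missing or unparseable p-value in the table would be enough. The toy vector below is an assumption standing in for the real data; the same checks could be run on genelist_topGO_5d_1d_all.

```r
# Toy p-values; the NA entry is an assumption about what the imported
# column might contain (e.g. a blank or non-numeric cell).
p <- c(comp0_c0_seq1 = 0.0017, comp0_c0_seq2 = 0.0145, comp0_c0_seq3 = NA)

sel <- p < 0.01
sum(sel)                 # NA, because the comparison with NA is NA
sum(is.na(p))            # how many p-values are missing
sum(sel, na.rm = TRUE)   # count of TRUE values, ignoring the NA
```

If sum(is.na(...)) is non-zero on the real vector, the rows behind those entries are worth inspecting in the input file.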

#In case my error came from:
> genelist_topGO_5d_1d_all <- as.numeric(gene.table$p.value)
#and I should instead import the p-values with as.character (as suggested in a previous posting, https://stat.ethz.ch/pipermail/bioconductor/2007-November/020045.html), I tried that as well:

> genelist_topGO_5d_1d_all_2 <- as.character(gene.table$p.value)
> names(genelist_topGO_5d_1d_all_2) <- as.character(row.names(gene.table))
> head(genelist_topGO_5d_1d_all_2)
 comp0_c0_seq1 comp0_c0_seq10  comp0_c0_seq2  comp0_c0_seq3  comp0_c0_seq4  comp0_c0_seq5 
 "0.001742075"    "3.16e-159"  "0.014539683"     "9.23e-06"      "3.3e-14"     "1.71e-65" 
> sum(topDiffGenes(genelist_topGO_5d_1d_all_2))
[1] NA
> length(topDiffGenes(genelist_topGO_5d_1d_all_2))
[1] 58819

#same results, except even worse: the comparisons are now inaccurate:
> head(topDiffGenes(genelist_topGO_5d_1d_all_2))
 comp0_c0_seq1 comp0_c0_seq10  comp0_c0_seq2  comp0_c0_seq3  comp0_c0_seq4  comp0_c0_seq5 
          TRUE          FALSE          FALSE          FALSE          FALSE          FALSE 
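
The changed comparisons are consistent with how R handles mixed types: when a character vector is compared with a number, the number is converted to a string ("0.01" here) and the two are compared as text. A small sketch using three of the values printed above:

```r
# Three of the p-values from the post, stored as strings.
x <- c("0.001742075", "3.16e-159", "0.014539683")

# 0.01 is coerced to "0.01" and the comparison is done on text,
# so "3.16e-159" sorts after "0.01" because "3" comes after "0".
x < 0.01

# Converting back to numeric restores the numeric comparison.
as.numeric(x) < 0.01
```

This suggests keeping the scores numeric and looking for the source of the NA elsewhere.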

I would like to do this:

> GOdata <- new("topGOdata", ontology = "BP", allGenes = genelist_topGO_5d_1d_all, geneSel = topDiffGenes(genelist_topGO_5d_1d_all), annot = annFUN.GO2genes, GO2genes = as.list(read.table("~/Desktop/annot_readyforR.annot", header = FALSE, sep = "\t")))

#using my own annotations; the file "~/Desktop/annot_readyforR.annot" looks like this:
comp517_c0_seq1	GO:0015850
comp517_c0_seq1	GO:0015665
comp517_c0_seq1	GO:0031224
comp517_c0_seq1	GO:0015291
comp517_c0_seq1	GO:0012501
comp517_c0_seq1	GO:0030001
comp1970_c0_seq1	GO:0004000
comp1970_c0_seq1	GO:0003676
comp1970_c0_seq1	GO:0031981
comp1970_c0_seq1	GO:0016553
comp1970_c0_seq1	GO:0019221
comp1970_c0_seq1	GO:0010467
comp1964_c0_seq1	GO:0005488
comp1964_c0_seq2	GO:0005488
...

My error message for the above is:

Error in checkSlotAssignment(object, name, value) : 
  assignment of an object of class "logical" is not valid for slot "geneSelectionFun" in an object of class "topGOdata"; is(value, "function") is not TRUE

Any suggestions? topGO seems quite streamlined for microarray data, but any hints on using it with "self-annotated" transcriptome data would surely help.
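
Reading the error message literally, the geneSelectionFun slot wants a function, whereas the call above passes the logical vector returned by topDiffGenes(...). Separately, annFUN.GO2genes expects GO2genes to be a list named by GO ID with gene IDs as elements, which as.list() on a two-column data frame does not produce. The following is a minimal sketch under those assumptions, not a tested solution: the column names gene and GO are made up for illustration, the data frame is a toy copy of a few annotation rows, and the topGO call is left commented out because it has not been run against this data.

```r
# Toy copy of a few rows of the annotation file (gene ID, GO ID).
annot <- data.frame(
  gene = c("comp517_c0_seq1", "comp517_c0_seq1",
           "comp1964_c0_seq1", "comp1964_c0_seq2"),
  GO   = c("GO:0015850", "GO:0015665", "GO:0005488", "GO:0005488"),
  stringsAsFactors = FALSE
)
# With the real file, something like this (column names are hypothetical):
# annot <- read.table("~/Desktop/annot_readyforR.annot", header = FALSE,
#                     sep = "\t", col.names = c("gene", "GO"),
#                     stringsAsFactors = FALSE)

# One list element per GO term, holding the gene IDs annotated to it.
GO2genes <- split(annot$gene, annot$GO)

# Hypothetical NA-safe variant of topDiffGenes: NA scores are not selected.
topDiffGenes <- function(allScore) {
  !is.na(allScore) & allScore < 0.01
}

# Untested sketch: geneSel receives the function itself, not its result.
# GOdata <- new("topGOdata", ontology = "BP",
#               allGenes = genelist_topGO_5d_1d_all,
#               geneSel = topDiffGenes,
#               annot = annFUN.GO2genes, GO2genes = GO2genes)
```

It may also be safer to drop entries with NA scores from the gene list before building the object; whether topGO tolerates NA values in allGenes is not something this sketch verifies.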

Thanks,
Ian McDowell
University of Rhode Island