[BioC] topGO and Arabidopsis data

Tue Dec 7 18:16:49 CET 2010

Hi Johannes,

I apologise for the late reply. I had a chance to look into the issue
you reported and apparently the problem is with the
org.At.tair.db_2.4.6. Namely the GO id reported in your error belongs
to the BPontology and to the MF ontology, at least based on
GO.db_2.4.5 and some other only resources. For example, if you run:

library(ath1121501.db)
goID <- "GO:0010241"

.sql <- "SELECT gene_id, go_id FROM genes INNER JOIN go_mf USING('_id')"
retVal <- dbGetQuery(org.At.tair_dbconn(), .sql)
go2prob <- split(retVal[["gene_id"]], retVal[["go_id"]])
go2prob[goID]

you will get:

$`GO:0010241`
[1] "AT5G25900"

But if you search this GO in the GO.db database you will find that it
belongs to the BP ontology.

GOTERM[[goID]

I hope this bug will be solved in the org.At.tair.db_2.4.6 package.
Until then I did a quick fix such that you can further use the topGO
package. The annFUN.db2 will keep only the GO terms available in the
GO.db specific for the chosen ontology:

annFUN.db2 <- function(whichOnto, feasibleGenes = NULL, affyLib) {

  ## we add the .db ending if needed
  affyLib <- paste(sub(".db$", "", affyLib), ".db", sep = "")
  require(affyLib, character.only = TRUE) || stop(paste("package",
affyLib, "is required", sep = " "))
  affyLib <- sub(".db$", "", affyLib)

  orgFile <- get(paste(get(paste(affyLib, "ORGPKG", sep = "")),
"_dbfile", sep = ""))

  try(dbGetQuery(get(paste(affyLib, "dbconn", sep = "_"))(),
                 paste("ATTACH '", orgFile(), "' as org;", sep ="")),
      silent = TRUE)

  .sql <- paste("SELECT DISTINCT probe_id, go_id FROM probes INNER JOIN ",
                "(SELECT * FROM org.genes INNER JOIN org.go_",
                tolower(whichOnto)," USING('_id')) USING('gene_id');", sep = "")

  retVal <- dbGetQuery(get(paste(affyLib, "dbconn", sep = "_"))(), .sql)

  ## restric to the set of feasibleGenes
  if(!is.null(feasibleGenes))
    retVal <- retVal[retVal[["probe_id"]] %in% feasibleGenes, ]

  ## split the table into a named list of GOs
  retVal <- split(retVal[["probe_id"]], retVal[["go_id"]])

  ## return only the GOs mapped in GO.db
  return(retVal[names(retVal) %in% ls(get(paste("GO",
toupper(whichOnto), "Term", sep = "")))])
}

Using this function you can build a topGOdata object as before (just
replace annFUN.db with  annFUN.db2):

library(topGO)

allProbes <- ls(ath1121501GO)

## generate a random set of interesting pro
myInterestingGenes <- sample(allProbes, 100)
geneList <- factor(as.integer(allProbes %in% myInterestingGenes))
names(geneList) <- allProbes

## build a topGOdata object
sampleGOdata <- new("topGOdata", description = "Simple session",
                    ontology = "MF", allGenes = geneList,
                    nodeSize = 10, annot = annFUN.db2,
                    affyLib = "ath1121501.db")

sampleGOdata

This works without a problem on the latest stable R/Bioconductor version.

Let me know if you have further questions.

Best regards,
Adrian

> sessionInfo()
R version 2.12.1 beta (2010-12-06 r53802)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US       LC_NUMERIC=C         LC_TIME=en_US
 [4] LC_COLLATE=C         LC_MONETARY=C        LC_MESSAGES=en_US
 [7] LC_PAPER=en_US       LC_NAME=C            LC_ADDRESS=C
[10] LC_TELEPHONE=C       LC_MEASUREMENT=en_US LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] ath1121501.db_2.4.5  org.At.tair.db_2.4.6 topGO_2.2.0
 [4] SparseM_0.86         GO.db_2.4.5          RSQLite_0.9-4
 [7] DBI_0.2-5            AnnotationDbi_1.12.0 Biobase_2.10.0
[10] graph_1.28.0

loaded via a namespace (and not attached):
[1] grid_2.12.0     lattice_0.19-13 tools_2.12.0

On Thu, Dec 2, 2010 at 5:44 PM, Johannes Hanson <s.j.hanson at uu.nl> wrote:
> Dear All,
>
> I am analyzing affymetrix expression data using the topGO package.
> I basically follow the script in the topGO vingette. It works fine for the CC and BP ontologies but for MF i get the following error:
>
>> sampleGOdata <- new("topGOdata", description = "Simple session", ontology = "MF", allGenes = geneList, geneSel = topDiffGenes, nodeSize = 10, annot = annFUN.db,affyLib = affyLib)
>
> Building most specific GOs .....        ( 1348 GO terms found. )
>
> Build GO DAG topology ..........
>  There are no adj nodes for node:  GO:0010241
> Error in switch(type, isa = 0, partof = 1, -1) :
>  EXPR must be a length 1 vector
>
> I guess that there is something wrong in the annotation packages but honestly, I might well have misunderstood the error message. Using the examples in the topGO vingette and human annotation no errors are given.
>
> Thanks in advance,
> Johannes
>
>
>> sessionInfo()
> R version 2.12.0 (2010-10-15)
> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>
> locale:
> [1] en_GB.UTF-8/en_GB.UTF-8/C/C/en_GB.UTF-8/en_GB.UTF-8
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
>  [1] topGO_2.2.0          SparseM_0.86         GO.db_2.4.5          graph_1.28.0         ath1121501.db_2.4.5  org.At.tair.db_2.4.6 RSQLite_0.9-3
>  [8] DBI_0.2-5            AnnotationDbi_1.12.0 ath1121501cdf_2.7.0  affy_1.28.0          Biobase_2.10.0
>
> loaded via a namespace (and not attached):
> [1] affyio_1.18.0         grid_2.12.0           lattice_0.19-13       preprocessCore_1.12.0 tools_2.12.0
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>