[BioC] domainsignatures with non-human KEGG pathways

Robert Castelo robert.castelo at upf.edu
Tue Dec 15 12:49:07 CET 2009


Dear list, and particularly dear domainsignatures package maintainers
(Florian?),

I was trying to use the domainsignatures package from the current
BioC-devel version (see my sessionInfo at the end of this message) to
test a mouse gene list for enrichment across all available KEGG
pathways, and found that the main function that collects the KEGG data,
'getKEGGdata', works with human data only. More concretely, its source
contains the following hard-coded line:

    ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")

Since this function already lets you restrict the set of pathways to be
tested through its 'pathways' argument, I guess the package is not meant
to be limited to human. So I'd like to suggest that the maintainers make
the function general for any organism for which KEGG and Ensembl provide
the necessary data.
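To make that suggestion a bit more concrete, here is a minimal sketch, assuming the organism is identified by its KEGG organism code: a small lookup maps that code to the Ensembl BioMart dataset name that useMart() would receive instead of the hard-coded "hsapiens_gene_ensembl". The table keggToEnsembl and the function ensemblDatasetFor() are hypothetical, not part of domainsignatures or biomaRt, and list only a few organisms for illustration.

```r
## hypothetical lookup table (not part of domainsignatures or biomaRt):
## KEGG organism code -> Ensembl BioMart dataset name, a few organisms only
keggToEnsembl <- c(hsa = "hsapiens_gene_ensembl",
                   mmu = "mmusculus_gene_ensembl",
                   rno = "rnorvegicus_gene_ensembl")

## hypothetical helper: return the Ensembl dataset for a KEGG organism
## code, stopping with an error for organisms missing from the table
ensemblDatasetFor <- function(organism) {
    if (!(organism %in% names(keggToEnsembl)))
        stop("no Ensembl dataset known for KEGG organism code '",
             organism, "'")
    unname(keggToEnsembl[organism])
}

ensemblDatasetFor("mmu")
```

The returned name could then be used as useMart("ensembl", dataset = ensemblDatasetFor("mmu")). Failing loudly for unknown codes seems safer than silently falling back to human.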

To get going immediately, I've made a quick-and-dirty fix, which I paste
below in case it is useful.

btw, in my R-devel installation the package function 'gseDomain' emits
the following warning when it is called:

Warning message:
In progress(message = mess, sub = sub) : Need tcltk for the status bar

which I guess is due to some software component missing on my Linux
box, because loading 'tcltk' gives the following message:

library(tcltk)
Error in firstlib(which.lib.loc, package) : 
  Tcl/Tk support is not available on this system
Error in library(tcltk) : .First.lib failed for 'tcltk'
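A quick way to check whether an R build has Tcl/Tk support, without triggering the error above, is the base R function capabilities(). The sketch below only reports the result; the message text is illustrative.

```r
## capabilities() is part of base R; its "tcltk" entry is FALSE when this
## R build has no Tcl/Tk support, the situation the error above reports
hasTcltk <- capabilities("tcltk")
if (!hasTcltk) {
    message("this R build has no Tcl/Tk support, so tcltk-based ",
            "status bars cannot be shown")
}
hasTcltk
```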

Searching for documentation on how to install 'tcltk' properly, I found
that this package seems to have been removed from CRAN, see
http://cran.r-project.org/web/packages/tcltk/index.html
(As far as I can tell, though, tcltk is one of R's base packages and
ships with R itself, so it only works if Tcl/Tk was available when R
was built.)

I've also seen another package called 'tcltk2', which sounds like a
replacement for 'tcltk'. I just wanted to mention this in case it is an
issue the package maintainers want to consider.

thanks!!!
robert.

myGetKEGGdata <- function(universe = NULL, pathways = NULL,
                          ensemblMart = NULL) { ## add ensemblMart argument
    op <- options(warn = -1)
    on.exit(options(op))
    if (inherits(try(readLines("http://www.bioconductor.org"), silent = TRUE),
                 "try-error"))
        stop("Active internet connection needed for this function")
    options(op)
    if (!is.null(pathways))
        hKEGGids <- pathways
    else hKEGGids <- grep("^hsa", ls(KEGGPATHID2EXTID), value = TRUE)
    path2Genes <- mget(hKEGGids, KEGGPATHID2EXTID)
    hKEGGgenes <- union(universe, unique(unlist(path2Genes, use.names = FALSE)))
    hKEGGgenes <- hKEGGgenes[!is.na(hKEGGgenes)]
    ## if no specific Ensembl mart is provided, then use human
    if (is.null(ensemblMart))
        ensemblMart <- "hsapiens_gene_ensembl"
    ensembl <- useMart("ensembl", dataset = ensemblMart)
    tmp <- getBM(attributes = c("entrezgene", "interpro"),
                 filters = "entrezgene", values = hKEGGgenes, mart = ensembl)
    gene2Domains <- split(tmp$interpro, tmp$entrezgene, drop = FALSE)
    missing <- setdiff(hKEGGgenes, names(gene2Domains))
    gene2Domains[missing] <- ""
    hKEGGdomains <- unique(unlist(gene2Domains))
    hKEGGdomains <- hKEGGdomains[!is.na(hKEGGdomains)]
    path2Domains <- lapply(path2Genes, function(x, gene2Domains)
        unique(unlist(gene2Domains[x], use.names = FALSE)), gene2Domains)
    dims <- c(pathway = length(hKEGGids), gene = length(hKEGGgenes),
              domain = length(hKEGGdomains))
    return(new("ipDataSource", genes = hKEGGgenes, pathways = hKEGGids,
               domains = hKEGGdomains, gene2Domains = gene2Domains,
               path2Domains = path2Domains, dims = dims, type = "KEGG"))
}
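With this fix the default set of pathways is still chosen with the "^hsa" pattern, so for mouse the pathway IDs have to be passed explicitly through 'pathways'. A minimal sketch of that selection, assuming KEGG.db pathway IDs start with the three-letter KEGG organism code ("hsa" for human, as in the grep above, and "mmu" for mouse). keggPathwaysFor() is a hypothetical helper, and the toy IDs only stand in for ls(KEGGPATHID2EXTID).

```r
## hypothetical helper (not part of domainsignatures): keep the KEGG
## pathway IDs of one organism, assuming each ID is the organism's KEGG
## code followed by a numeric map number, e.g. "hsa04110" or "mmu04110"
keggPathwaysFor <- function(organism, pathwayIds) {
    grep(paste("^", organism, "[0-9]+$", sep = ""), pathwayIds, value = TRUE)
}

## toy IDs standing in for ls(KEGGPATHID2EXTID)
ids <- c("hsa00010", "hsa04110", "mmu00010", "mmu04110", "rno00010")
keggPathwaysFor("mmu", ids)
```

With KEGG.db and biomaRt loaded, one could then try something like myGetKEGGdata(pathways = keggPathwaysFor("mmu", ls(KEGGPATHID2EXTID)), ensemblMart = "mmusculus_gene_ensembl").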

sessionInfo()
R version 2.11.0 Under development (unstable) (2009-10-06 r49948) 
x86_64-unknown-linux-gnu 

locale:
[1] C

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
 [1] domainsignatures_1.7.0 biomaRt_2.3.0          prada_1.23.0
 [4] rrcov_1.0-00           pcaPP_1.7              mvtnorm_0.9-8
 [7] robustbase_0.5-0-1     RColorBrewer_1.0-2     KEGG.db_2.3.5
[10] RSQLite_0.7-3          DBI_0.2-4              AnnotationDbi_1.9.2
[13] Biobase_2.7.2

loaded via a namespace (and not attached):
[1] MASS_7.3-4    RCurl_1.3-0   XML_2.6-0     stats4_2.11.0


