[BioC] hgu133plus2cdfSYMBOL not found

Martin Morgan mtmorgan at fhcrc.org
Tue Dec 15 13:46:51 CET 2009


Tony McBryan wrote:
> Hello Philip,
> 
> Thank you very much for your help.
> 
> After some experimentation, the following code excerpt works when placed
> at the head of my previous script:
> 
> ---
> get.annotation <- function (x, cdfname, verbose = FALSE)
> {
>    library(paste(cdfname,".db",sep=""), character.only = TRUE)
> 
>    symb <- simpleaffy:::.strip.list(mget(x, envir = get(paste(cdfname,
> "SYMBOL", sep = ""))))

I think a better approach here (and one that might make it into
simpleaffy?) is to pass annotate::getAnnMap("SYMBOL", cdfname) as the
envir argument rather than get(paste(cdfname, "SYMBOL", sep = "")).
getAnnMap will find and load the appropriate library, whether or not
'cdfname' ends with .db, so there is no need for library(...) as the
first line. You'll want to have, before get.annotation,

library(annotate)
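
To make the suggestion concrete, here is a minimal sketch of a single
lookup done this way. The helper name 'lookup_map' is made up for
illustration and is not part of simpleaffy; it assumes the annotate
package and the chip's annotation package are installed, and that
attaching annotate also attaches AnnotationDbi (which, as far as I know,
supplies the mget() method for annotation maps).

```r
# require() returns FALSE instead of stopping if annotate is not installed.
have_annotate <- suppressPackageStartupMessages(
    require(annotate, quietly = TRUE))

# Hypothetical helper: look up one annotation map for a vector of probe IDs.
lookup_map <- function(x, cdfname, map = "SYMBOL") {
    # getAnnMap locates the map, loading the annotation package if needed.
    annmap <- getAnnMap(map, cdfname)
    hits <- mget(x, envir = annmap, ifnotfound = NA)
    # Keep the first value per probe set; NA means no annotation was found.
    vapply(hits, function(v) as.character(v[1]), character(1))
}

## Example call, assuming hgu133plus2.db is installed:
## lookup_map(c("1007_s_at", "1053_at"), "hgu133plus2")
```

The same pattern would apply to the GENENAME, ACCNUM and UNIGENE lookups
by changing the 'map' argument.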

Martin


>    desc <- simpleaffy:::.strip.list(mget(x, envir = get(paste(cdfname,
> "GENENAME", sep = ""))))
>    accno <- simpleaffy:::.strip.list(mget(x, envir = get(paste(cdfname,
> "ACCNUM", sep = ""))))
>    uni <- simpleaffy:::.strip.list(mget(x, envir = get(paste(cdfname,
> "UNIGENE", sep = ""))))
> 
>    ok <- (symb != "NoAnno") & (desc != "NoAnno") & (accno !=
>        "NoAnno") & (uni != "NoAnno")
>    names(ok) <- x
>    if (verbose && any(!ok)) {
>        warning(paste("value for '", names(ok)[!ok], "' not found",
>            sep = ""), call. = FALSE)
>    }
>    acc.lnk <-
> paste("=HYPERLINK(\"http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=search&db=nucleotide&term=",
> 
>        accno, "\",\"", accno, "\")", sep = "")
>    acc.lnk[!ok] <- "NoAnno"
>    uni.lnk <-
> paste("=HYPERLINK(\"http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=search&db=unigene&term=",
> 
>        uni, "&dopt=unigene\",\"", uni, "\")", sep = "")
>    uni.lnk[!ok] <- "NoAnno"
>    res <- cbind(symb, acc.lnk, uni.lnk, desc)
>    res[res == "NoAnno"] <- "No Annotation Found"
>    colnames(res) <- c("gene name", "accession", "unigene", "description")
>    return(res)
> }
> #<environment: namespace:simpleaffy>
> 
> name = "get.annotation"
> env = getNamespace("simpleaffy")
> pkgName = "simpleaffy"
> value = get.annotation
> unlockBinding(name, env);
> assignInNamespace(name, value, ns=pkgName, envir=env);
> assign(name, value, envir=env);
> lockBinding(name, env);
> ---
> 
> The additional lines at the bottom are to put the newly specified
> get.annotation function into the namespace for simpleaffy.
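
For readers unfamiliar with locked bindings, the unlock / assign / lock
sequence can be tried on a toy environment first. This is only a sketch
in base R: 'ns' stands in for a package namespace and the function
bodies are placeholders, not simpleaffy code.

```r
# Toy stand-in for a package namespace (hypothetical names throughout).
ns <- new.env()
assign("get.annotation", function() "old", envir = ns)
lockBinding("get.annotation", ns)

# Assigning to a locked binding fails, which is why unlocking is needed.
res <- try(assign("get.annotation", function() "new", envir = ns),
           silent = TRUE)
failed_while_locked <- inherits(res, "try-error")

# Unlock, replace, and lock again, as in the excerpt above.
unlockBinding("get.annotation", ns)
assign("get.annotation", function() "new", envir = ns)
lockBinding("get.annotation", ns)

ns$get.annotation()
```

Patching a package namespace like this only lasts for the current
session, so it is a workaround rather than a fix.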
> 
> 
> 
> Groot, Philip de wrote:
>> Dear Tony,
>>  
>> Yes, I was already afraid of this. simpleaffy has not been updated to
>> properly handle the .db annotation packages. In the past (5 R versions
>> ago), an annotation library was loaded by the command:
>> library(hgu133plus2)
>>  
>> However, nowadays it should be loaded by:
>> library(hgu133plus2.db)
>>  
>> Unfortunately, things go wrong in the get.annotation function
>> (simpleaffy), which reads like this:
>>  
>>> get.annotation
>>>     
>> function (x, cdfname, verbose = FALSE) {
>>     library(cdfname, character.only = TRUE)
>>     symb <- .strip.list(mget(x, envir = get(paste(cdfname, "SYMBOL",
>>         sep = "")), ifnotfound = list(.if.probeset.not.found)))
>>     desc <- .strip.list(mget(x, envir = get(paste(cdfname, "GENENAME",
>>         sep = "")), ifnotfound = list(.if.probeset.not.found)))
>>     accno <- .strip.list(mget(x, envir = get(paste(cdfname, "ACCNUM",
>>         sep = "")), ifnotfound = list(.if.probeset.not.found)))
>>     uni <- .strip.list(mget(x, envir = get(paste(cdfname, "UNIGENE",
>>         sep = "")), ifnotfound = list(.if.probeset.not.found)))
>>     ok <- (symb != "NoAnno") & (desc != "NoAnno") & (accno !=        
>> "NoAnno") & (uni != "NoAnno")
>>     names(ok) <- x
>>     if (!ok && verbose) {
>>         warning(paste("value for '", names(ok)[ok], "' not found",
>>             sep = ""), call. = FALSE)
>>     }
>>     acc.lnk <-
>> paste("=HYPERLINK(\"http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=search&db=nucleotide&term=",
>>         accno, "\",\"", accno, "\")", sep = "")
>>     acc.lnk[!ok] <- "NoAnno"
>>     uni.lnk <-
>> paste("=HYPERLINK(\"http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=search&db=unigene&term=",
>>         uni, "&dopt=unigene\",\"", uni, "\")", sep = "")
>>     uni.lnk[!ok] <- "NoAnno"
>>     res <- cbind(symb, acc.lnk, uni.lnk, desc)
>>     res[res == "NoAnno"] <- "No Annotation Found"
>>     colnames(res) <- c("gene name", "accession", "unigene",
>> "description")
>>     return(res)
>> }
>> <environment: namespace:simpleaffy>
>>
>> The top line of this function loads the library (without the .db
>> extension), and hence either loading the library or getting the
>> annotation will fail. Copying this text into a new function and
>> deleting the top line will fix your problem, provided that you load
>> the .db library first.
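
The naming point can be sketched in base R. The helper below is
hypothetical (it is not part of simpleaffy or affy) and only shows how a
chip name maps to its .db package name; the map objects themselves keep
the name without the suffix, e.g. hgu133plus2SYMBOL.

```r
# Hypothetical helper: append ".db" to a chip name unless already present.
db_package_name <- function(cdfname) {
    if (grepl("\\.db$", cdfname)) cdfname else paste(cdfname, ".db", sep = "")
}

db_package_name("hgu133plus2")     # "hgu133plus2.db"
db_package_name("hgu133plus2.db")  # "hgu133plus2.db"

## Loading would then look like this, assuming the package is installed:
## library(db_package_name("hgu133plus2"), character.only = TRUE)
```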
>>  
>> Regards,
>>  
>>  
>> Dr. Philip de Groot Ph.D.
>> Bioinformatics Researcher
>>
>> Wageningen University / TIFN
>> Nutrigenomics Consortium
>> Nutrition, Metabolism & Genomics Group
>> Division of Human Nutrition
>> PO Box 8129, 6700 EV Wageningen
>> Visiting Address: Erfelijkheidsleer: De Valk, Building 304
>> Dreijenweg 2, 6703 HA  Wageningen
>> Room: 0052a
>> T: +31-317-485786
>> F: +31-317-483342
>> E-mail:   Philip.deGroot at wur.nl
>> Internet: http://www.nutrigenomicsconsortium.nl
>>           http://humannutrition.wur.nl
>>           https://madmax.bioinformatics.nl
>>  
>>  
>>
>> ________________________________
>>
>> From: Tony McBryan [mailto:tony at mcbryan.co.uk]
>> Sent: Mon 14-12-2009 12:35
>> To: Groot, Philip de; bioconductor at stat.math.ethz.ch
>> Subject: Re: [BioC] hgu133plus2cdfSYMBOL not found
>>
>>
>>
>> Hello Philip,
>>
>> I have applied the change as you suggested but I'm afraid it still fails
>> at the same location; although the content of the error message has
>> changed slightly and now reads:
>>
>> ---
>> Error in library(cdfname, character.only = TRUE) :
>>   there is no package called 'hgu133plus2'
>> ---
>>
>> Thank you for your previous extremely quick response,
>>
>> Tony
>>
>> Groot, Philip de wrote:
>>  
>>> Hello Tony,
>>>
>>> object 'hgu133plus2cdfSYMBOL'  should be object 'hgu133plus2SYMBOL' .
>>>
>>> Please use:
>>> summary <- results.summary(pc, cleancdfname(cdfName(raw.data),
>>>                            addcdf=FALSE))
>>>                            ^^^^^^^^^^^^
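
Why the suffix matters can be seen from how the object name is built in
the failing call. A small base R sketch (the function name
'symbol_map_name' is made up for illustration):

```r
# get.annotation builds the map name by pasting "SYMBOL" onto cdfname.
symbol_map_name <- function(cdfname) paste(cdfname, "SYMBOL", sep = "")

symbol_map_name("hgu133plus2cdf")  # "hgu133plus2cdfSYMBOL", as in the error
symbol_map_name("hgu133plus2")     # "hgu133plus2SYMBOL", the expected object
```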
>>>
>>> Regards,
>>>
>>> Dr. Philip de Groot Ph.D.
>>> Bioinformatics Researcher
>>>
>>> Wageningen University / TIFN
>>> Nutrigenomics Consortium
>>> Nutrition, Metabolism & Genomics Group
>>> Division of Human Nutrition
>>> PO Box 8129, 6700 EV Wageningen
>>> Visiting Address: Erfelijkheidsleer: De Valk, Building 304
>>> Dreijenweg 2, 6703 HA  Wageningen
>>> Room: 0052a
>>> T: +31-317-485786
>>> F: +31-317-483342
>>> E-mail:   Philip.deGroot at wur.nl
>>> Internet: http://www.nutrigenomicsconsortium.nl
>>>           http://humannutrition.wur.nl
>>>           https://madmax.bioinformatics.nl
>>>
>>>
>>>
>>>
>>> ________________________________
>>>
>>> From: Tony McBryan [mailto:tony at mcbryan.co.uk]
>>> Sent: Mon 14-12-2009 11:49
>>> To: bioconductor at stat.math.ethz.ch
>>> Subject: [BioC] hgu133plus2cdfSYMBOL not found
>>>
>>>
>>>
>>> Hello list,
>>>
>>> I'm having trouble using part of the affy packages within Bioconductor.
>>> I have a dozen U133Plus2 microarrays I'm doing a differential analysis
>>> on. Everything works fine until the pairwise comparison stage where I am
>>> unable to perform a summary (results.summary()) of the results of the
>>> comparison. The script terminates with the error message:
>>>
>>> ---
>>> Error in get(paste(cdfname, "SYMBOL", sep = "")) :
>>> object 'hgu133plus2cdfSYMBOL' not found
>>> ---
>>>
>>> as a result of the call to:
>>>
>>> summary <- results.summary(pc,cleancdfname(cdfName(raw.data)))
>>>
>>> where pc is the result of "pc <- pairwise.comparison(x.rma, "treatment",
>>> spots=raw.data )".
>>>
>>> The only result I could find on Google was a previous posting to this
>>> list [1] from 2006, which stated that the "hgu133plus2" package was
>>> required; however, this seems unavailable from the package repositories:
>>>
>>> ---
>>>  > source("http://www.bioconductor.org/biocLite.R")
>>>  > biocLite("hgu133plus2")
>>> Using R version 2.10.0, biocinstall version 2.5.8.
>>> Installing Bioconductor version 2.5 packages:
>>> [1] "hgu133plus2"
>>> Please wait...
>>>
>>> Warning in install.packages(pkgs = pkgs, repos = repos, ...) :
>>> argument 'lib' is missing: using
>>> '/home/mcbryan/R/x86_64-pc-linux-gnu-library/2.10'
>>> Warning message:
>>> In getDependencies(pkgs, dependencies, available, lib) :
>>> package 'hgu133plus2' is not available
>>> ---
>>>
>>> I have, however, attached the "hgu133plus2.db" (which seems to be
>>> hgu133plus2's replacement?), "hgu133plus2cdf" and "hgu133plus2probe"
>>> packages, without any further success.
>>>
>>> Vital stats: 64-bit Linux (Ubuntu 9.04), R 2.10.0 (should be latest from
>>> "deb http://cran.uk.r-project.org/bin/linux/ubuntu karmic/" repository).
>>> Bioc installed using biocLite (and all packages updated to latest
>>> versions using "update.packages(repos=biocinstallRepos(), ask=FALSE)").
>>>
>>> I have attached a short script below which reproduces the error for me
>>> as well as the output of running that script (including sessionInfo()).
>>>
>>> If anyone could provide a prod in the right direction it would very much
>>> be appreciated.
>>>
>>> Tony McBryan
>>> Beatson Institute for Cancer Research, UK
>>>
>>> [1] https://stat.ethz.ch/pipermail/bioconductor/2006-January/011746.html
>>>
>>>
>>> ===
>>>
>>> runme.R:
>>>
>>> ## load Bioconductor relevant package
>>> ## limma: define experimental design and perform ratio statistics
>>> ## affy: CEL file manipulations
>>> ## affyQCReport: Quality Control Report
>>>
>>> library(limma)
>>> library(simpleaffy)
>>> library(affy)
>>> library(affyQCReport)
>>>
>>> library(hgu133plus2.db)
>>> library(hgu133plus2cdf)
>>> library(hgu133plus2probe)
>>>
>>> ## set locations for input and output data files
>>>
>>> qcDirectory <- "output/"
>>> outputDirectory <- "output/"
>>>
>>> ## read target file detailing input file names into an object
>>> ## start with defining the name and location of the target file
>>> ## then read data into the targets object
>>>
>>> ## read in CEL files listed in covdesc into raw affy object
>>>
>>> raw.data <- read.affy()
>>>
>>> ## compile Affy QC Report
>>> # Temporarily commented out
>>> # qcFile <- paste(qcDirectory, "affy_qc_report.pdf", sep="")
>>> # QCReport(raw.data, file=qcFile)
>>>
>>> ## MAS5: present/Marginal/Absent analysis
>>> ## writing the output to a file
>>>
>>> x.mas <- call.exprs(raw.data,"mas5")
>>> outputFile <- paste(outputDirectory, "MAS5_output.csv", sep="")
>>> write.exprs(x.mas, outputFile, sep="\t")
>>>
>>> ## RMA: probe summary intensity value
>>>
>>> x.rma <- call.exprs(raw.data,"rma")
>>> outputFile <- paste(outputDirectory, "RMA_output.csv", sep="")
>>> write.exprs(x.rma, outputFile, sep="\t")
>>>
>>> ## Pairwise comparison
>>>
>>> pc <- pairwise.comparison(x.rma, "treatment", spots=raw.data )
>>>
>>> ## ---
>>> ## Fails on next line
>>> ## ---
>>> summary <- results.summary(pc,cleancdfname(cdfName(raw.data)))
>>>
>>> ## This is what I wanted to do next
>>> summaryfile <- paste(outputDirectory, "spreadsheet.csv", sep="")
>>> write.annotation(file=summaryfile, summary)
>>>
>>>
>>> ===
>>>
>>> Result of source("runme.R"):
>>>
>>> R version 2.10.0 (2009-10-26)
>>> Copyright (C) 2009 The R Foundation for Statistical Computing
>>> ISBN 3-900051-07-0
>>>
>>> R is free software and comes with ABSOLUTELY NO WARRANTY.
>>> You are welcome to redistribute it under certain conditions.
>>> Type 'license()' or 'licence()' for distribution details.
>>>
>>> Natural language support but running in an English locale
>>>
>>> R is a collaborative project with many contributors.
>>> Type 'contributors()' for more information and
>>> 'citation()' on how to cite R or R packages in publications.
>>>
>>> Type 'demo()' for some demos, 'help()' for on-line help, or
>>> 'help.start()' for an HTML browser interface to help.
>>> Type 'q()' to quit R.
>>>
>>>  > source("runme.R")
>>> Loading required package: affy
>>> Loading required package: Biobase
>>>
>>> Welcome to Bioconductor
>>>
>>> Vignettes contain introductory material. To view, type
>>> 'openVignette()'. To cite Bioconductor, see
>>> 'citation("Biobase")' and for packages 'citation(pkgname)'.
>>>
>>> Loading required package: genefilter
>>> Loading required package: gcrma
>>> Loading required package: xtable
>>> Loading required package: affyPLM
>>> Loading required package: preprocessCore
>>>
>>> Attaching package: 'affyPLM'
>>>
>>>
>>> The following object(s) are masked from package:stats :
>>>
>>> resid,
>>> residuals,
>>> weights
>>>
>>> Loading required package: RColorBrewer
>>> Loading required package: lattice
>>> Loading required package: AnnotationDbi
>>> Loading required package: org.Hs.eg.db
>>> Loading required package: DBI
>>> Background correcting
>>> Normalizing
>>> Calculating Expression
>>> Error in get(paste(cdfname, "SYMBOL", sep = "")) :
>>> object 'hgu133plus2cdfSYMBOL' not found
>>>  > sessionInfo()
>>> R version 2.10.0 (2009-10-26)
>>> x86_64-pc-linux-gnu
>>>
>>> locale:
>>> [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
>>> [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
>>> [5] LC_MONETARY=C LC_MESSAGES=en_GB.UTF-8
>>> [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
>>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] stats graphics grDevices utils datasets methods base
>>>
>>> other attached packages:
>>> [1] hgu133plus2probe_2.5.0 hgu133plus2cdf_2.5.0 hgu133plus2.db_2.3.5
>>> [4] org.Hs.eg.db_2.3.6 RSQLite_0.7-3 DBI_0.2-4
>>> [7] AnnotationDbi_1.8.1 affyQCReport_1.24.0 lattice_0.17-26
>>> [10] RColorBrewer_1.0-2 affyPLM_1.22.0 preprocessCore_1.8.0
>>> [13] xtable_1.5-6 simpleaffy_2.22.0 gcrma_2.18.0
>>> [16] genefilter_1.28.2 affy_1.24.2 Biobase_2.6.1
>>> [19] limma_3.2.1
>>>
>>> loaded via a namespace (and not attached):
>>> [1] affyio_1.14.0 annotate_1.24.0 Biostrings_2.14.8 grid_2.10.0
>>> [5] IRanges_1.4.9 splines_2.10.0 survival_2.35-7 tools_2.10.0
>>>
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor


-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


