[BioC] nsFilter error in genefilter

James W. MacDonald jmacdon at uw.edu
Wed Apr 17 18:23:49 CEST 2013


Hi Zhenya,

On 4/17/2013 12:02 PM, Zhenya [guest] wrote:
> Hi All,
>
> I am trying to run the code for GSVA (library with the same name). The code is below, but the main error is around annotation:
>> source("http://bioconductor.org/biocLite.R")
> Bioconductor version 2.12 (BiocInstaller 1.10.0), ?biocLite for help
>> biocLite("hthgu133pluspm.db")

There is no such package. You could easily create one yourself using the 
AnnotationForge package (see the vignette). Or you could note that the 
hthgu133pluspm array has the same content as the hgu133plus2 array, 
except for a few extra control probesets and the fact that Affymetrix 
insisted on adding an extra _PM to the probeset IDs.

 > sum(ls(hgu133plus2cdf) %in% gsub("_PM","", ls(hthgu133pluspmcdf)))
[1] 54675
 > length(ls(hgu133plus2cdf))
[1] 54675
 > length(ls(hthgu133pluspmcdf))
[1] 54715
 > ls(hthgu133pluspmcdf)[!gsub("_PM","", ls(hthgu133pluspmcdf)) %in% ls(hgu133plus2cdf)]
[1] "AFFX-NonspecificGC10_at" "AFFX-NonspecificGC11_at"
[3] "AFFX-NonspecificGC12_at" "AFFX-NonspecificGC13_at"
[5] "AFFX-NonspecificGC14_at" "AFFX-NonspecificGC15_at"
[7] "AFFX-NonspecificGC16_at" "AFFX-NonspecificGC17_at"
[9] "AFFX-NonspecificGC18_at" "AFFX-NonspecificGC19_at"
[11] "AFFX-NonspecificGC20_at" "AFFX-NonspecificGC21_at"
[13] "AFFX-NonspecificGC22_at" "AFFX-NonspecificGC23_at"
[15] "AFFX-NonspecificGC24_at" "AFFX-NonspecificGC25_at"
[17] "AFFX-NonspecificGC3_at" "AFFX-NonspecificGC4_at"
[19] "AFFX-NonspecificGC5_at" "AFFX-NonspecificGC6_at"
[21] "AFFX-NonspecificGC7_at" "AFFX-NonspecificGC8_at"
[23] "AFFX-NonspecificGC9_at" "AFFX-r2-TagA_at"
[25] "AFFX-r2-TagB_at" "AFFX-r2-TagC_at"
[27] "AFFX-r2-TagD_at" "AFFX-r2-TagE_at"
[29] "AFFX-r2-TagF_at" "AFFX-r2-TagG_at"
[31] "AFFX-r2-TagH_at" "AFFX-r2-TagIN-3_at"
[33] "AFFX-r2-TagIN-5_at" "AFFX-r2-TagIN-M_at"
[35] "AFFX-r2-TagJ-3_at" "AFFX-r2-TagJ-5_at"
[37] "AFFX-r2-TagO-3_at" "AFFX-r2-TagO-5_at"
[39] "AFFX-r2-TagQ-3_at" "AFFX-r2-TagQ-5_at"

So you could either go to the trouble of building and installing a .db 
package for this array, or you could do something like

featureNames(EsetData) <- gsub("_PM","", featureNames(EsetData))
annotation(EsetData) <- "hgu133plus2.db"

and carry on as before.
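
Before overwriting the feature names, it may be worth checking what the 
renaming does. The sketch below uses base R only and a handful of 
illustrative probeset IDs (pm_ids and ref_ids are made-up stand-ins, not 
taken from your data). With the real objects you would start from 
featureNames(EsetData) and compare against keys(hgu133plus2.db).

```r
# Illustrative IDs only; with real data use featureNames(EsetData)
pm_ids  <- c("1007_PM_s_at", "1053_PM_at", "AFFX-r2-TagA_at")
# Stand-in for the hgu133plus2 probeset IDs, e.g. keys(hgu133plus2.db)
ref_ids <- c("1007_s_at", "1053_at", "117_at")

# fixed = TRUE treats "_PM" as a literal string rather than a regex
new_ids <- gsub("_PM", "", pm_ids, fixed = TRUE)

# IDs with no counterpart after renaming; on the real array these
# should only be the extra control probesets listed above
unmatched <- setdiff(new_ids, ref_ids)
print(unmatched)

# The renaming should not introduce duplicate feature names
stopifnot(!anyDuplicated(new_ids))
```

If unmatched contains anything other than AFFX control probesets, I would 
look at those IDs before switching the annotation slot.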

Best,

Jim


> BioC_mirror: http://bioconductor.org
> Using Bioconductor version 2.12 (BiocInstaller 1.10.0), R version 3.0.0.
> Installing package(s) 'hthgu133pluspm.db'
> Warning message:
> package ‘hthgu133pluspm.db’ is not available (for R version 3.0.0)
>
> Code:
>
> # CREATE GeneSetCollection
> library(GSEABase)
> x<- scan("GeneSets.gmt", what="", sep="\n")
> GeneSets.gmt<- strsplit(x, "[[:space:]]+")
> names(GeneSets.gmt)<- sapply(GeneSets.gmt, `[[`, 1)
> GeneSets.gmt<- lapply(GeneSets.gmt, `[`, -1)
> n<- names(GeneSets.gmt)
> uniqueList<- lapply(GeneSets.gmt, unique)
> makeSet<- function(geneIds, n) {GeneSet(geneIds, geneIdType=SymbolIdentifier(), setName=n)}
> gsList<- gsc<- mapply(makeSet, uniqueList[], n)
> gsc<- GeneSetCollection(gsList)
>
> # DATASET
> # CREATE ExpressionSet
> exprs<- as.matrix(read.table("ExprData.txt", header=TRUE, sep="\t", row.names=1, as.is=TRUE))
> pData<- read.table("DesignFile.txt",row.names=1, header=T,sep="\t")
> phenoData<- new("AnnotatedDataFrame",data=pData)
> annotation<- "hthgu133pluspm.db"
> EsetData<- ExpressionSet(assayData=exprs,phenoData=phenoData,annotation="hthgu133pluspm")
> head(ExprData)
>
> #Gene Filtering
> library(genefilter)
> library("hthgu133pluspm")
> filtered_eset<- nsFilter(EsetData, require.entrez=TRUE, remove.dupEntrez=TRUE, var.func=IQR, var.filter=FALSE, var.cutoff=0.25, filterByQuantile=TRUE, feature.exclude="^AFFX")
> # get stats for numbers of probesets removed
> filtered_eset
> EsetData_f<- filtered_eset$eset
>
> # GSVA
> library(GSVA)
> gsva_es<- gsva(EsetData_f,gsc,abs.ranking=FALSE,min.sz=1,max.sz=1000,mx.diff=TRUE)$es.obs
>
> I downloaded hthgu133pluspm from http://nmg-r.bioinformatics.nl/NuGO_R.html
> and R still complains. The packages available on Bioconductor,
> hthgu133pluspmprobe
> and
> hthgu133pluspmcdf
> are not correct and give errors for nsFilter and gsva:
> Error in (function (classes, fdef, mtable)  :
>    unable to find an inherited method for function ‘cols’ for signature ‘"environment"’
>
> Mapping identifiers between gene sets and feature names
> Error in GeneSetCollection(lapply(what, mapIdentifiers, to, ..., verbose = verbose)) :
>    error in evaluating the argument 'object' in selecting a method for function 'GeneSetCollection': Error in (function (classes, fdef, mtable)  :
>    unable to find an inherited method for function ‘cols’ for signature ‘"environment"’
>
>
> Thank you,
> Zhenya
>
>   -- output of sessionInfo():
>
> R version 3.0.0 (2013-04-03)
> Platform: i386-w64-mingw32/i386 (32-bit)
>
> locale:
> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
>   [1] GSVA_1.8.0                 BiocInstaller_1.10.0       hthgu133pluspmprobe_2.12.0 hthgu133pluspmcdf_2.12.0   genefilter_1.42.0          GSEABase_1.22.0
>   [7] graph_1.38.0               annotate_1.38.0            AnnotationDbi_1.22.1       Biobase_2.20.0             BiocGenerics_0.6.0
>
> loaded via a namespace (and not attached):
> [1] DBI_0.2-5       IRanges_1.18.0  RSQLite_0.11.2  splines_3.0.0   stats4_3.0.0    survival_2.37-4 tools_3.0.0     XML_3.96-1.1    xtable_1.7-1
>
>
> --
> Sent via the guest posting facility at bioconductor.org.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099


