[BioC] Bug in GSVA with methods other than default?

Fri Feb 21 16:08:37 CET 2014

hi Jonathan cc Sonja,

you're right, this is a bug occuring when the input is an 
'ExpressionSet' object and the argument 'method' as a value different 
than 'gsva'.

i've just submitted a fix to both, the devel and the release versions of 
GSVA, they should become available via biocLite() in the following 24/48 
hrs as versions 1.11.4 and 1.10.3, respectively.

in the meantime, you can always check out directly from the 
corresponding svn repository (current release or devel) the updated 
source and build it yourself; see

http://master.bioconductor.org/developers/how-to/source-control

sorry for the inconvenience,

robert.

On 02/21/2014 03:42 PM, Jonathan Manning wrote:
> Hi Sonja,
>
> No, that's not it.
>
> The probe IDs map perfectly fine through the annotation package and
> GSEABase::mapIdentifiers() etc, and everything works fine if I hack
> around the bug I described previously. Again, what you have is some code
> (e.g. /'Biobase::exprs(eScoEset) <- eSco$es.obs) /), which assumes that
> a list is returned by /GSVA:::.gsva()/. In the case of "
> /method='plage'/ " et al, it is not. If you tweak things with that in
> mind, there's no problem.
>
> Here is the code for gsva() for the ExpressionSet signature (retrieved
> with /getMethod(gsva, signature=c(expr="ExpressionSet",
> gset.idx.list="GeneSetCollection", annotation="missing"))/ ), with my
> hacks included in bold:
>
>
> function (expr, gset.idx.list, annotation = NULL, ...)
> {
> .local <- function (expr, gset.idx.list, annotation = NULL,
> method = c("gsva", "ssgsea", "zscore", "plage"), rnaseq = FALSE,
> abs.ranking = FALSE, min.sz = 1, max.sz = Inf, no.bootstraps = 0,
> bootstrap.percent = 0.632, parallel.sz = 0, parallel.type = "SOCK",
> mx.diff = TRUE, tau = switch(method, gsva = 1, ssgsea = 0.25,
> NA), kernel = TRUE, verbose = TRUE)
> {
> sdGenes <- Biobase::esApply(expr, 1, sd)
> if (any(sdGenes == 0)) {
> if (verbose)
> cat("Filtering out", sum(sdGenes == 0), "genes with constant expression
> values throuhgout the samples\n")
> expr <- expr[sdGenes > 0, ]
> }
> if (nrow(expr) < 2)
> stop("Less than two genes in the input ExpressionSet object\n")
> if (verbose)
> cat("Mapping identifiers between gene sets and feature names\n")
> mapped.gset.idx.list <- GSEABase::mapIdentifiers(gset.idx.list,
> GSEABase::AnnoOrEntrezIdentifier(Biobase::annotation(expr)))
> tmp <- lapply(geneIds(mapped.gset.idx.list), function(x,
> y) na.omit(match(x, y)), featureNames(expr))
> names(tmp) <- names(mapped.gset.idx.list)
> mapped.gset.idx.list <- filterGeneSets(tmp, min.sz = max(1, min.sz),
> max.sz = max.sz)
> eSco <- GSVA:::.gsva(Biobase::exprs(expr), mapped.gset.idx.list,
> method, rnaseq, abs.ranking, no.bootstraps, bootstrap.percent,
> parallel.sz, parallel.type, mx.diff, tau, kernel,
> verbose)
> eScoEset <- expr
> #Biobase::exprs(eScoEset) <- eSco$es.obs
> * if (method=='gsva'){
> Biobase::exprs(eScoEset) <- eSco$es.obs
> }else{
> Biobase::exprs(eScoEset) <- eSco
> }*
>
> Biobase::annotation(eScoEset) <- ""
> #return(list(es.obs = eScoEset, bootstrap = eSco$bootstrap,
> # p.vals.sign = eSco$p.vals.sign))
> *return (eScoEset)*
> }
> .local(expr, gset.idx.list, annotation, ...)
> }
>
> Since you say the bootstrapping isn't much use anyway, I've just set it
> to return the matrix in all cases.
>
> Jon
>
>
>
>
> On 21/02/2014 14:05, Sonja Haenzelmann wrote:
>> I think the problem lies in your expression set.
>> The gene names have to be in ENTREZ gene Id format, as the c2BroadSets
>> are in this format. gsva maps the genes in your expression set with
>> the genes in the gene set list. If they do not have the same
>> identifiers it cannot map the genes, and hence not calculate any
>> statistics.
>>
>>
>> library(GSVAdata)
>> data(c2BroadSets)
>> library(GSVA)
>>
>>
>> example<- as.matrix(read.table("example.txt"))
>> eset_example<- ExpressionSet(example, annotation='illuminaHumanv4')
>> pdata_example<- data.frame(class=c(rep('class1', 5), c(rep('class2',
>> 5))))
>> rownames(pdata_example)<- letters[1:10]
>> pData(eset_example)<- pdata_example
>>
>>
>>> featureNames(eset_example)
>> [1] "ILMN_1762337" "ILMN_2055271" "ILMN_1736007" "ILMN_2383229"
>> "ILMN_1806310"
>> [6] "ILMN_1779670" "ILMN_1653355" "ILMN_1717783" "ILMN_1705025"
>> "ILMN_1814316"
>>
>> you will have to map your ILMN ids to entrez gene ids, then gsva
>> should be working.
>>
>>
>> For example in the toy example that shows up, when you do ?gsva, you
>> will find that the gene names are the same in the gene set list as
>> well as in the gene expression matrix.
>>
>> p<- 10 ## number of genes
>> n<- 30 ## number of samples
>> nGrp1<- 15 ## number of samples in group 1
>> nGrp2<- n - nGrp1 ## number of samples in group 2
>>
>> ## consider three disjoint gene sets
>> geneSets<- list(set1=paste("g", 1:3, sep=""),
>> set2=paste("g", 4:6, sep=""),
>> set3=paste("g", 7:10, sep=""))
>>
>>
>>
>>> geneSets
>> $set1
>> [1] "g1" "g2" "g3"
>>
>> $set2
>> [1] "g4" "g5" "g6"
>>
>> $set3
>> [1] "g7" "g8" "g9" "g10"
>>
>>
>> ## sample data from a normal distribution with mean 0 and st.dev. 1
>> y<- matrix(rnorm(n*p), nrow=p, ncol=n,
>> dimnames=list(paste("g", 1:p, sep="") , paste("s",
>> 1:n, sep="")))
>>
>> rownames(y)
>> [1] "g1" "g2" "g3" "g4" "g5" "g6" "g7" "g8" "g9" "g10"
>>
>> Best
>> Sonja
>>
>>
>>
>>
>> On Fri, Feb 21, 2014 at 2:39 PM, Jonathan Manning
>> <Jonathan.Manning at ed.ac.uk> wrote:
>>> example<- as.matrix(read.table("example.txt"))
>>> eset_example<- ExpressionSet(example, annotation='illuminaHumanv4')
>>> pdata_example<- data.frame(class=c(rep('class1', 5), c(rep('class2',
>>> 5))))
>>> rownames(pdata_example)<- letters[1:10]
>>> pData(eset_example)<- pdata_example
>>> gsva_es<- gsva(eset_example, c2BroadSets , mx.diff=1, method='plage')
>
>
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
Robert Castelo, PhD
Associate Professor
Dept. of Experimental and Health Sciences
Universitat Pompeu Fabra (UPF)
Barcelona Biomedical Research Park (PRBB)
Dr Aiguader 88
E-08003 Barcelona, Spain
telf: +34.933.160.514
fax: +34.933.160.550