[BioC] Bug in GSVA with methods other than default?

Sonja Haenzelmann sonjahaenzelmann at gmail.com
Fri Feb 21 15:05:06 CET 2014


I think the problem lies in your expression set.
The gene names have to be in ENTREZ gene Id format, as the c2BroadSets
are in this format. gsva maps the genes in your expression set with
the genes in the gene set list. If they do not have the same
identifiers it cannot map the genes, and hence not calculate any
statistics.


library(GSVAdata)
data(c2BroadSets)
library(GSVA)


example <- as.matrix(read.table("example.txt"))
eset_example <- ExpressionSet(example, annotation='illuminaHumanv4')
pdata_example <- data.frame(class=c(rep('class1', 5), c(rep('class2', 5))))
rownames(pdata_example) <- letters[1:10]
pData(eset_example) <- pdata_example


> featureNames(eset_example)
 [1] "ILMN_1762337" "ILMN_2055271" "ILMN_1736007" "ILMN_2383229" "ILMN_1806310"
 [6] "ILMN_1779670" "ILMN_1653355" "ILMN_1717783" "ILMN_1705025" "ILMN_1814316"

you will have to map your ILMN ids to entrez gene ids, then gsva
should be working.


For example in the toy example that shows up, when you do ?gsva, you
will find that the gene names are the same in the gene set list as
well as in the gene expression matrix.

p <- 10 ## number of genes
     n <- 30 ## number of samples
     nGrp1 <- 15 ## number of samples in group 1
     nGrp2 <- n - nGrp1 ## number of samples in group 2

     ## consider three disjoint gene sets
     geneSets <- list(set1=paste("g", 1:3, sep=""),
                      set2=paste("g", 4:6, sep=""),
                      set3=paste("g", 7:10, sep=""))



> geneSets
$set1
[1] "g1" "g2" "g3"

$set2
[1] "g4" "g5" "g6"

$set3
[1] "g7"  "g8"  "g9"  "g10"


     ## sample data from a normal distribution with mean 0 and st.dev. 1
     y <- matrix(rnorm(n*p), nrow=p, ncol=n,
                 dimnames=list(paste("g", 1:p, sep="") , paste("s",
1:n, sep="")))

rownames(y)
 [1] "g1"  "g2"  "g3"  "g4"  "g5"  "g6"  "g7"  "g8"  "g9"  "g10"

Best
Sonja




On Fri, Feb 21, 2014 at 2:39 PM, Jonathan Manning
<Jonathan.Manning at ed.ac.uk> wrote:
> example <- as.matrix(read.table("example.txt"))
> eset_example <- ExpressionSet(example, annotation='illuminaHumanv4')
> pdata_example <- data.frame(class=c(rep('class1', 5), c(rep('class2', 5))))
> rownames(pdata_example) <- letters[1:10]
> pData(eset_example) <- pdata_example
> gsva_es <- gsva(eset_example, c2BroadSets , mx.diff=1, method='plage')



More information about the Bioconductor mailing list