[BioC] [GSVA] Segfault with GSVA

Robert Castelo robert.castelo at upf.edu
Wed Jan 16 16:25:38 CET 2013


hi Luca, and everybody else,

the original post was not published at the list but i'm posting it below 
with the solution to the problem.

the problem that is causing the segfault is, next to a bad handling of 
the error, the fact that some genes have a constant expression value 
across all samples, leading to no variability:

dim(matrix)
[1] 999  10
sdsgenes <- apply(matrix, 1, sd)
summary(sdsgenes)
    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
0.00000 0.00000 0.01776 0.10120 0.10220 2.83900
sum(sdsgenes == 0)
[1] 320

as you can see, 320 out of the 999 have standard deviation zero, if you 
do the calculations filtering those genes out, things will work:

x <- gsva(expr=matrix[sdsgenes > 0, ], gset.idx.list=idx.list)
Estimating GSVA scores for 152 gene sets.
Computing observed enrichment scores
Estimating ECDFs in microarray data with Gaussian kernels
Using parallel with 12 cores
|===============================================================================================| 
100%
dim(x$es.obs)
[1] 152  10

i have added a QC that removes upfront all genes with zero standard 
deviation, reporting this action when the argument verbose=TRUE.

this has been fixed in both GSVA release version 1.6.2 and development 
version 1.7.5 which should be available at the BioC site within the next 
48 hrs.

thanks for the bug reporting!!
robert.


On 01/16/2013 11:49 AM, Luca Beltrame wrote:
> Hello,
>
> I'm a bioinformatician working in a no profit research institute, and recently
> I've been using GSVA in a number of pipelines. However, with some data sets
> (in particular Affymetrix arrays) I encounter a segfault in the C code
> outlined below.
>
> I attached to this mail a section of a matrix that reproduces the problem:
> mouse data, normalized with GCRMA and annotated using M.Dai's custom CDFs.
>
> Thanks in advance.
>
> Test case below:
>
>> library(GSVA)
>> library(GSVAdata)
>> data(c2BroadSets)
>> matrix = as.matrix(read.delim("matrix.txt", row.names=1)) # Mouse data
>> idx.list = geneIds(c2BroadSets) # convert to regular list
>> gsva(expr=matrix, gset.idx.list=idx.list)
> Estimating GSVA scores for 269 gene sets.
> Computing observed enrichment scores
> Estimating ECDFs in microarray data with Gaussian kernels
>
>   *** caught segfault ***
> address 0x7fa467060140, cause 'memory not mapped'
>
> Traceback:
>   1: .C("matrix_density_R", as.double(t(expr[, sample.idxs])),
> as.double(t(expr)),     R = double(n.test.samples * n.genes),
> n.density.samples,     n.test.samples, n.genes, as.integer(rnaseq))
>   2: compute.gene.density(expr, sample.idxs, rnaseq, kernel)
>   3: compute.geneset.es(expr, gset.idx.list, 1:n.samples, rnaseq = rnaseq,
> abs.ranking = abs.ranking, parallel.sz = parallel.sz, parallel.type =
> parallel.type,     mx.diff = mx.diff, tau = tau, kernel = kernel, verbose =
> verbose)
>   4: GSVA:::.gsva(expr, mapped.gset.idx.list, method, rnaseq, abs.ranking,
> no.bootstraps, bootstrap.percent, parallel.sz, parallel.type,     mx.diff,
> tau, kernel, verbose)
>   5: .local(expr, gset.idx.list, annotation, ...)
>   6: gsva(expr = matr, gset.idx.list = geneIds(c2BroadSets))
>   7: gsva(expr = matr, gset.idx.list = geneIds(c2BroadSets))
>
>> sessionInfo()
> R version 2.15.2 (2012-10-26)
> Platform: x86_64-suse-linux-gnu (64-bit)
>
> locale:
>   [1] LC_CTYPE=it_IT.UTF-8       LC_NUMERIC=C
>   [3] LC_TIME=it_IT.UTF-8        LC_COLLATE=it_IT.UTF-8
>   [5] LC_MONETARY=it_IT.UTF-8    LC_MESSAGES=it_IT.UTF-8
>   [7] LC_PAPER=C                 LC_NAME=C
>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=it_IT.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] GSVA_1.6.1           GSEABase_1.20.1      graph_1.36.1
> [4] annotate_1.36.0      AnnotationDbi_1.20.3 Biobase_2.18.0
> [7] BiocGenerics_0.4.0
>
> loaded via a namespace (and not attached):
> [1] DBI_0.2-5       IRanges_1.16.4  parallel_2.15.2 RSQLite_0.11.2
> [5] stats4_2.15.2   tools_2.15.2    XML_3.95-0.1    xtable_1.7-1
>
>

-- 
Robert Castelo, PhD
Associate Professor
Dept. of Experimental and Health Sciences
Universitat Pompeu Fabra (UPF)
Barcelona Biomedical Research Park (PRBB)
Dr Aiguader 88
E-08003 Barcelona, Spain
telf: +34.933.160.514
fax: +34.933.160.550



More information about the Bioconductor mailing list