[BioC] ReadVcf Memory Issues

Martin Morgan mtmorgan at fhcrc.org
Fri Jul 20 05:05:40 CEST 2012


On 07/19/2012 05:59 PM, Timothy Duff wrote:
> Hi. I am trying to determine, from a filtered set of Illumina 450k probes,
> which of them occur with a specified frequency in a given population. I was
> referred to the VariantAnnotation package by this mailing list. While the
> readVcf function seems to handle small loads nicely, looping over the
> regions of interest seems to cause allocation troubles. R tells me "Realloc
> could not re-allocate memory (0 bytes)" after about 4 iterations. Below is

I think this is a bug in the release version of VariantAnnotation. It
was fixed in devel version 1.3.6 (the current devel version is 1.3.16),
and the fix will appear in release version 1.2.10, which will probably
be built Saturday morning, 10am Seattle time. The short-term solution is
to switch to the devel branch
(http://bioconductor.org/developers/useDevel/), but you might avoid the
bug anyway by re-coding as Vince suggested.
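
A rough sketch of what such a re-coding could look like: build one GRanges covering all probes and call readVcf() once, instead of once per probe. This is only an illustration. It assumes that a single vectorized query is the gist of Vince's suggestion (his message is not quoted here), that EUR_R2 is a single-valued numeric INFO field, and that m has the CHR, start and end columns used in the loop quoted below. flag_common() and flag_probes() are hypothetical helper names, and rowRanges() is the accessor in current VariantAnnotation versions (older releases used a different accessor).

```r
# Hypothetical helper: 1 if freq lies strictly between lo and hi, else 0.
# NA (no variant found, or a missing value) maps to 0.
flag_common <- function(freq, lo = 0.01, hi = 0.99) {
  as.integer(!is.na(freq) & freq > lo & freq < hi)
}

# Hypothetical wrapper: one readVcf() call for all probe regions.
flag_probes <- function(m, filename) {
  # Attached here so flag_common() can be used without Bioconductor installed.
  library(VariantAnnotation)

  # One range per row of m, in the same order as the rows.
  rng <- GRanges(as.character(m$CHR),
                 IRanges(as.integer(m$start), as.integer(m$end)))

  # Restrict the import to the EUR_R2 INFO field and to records
  # overlapping any of the probe ranges.
  vcf <- readVcf(TabixFile(filename), "hg19",
                 param = ScanVcfParam(info = "EUR_R2", which = rng))

  # Index of the first overlapping variant per probe (NA when a probe
  # has none), mirroring the [1,1] lookup in the original loop.
  first <- findOverlaps(rng, rowRanges(vcf), select = "first")

  # Assumption: info(vcf)$EUR_R2 is a plain numeric vector.
  flag_common(info(vcf)$EUR_R2[first])
}
```

With these definitions, the loop would reduce to m$CpGs <- flag_probes(m, filename), so the file is opened and scanned once rather than once per probe.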

Martin

> the relevant code, and below it the output of sessionInfo(). If anyone
> might suggest some diagnostic measures or an alternate way of doing this I
> would appreciate it. Thanks.
>
> ------
>
> library(VariantAnnotation)
> library(IlluminaHumanMethylation450kprobe)
>
> filename <- "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20100804/supporting/EUR.2of4intersection_allele_freq.20100804.genotypes.vcf.gz"
>
> load("rsids.Rdata") # a data frame containing probe id, rs id, and chromosome
> data(IlluminaHumanMethylation450kprobe)
> colnames(rsids) <- c("Probe_ID", "RS_ID", "CHR")
> m <- merge(IlluminaHumanMethylation450kprobe, rsids, by="Probe_ID")
>
> for (i in seq_along(m$Probe_ID)) {
>    # read only the records overlapping the i-th probe
>    snprange <- readVcf(TabixFile(filename), "hg19",
>                        param=GRanges(as.character(m$CHR[i]),
>                                      IRanges(as.integer(m$start[i]),
>                                              as.integer(m$end[i]))))
>    # EUR_R2 value of the first record in the range
>    freq <- elementMetadata(info(snprange))["EUR_R2"][1,1]
>    if (!is.na(freq) && freq < .99 && freq > .01) {
>      m$CpGs[i] <- 1
>    } else {
>      m$CpGs[i] <- 0
>    }
> }
>
>
> ----
>
>> sessionInfo()
> R version 2.15.1 (2012-06-22)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>   [7] LC_PAPER=C                 LC_NAME=C
>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
>   [1] IlluminaHumanMethylation450kprobe_2.0.6
>   [2] AnnotationDbi_1.18.1
>   [3] Biobase_2.16.0
>   [4] BiocInstaller_1.4.7
>   [5] VariantAnnotation_1.2.9
>   [6] Rsamtools_1.8.5
>   [7] Biostrings_2.24.1
>   [8] GenomicRanges_1.8.7
>   [9] IRanges_1.14.4
> [10] BiocGenerics_0.2.0
>
> loaded via a namespace (and not attached):
>   [1] biomaRt_2.2.0         bitops_1.0-4.1        BSgenome_1.24.0
>   [4] DBI_0.2-5             GenomicFeatures_1.8.2 grid_2.15.1
>   [7] lattice_0.20-6        Matrix_1.0-6          RCurl_1.4-3
> [10] RSQLite_0.9-2         rtracklayer_1.16.3    snpStats_1.6.0
> [13] splines_2.15.1        stats4_2.15.1         survival_2.36-14
> [16] tools_2.15.1          XML_3.9-4             zlibbioc_1.2.0
>
>
>


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


