[BioC] writeVcf bug

Richard Pearson rpearson at well.ox.ac.uk
Thu Nov 29 17:44:11 CET 2012


Hi

I've noticed a bug with writeVcf when trying to write out relatively 
simple vcf files, e.g. a file containing only GT in the FORMAT fields. 
The following is a (hopefully) reproducible example based on examples in 
?writeVcf

library(VariantAnnotation)
fl <- system.file("extdata", "ex2.vcf", package="VariantAnnotation")
out1.vcf <- tempfile()
out2.vcf <- tempfile()
in1 <- readVcf(fl, "hg19")
writeVcf(in1, out1.vcf)
in2 <- readVcf(fl, "hg19", ScanVcfParam(geno="GT"))
writeVcf(in2, out2.vcf)

This last line gives me:

Error in row(genoMat) :
   a matrix-like object is required as argument to 'row'

Looking at the following code snippet from .makeVcfGeno, it seems 
genoMat is first getting created as a matrix, but then changed to 
something that is not a matrix by "genoMat <- genoMatFlat". The error 
then comes from the "row(genoMat)" on the last line shown. I don't 
really understand what is going on in this code so can't offer a patch 
I'm afraid.

     genoMat <- matrix(unlist(as.list(geno), use.names = FALSE,
         recursive = FALSE), nsub * nrec, length(geno))
     genoMatFlat <- as.character(unlist(genoMat))
     genoMatFlat[is.na(genoMatFlat)] <- "."
     if (is.list(genoMat)) {
         genoMatList <- relist(genoMatFlat, PartitioningByEnd(genoMat))
         genoMatFlat <- .pasteCollapse(genoMatList, ",")
         genoMat <- matrix(genoMatFlat, nrow(genoMat), ncol(genoMat))
     }
     else genoMat <- genoMatFlat
     formatMatPerSub <- matrix(rep(t(formatMat), nsub), nsub *
         nrec, length(geno), byrow = TRUE)
     keep <- !is.na(formatMatPerSub)
     genoListBySub <- seqsplit(genoMat[keep], row(genoMat)[keep])

FWIW, as a temporary fix I made an extra, slightly more complex FORMAT 
field using the following:

geno(in2)[["DUMMY"]] <- matrix(sapply(as.vector(geno(in2)[["GT"]]), 
function(x) list(c(x, x))), ncol=ncol(geno(in2)[["GT"]]), 
dimnames=dimnames(geno(in2)[["GT"]]))
writeVcf(in2, out2.vcf)

I've checked and it seems this bug is also present in latest devel. Any 
chance of a fix?

Thanks

Richard

 > sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C LC_TIME=en_GB.UTF-8        
LC_COLLATE=en_GB.UTF-8 LC_MONETARY=en_GB.UTF-8    
LC_MESSAGES=en_GB.UTF-8 LC_PAPER=C                 LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C 
LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods base

other attached packages:
[1] VariantAnnotation_1.4.5 Rsamtools_1.10.2 Biostrings_2.26.2       
GenomicRanges_1.10.5 IRanges_1.16.4          BiocGenerics_0.4.0 
devtools_0.8            BiocInstaller_1.8.3

loaded via a namespace (and not attached):
  [1] AnnotationDbi_1.20.3   Biobase_2.18.0 biomaRt_2.14.0         
bitops_1.0-5 BSgenome_1.26.1        DBI_0.2-5 digest_0.6.0           
evaluate_0.4.2 GenomicFeatures_1.10.1 httr_0.2
[11] memoise_0.1            parallel_2.15.1 plyr_1.7.1             
RCurl_1.95-3 RSQLite_0.11.2         rtracklayer_1.18.1 
stats4_2.15.1          stringr_0.6.1 tools_2.15.1           whisker_0.1
[21] XML_3.95-0.1           zlibbioc_1.4.0



More information about the Bioconductor mailing list