[BioC] writeVcf bug

Richard Pearson rpearson at well.ox.ac.uk
Thu Apr 18 12:55:09 CEST 2013


Hi

I've recently upgraded to R 3.0.0, but am still having a (now slightly 
different) problem with writeVcf. When there is only one element in 
geno(vcf), each variant gets a separate row for each sample. For 
example, in the following code, out1.vcf has 5 variants, each with 
genotypes for 3 samples, but out2.vcf has 15 variants, each with 
genotypes for just one sample.

     library(VariantAnnotation)
     fl <- system.file("extdata", "ex2.vcf", package="VariantAnnotation")
     in1 <- readVcf(fl, "hg19")
     writeVcf(in1, "out1.vcf")
     in2 <- readVcf(fl, "hg19", ScanVcfParam(geno="GT"))
     writeVcf(in2, "out2.vcf")

 > sessionInfo()
R version 3.0.0 (2013-04-03)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C LC_TIME=en_GB.UTF-8        
LC_COLLATE=en_GB.UTF-8 LC_MONETARY=en_GB.UTF-8    
LC_MESSAGES=en_GB.UTF-8 LC_PAPER=C                 LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C 
LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets methods   
base

other attached packages:
[1] VariantAnnotation_1.6.2 Rsamtools_1.12.1 Biostrings_2.28.0       
GenomicRanges_1.12.2 IRanges_1.18.0          BiocGenerics_0.6.0 
devtools_1.2            BiocInstaller_1.10.0

loaded via a namespace (and not attached):
  [1] AnnotationDbi_1.22.1   Biobase_2.20.0 biomaRt_2.16.0         
bitops_1.0-5           BSgenome_1.28.0 DBI_0.2-5              
digest_0.6.3           evaluate_0.4.3 GenomicFeatures_1.12.0 httr_0.2
[11] memoise_0.1            RCurl_1.95-4.1 RSQLite_0.11.3         
rtracklayer_1.20.1     stats4_3.0.0 stringr_0.6.2          
tools_3.0.0            whisker_0.1 XML_3.96-1.1           zlibbioc_1.6.0

Cheers

Richard


On 29/11/2012 17:36, Richard Pearson wrote:
> Hi Michael
>
> My devel sessionInfo is below. It seems biocLite has given me version
> 1.5.17of the package, but I see that .makeVcfGenolooks different in the
> latest from SVN (1.5.19) - so I'm sure you're right and I just need to
> wait for things to filter through. Apologies for the noise.
>
> Every time I upgrade VariantAnnotation it is an even better package than
> before - many thanks for all the hard work!
>
> Richard
>
>
>   > sessionInfo()
> R Under development (unstable) (2012-11-27 r61172)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>    [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C LC_TIME=en_GB.UTF-8
> LC_COLLATE=en_GB.UTF-8 LC_MONETARY=en_GB.UTF-8
> LC_MESSAGES=en_GB.UTF-8 LC_PAPER=C                 LC_NAME=C
>    [9] LC_ADDRESS=C               LC_TELEPHONE=C
> LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils datasets  methods   base
>
> other attached packages:
> [1] VariantAnnotation_1.5.17 Rsamtools_1.11.11 Biostrings_2.27.7
> GenomicRanges_1.11.15 IRanges_1.17.15          BiocGenerics_0.5.1
> devtools_0.8             BiocInstaller_1.9.4
>
> loaded via a namespace (and not attached):
>    [1] AnnotationDbi_1.21.7   Biobase_2.19.1 biomaRt_2.15.0
> bitops_1.0-5 BSgenome_1.27.1        DBI_0.2-5 digest_0.6.0
> evaluate_0.4.2 GenomicFeatures_1.11.5 httr_0.2
> [11] memoise_0.1            plyr_1.7.1 RCurl_1.95-3
> RSQLite_0.11.2 rtracklayer_1.19.6     stats4_2.16.0
> stringr_0.6.1          tools_2.16.0 whisker_0.1            XML_3.95-0.1
> [21] zlibbioc_1.5.0
>
>
> On 29/11/2012 17:22, Michael Lawrence wrote:
>> I am pretty sure this bug has been fixed in devel. Would you mind
>> pasting your devel session info?
>>
>> Thanks,
>> Michael
>>
>>
>> On Thu, Nov 29, 2012 at 8:44 AM, Richard Pearson
>> <rpearson at well.ox.ac.uk <mailto:rpearson at well.ox.ac.uk>> wrote:
>>
>>      Hi
>>
>>      I've noticed a bug with writeVcf when trying to write out
>>      relatively simple vcf files, e.g. a file containing only GT in the
>>      FORMAT fields. The following is a (hopefully) reproducible example
>>      based on examples in ?writeVcf
>>
>>      library(VariantAnnotation)
>>      fl <- system.file("extdata", "ex2.vcf", package="VariantAnnotation")
>>      out1.vcf <- tempfile()
>>      out2.vcf <- tempfile()
>>      in1 <- readVcf(fl, "hg19")
>>      writeVcf(in1, out1.vcf)
>>      in2 <- readVcf(fl, "hg19", ScanVcfParam(geno="GT"))
>>      writeVcf(in2, out2.vcf)
>>
>>      This last line gives me:
>>
>>      Error in row(genoMat) :
>>        a matrix-like object is required as argument to 'row'
>>
>>      Looking at the following code snippet from .makeVcfGeno, it seems
>>      genoMat is first getting created as a matrix, but then changed to
>>      something that is not a matrix by "genoMat <- genoMatFlat". The
>>      error then comes from the "row(genoMat)" on the last line shown. I
>>      don't really understand what is going on in this code so can't
>>      offer a patch I'm afraid.
>>
>>          genoMat <- matrix(unlist(as.list(geno), use.names = FALSE,
>>              recursive = FALSE), nsub * nrec, length(geno))
>>          genoMatFlat <- as.character(unlist(genoMat))
>>          genoMatFlat[is.na <http://is.na>(genoMatFlat)] <- "."
>>          if (is.list(genoMat)) {
>>              genoMatList <- relist(genoMatFlat, PartitioningByEnd(genoMat))
>>              genoMatFlat <- .pasteCollapse(genoMatList, ",")
>>              genoMat <- matrix(genoMatFlat, nrow(genoMat), ncol(genoMat))
>>          }
>>          else genoMat <- genoMatFlat
>>          formatMatPerSub <- matrix(rep(t(formatMat), nsub), nsub *
>>              nrec, length(geno), byrow = TRUE)
>>          keep <- !is.na <http://is.na>(formatMatPerSub)
>>          genoListBySub <- seqsplit(genoMat[keep], row(genoMat)[keep])
>>
>>      FWIW, as a temporary fix I made an extra, slightly more complex
>>      FORMAT field using the following:
>>
>>      geno(in2)[["DUMMY"]] <-
>>      matrix(sapply(as.vector(geno(in2)[["GT"]]), function(x) list(c(x,
>>      x))), ncol=ncol(geno(in2)[["GT"]]),
>>      dimnames=dimnames(geno(in2)[["GT"]]))
>>      writeVcf(in2, out2.vcf)
>>
>>      I've checked and it seems this bug is also present in latest
>>      devel. Any chance of a fix?
>>
>>      Thanks
>>
>>      Richard
>>
>>      > sessionInfo()
>>      R version 2.15.1 (2012-06-22)
>>      Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>>      locale:
>>       [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C LC_TIME=en_GB.UTF-8
>>           LC_COLLATE=en_GB.UTF-8 LC_MONETARY=en_GB.UTF-8
>>       LC_MESSAGES=en_GB.UTF-8 LC_PAPER=C                 LC_NAME=C
>>       [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>      LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
>>
>>      attached base packages:
>>      [1] stats     graphics  grDevices utils     datasets  methods base
>>
>>      other attached packages:
>>      [1] VariantAnnotation_1.4.5 Rsamtools_1.10.2 Biostrings_2.26.2
>>        GenomicRanges_1.10.5 IRanges_1.16.4          BiocGenerics_0.4.0
>>      devtools_0.8  BiocInstaller_1.8.3
>>
>>      loaded via a namespace (and not attached):
>>       [1] AnnotationDbi_1.20.3   Biobase_2.18.0 biomaRt_2.14.0
>>      bitops_1.0-5 BSgenome_1.26.1        DBI_0.2-5 digest_0.6.0
>>        evaluate_0.4.2 GenomicFeatures_1.10.1 httr_0.2
>>      [11] memoise_0.1            parallel_2.15.1 plyr_1.7.1
>>      RCurl_1.95-3 RSQLite_0.11.2         rtracklayer_1.18.1
>>      stats4_2.15.1          stringr_0.6.1 tools_2.15.1 whisker_0.1
>>      [21] XML_3.95-0.1           zlibbioc_1.4.0
>>
>>      _______________________________________________
>>      Bioconductor mailing list
>>      Bioconductor at r-project.org <mailto:Bioconductor at r-project.org>
>>      https://stat.ethz.ch/mailman/listinfo/bioconductor
>>      Search the archives:
>>      http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>>
>
> 	[[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>



More information about the Bioconductor mailing list