[BioC] VariantAnnotation ALT Field

Paul Shannon pshannon at fhcrc.org
Wed Nov 21 18:19:37 CET 2012


Hi Sam,

Here's a quick workaround:

  fixed(vcf)[ , c("REF", "ALT")]

The backstory on this is that the ALT field is a DNAStringSetList which, until very recently (the change is in bioc-devel), displayed itself via its show method as '########'.  Since that was less than helpful, the latest version of VariantAnnotation displays the ALT sequences in a more natural way.

In the meantime, if you are not using bioc-devel, the explicit extraction of REF and ALT shown above should get you part of what you want.
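
If the goal is simply to read the alleles, another option is to flatten ALT into plain character strings so that nothing depends on the show method. The following is only a sketch, and it assumes a reasonably recent VariantAnnotation in which the ref() and alt() accessors are available and unstrsplit() and CharacterList() are attached along with the package; the names alt_chr and ref_alt are made up for illustration. ALT is a list type because a site can carry more than one alternate allele, so each element is collapsed into a comma-separated string.

```r
library(VariantAnnotation)

# Same example file as in the vignette and in the original question.
fl  <- system.file("extdata", "chr22.vcf.gz", package = "VariantAnnotation")
vcf <- readVcf(fl, "hg19")

# alt(vcf) is a DNAStringSetList: one DNAStringSet per variant.
# Convert to a CharacterList, then paste the alleles of each variant
# together with "," so every variant yields exactly one string.
alt_chr <- unstrsplit(CharacterList(alt(vcf)), sep = ",")

# Build an ordinary data.frame (illustrative column names) that prints
# with base R and does not go through any Bioconductor show method.
ref_alt <- data.frame(
  id  = rownames(vcf),
  ref = unname(as.character(ref(vcf))),
  alt = unname(alt_chr),
  stringsAsFactors = FALSE
)

head(ref_alt, 3)
```

On older releases where these accessors behave differently, the fixed(vcf)[ , c("REF", "ALT")] extraction remains the safer route.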

 - Paul


On Nov 21, 2012, at 6:50 AM, Samuel Younkin wrote:

> I have been looking at the VariantAnnotation vignette and have encountered something strange.  The R code is below.  See how the ALT field lists only ########.  The vignette, however, correctly shows the alternate allele.  The data file chr22.vcf.gz also correctly contains the alternate allele information.
> 
> Any suggestions?
> 
> Thanks.
> 
> Sam
> 
> ~~
> 
> > library(VariantAnnotation)
> > fl <- system.file("extdata", "chr22.vcf.gz", package="VariantAnnotation")
> > vcf <- readVcf(fl, "hg19")
> > head( fixed(vcf), 3 )
> GRanges with 3 ranges and 5 metadata columns:
>              seqnames               ranges strand | paramRangeID
>                 <Rle>            <IRanges>  <Rle> |     <factor>
>    rs7410291       22 [50300078, 50300078]      * |         <NA>
>  rs147922003       22 [50300086, 50300086]      * |         <NA>
>  rs114143073       22 [50300101, 50300101]      * |         <NA>
>                         REF                ALT      QUAL      FILTER
>              <DNAStringSet> <DNAStringSetList> <numeric> <character>
>    rs7410291              A           ########       100        PASS 
>  rs147922003              C           ########       100        PASS 
>  rs114143073              G           ########       100        PASS 
>  ---
>  seqlengths:
>   22
>   NA
> > sessionInfo()
> R version 2.15.2 Patched (2012-10-28 r61038)
> Platform: x86_64-unknown-linux-gnu (64-bit)
> 
> locale:
> [1] LC_CTYPE=en_US.iso885915       LC_NUMERIC=C
> [3] LC_TIME=en_US.iso885915        LC_COLLATE=en_US.iso885915
> [5] LC_MONETARY=en_US.iso885915    LC_MESSAGES=en_US.iso885915
> [7] LC_PAPER=C                     LC_NAME=C
> [9] LC_ADDRESS=C                   LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.iso885915 LC_IDENTIFICATION=C
> 
> attached base packages:
> [1] stats     graphics  grDevices datasets  utils     methods   base
> 
> other attached packages:
> [1] VariantAnnotation_1.4.5 Rsamtools_1.10.2        Biostrings_2.26.2
> [4] GenomicRanges_1.10.5    IRanges_1.16.4          BiocGenerics_0.4.0
> 
> loaded via a namespace (and not attached):
> [1] AnnotationDbi_1.20.3   Biobase_2.18.0         biomaRt_2.14.0
> [4] bitops_1.0-5           BSgenome_1.26.1        DBI_0.2-5
> [7] GenomicFeatures_1.10.1 parallel_2.15.2        RCurl_1.95-3
> [10] RSQLite_0.11.2         rtracklayer_1.18.1     stats4_2.15.2
> [13] tools_2.15.2           XML_3.95-0.1           zlibbioc_1.4.0
> >