[BioC] Printing Alt alleles using VariantAnnotation

Valerie Obenchain vobencha at fhcrc.org
Sat Sep 29 02:31:51 CEST 2012


As a heads up, the behavior of the ref(), alt(), qual() and filt() 
accessors has changed in the devel version of VariantAnnotation.

Now instead of

values(fixed(vcf))[["ALT"]]

you can simply use

alt(vcf)

This now returns the values themselves instead of a GRanges with the 
values stored in an elementMetadata column.  Hopefully this makes these 
data easier to get at.

Valerie
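Whichever accessor is used, each variant can carry more than one alternate allele, so the ALT values are list-like: one set of alleles per variant. The sketch below uses only base R and assumes the alleles have already been turned into a plain list of character vectors (a hypothetical stand-in; VariantAnnotation typically stores ALT as a DNAStringSetList, which may need converting to character first). It shows why unlist() yields more elements than there are variants, and how pasting each element with collapse = "," keeps one string per variant.

```r
# Hypothetical stand-in for the per-variant ALT values: one character
# vector per variant (the first site has two alternate alleles).
alt_alleles <- list(c("G", "C"), "A")

# unlist() flattens every allele into its own element, so the result
# no longer lines up one-to-one with the variants.
flat <- unlist(alt_alleles)
length(flat)  # 3 elements for 2 variants

# Collapsing each variant's alleles into one comma-separated string
# keeps exactly one value per variant.
alt_str <- vapply(alt_alleles, paste, character(1), collapse = ",")
alt_str  # "G,C" "A"
```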


On 09/28/2012 07:44 AM, James W. MacDonald wrote:
> Hi Mark,
>
> On 9/28/2012 6:06 AM, Mark Dunning wrote:
>> Hi all,
>>
>> I am doing some processing of vcf files using the VariantAnnotation
>> package, and eventually I want to write out a table that I can use the
>> annovar annotation package tool on
>> (http://www.openbioinformatics.org/annovar/). The table needs to be in
>> the form
>>
>> CHR, Start, end, Ref, Alt
>>
>> e.g.
>>
>> 1 55 55 T G
>> 1 2646 2646 G A
>>
>> I'm fine extracting the chromosome, start and end. To get the
>> reference alleles I do:
>>
>>> Ref<- as.data.frame(values(ref(vcf))[["REF"]])[,1]
>> But the Alt allele is a bit more complicated. If I do something like:
>>
>>> alternate = as.data.frame(unlist(values(fixed(vcf))[["ALT"]]))[,1]
>
> How about
>
> alternate <- sapply(values(fixed(vcf))[["ALT"]], paste, collapse = ",")
>
> Best,
>
> Jim
>
>
>> the number of rows can be greater than the number of variants in the
>> VCF file, because a site (especially an indel) can have more than one
>> alternate allele. Then I can no longer easily construct the data frame.
>>
>> Is there an easy way to write all alternate alleles for the same
>> position in a comma-separated string so that entries in the table
>> could be in the form
>>
>> 1 55 55 T G,C
>> (e.g. G and C alternate alleles were found for the SNP at
>> chromosome 1: 55-55)
>>
>>
>> Regards,
>>
>> Mark
>>
>>
>>> sessionInfo()
>> R version 2.15.1 (2012-06-22)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>   [7] LC_PAPER=C                 LC_NAME=C
>>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] VariantAnnotation_1.2.11 Rsamtools_1.8.6          Biostrings_2.24.1
>> [4] ggplot2_0.9.2.1          GenomicRanges_1.8.13     IRanges_1.14.4
>> [7] BiocGenerics_0.2.0
>>
>> loaded via a namespace (and not attached):
>>   [1] AnnotationDbi_1.18.3  Biobase_2.16.0        biomaRt_2.12.0
>>   [4] bitops_1.0-4.1        BSgenome_1.24.0       colorspace_1.1-1
>>   [7] DBI_0.2-5             dichromat_1.2-4       digest_0.5.2
>> [10] GenomicFeatures_1.8.3 grid_2.15.1           gtable_0.1.1
>> [13] labeling_0.1          lattice_0.20-10       MASS_7.3-21
>> [16] Matrix_1.0-9          memoise_0.1           munsell_0.4
>> [19] plyr_1.7.1            proto_0.3-9.2         RColorBrewer_1.0-5
>> [22] RCurl_1.91-1          reshape2_1.2.1        RSQLite_0.11.2
>> [25] rtracklayer_1.16.3    scales_0.2.2          snpStats_1.6.0
>> [28] splines_2.15.1        stats4_2.15.1         stringr_0.6.1
>> [31] survival_2.36-14      tools_2.15.1          XML_3.9-4
>> [34] zlibbioc_1.2.0
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: 
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
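For the table Mark describes (CHR, Start, End, Ref, Alt, one row per variant), here is a base-R sketch that joins multiple alternate alleles with commas and writes the result without a header, quotes or row names. The input vectors are hypothetical values copied from Mark's example rows; in practice they would be extracted from the VCF. The tab delimiter is an assumption, so check the annovar documentation for the exact input format it expects.

```r
# Hypothetical per-variant values copied from the example rows above.
chrs   <- c("1", "1")
starts <- c(55L, 2646L)
ends   <- c(55L, 2646L)
refs   <- c("T", "G")
alts   <- list(c("G", "C"), "A")  # one character vector per variant

# One row per variant; multiple alternate alleles are joined by commas.
annovar_input <- data.frame(
  chr   = chrs,
  start = starts,
  end   = ends,
  ref   = refs,
  alt   = vapply(alts, paste, character(1), collapse = ","),
  stringsAsFactors = FALSE
)

# Tab-delimited, no header, no quoting, no row names.
out_file <- tempfile(fileext = ".txt")
write.table(annovar_input, out_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
lines <- readLines(out_file)
lines
```

vapply() is used here instead of sapply() because it always returns a character vector of the declared type, even when the list of alleles is empty.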


