[BioC] [VariantAnnotation] subsetting VCF objects

Valerie Obenchain vobencha at fhcrc.org
Thu Nov 15 03:42:48 CET 2012


Hi Paul,

Thanks for the bug report. This is now fixed in VariantAnnotation 1.5.14 
(devel) and 1.4.5 (release). Both versions will be available Friday 
(11/16) at 9am PST, or immediately from svn.

Valerie


On 11/14/12 05:41, Paul Theodor Pyl wrote:
> Hi all,
>
> I am reading some .vcf files with the readVcf function and noticed 
> that I cannot subset the resulting VCF objects when the info field is 
> empty (see the example below).
>
> Is there a workaround other than loading at least part of the info field?
>
> Thanks,
> Paul
>
> The Example:
> > vcf_full = readVcf("test.vcf.gz", "hg19")
> > vcf_no_info = readVcf("test.vcf.gz", "hg19",
> +     param = ScanVcfParam(geno=c("GT","GQ"), fixed="ALT", info=NA))
> > vcf_full
> class: VCF
> dim: 71128 2
> genome: hg19
> exptData(1): header
> fixed(4): REF ALT QUAL FILTER
> info(22): AC AF ... SB STR
> geno(5): AD DP GQ GT PL
> rownames(71128): rs62224610 rs141578542 ... 22:51243743 22:51244332
> rowData values names(1): paramRangeID
> colnames(2): sample_one sample_two
> colData names(1): Samples
> > vcf_no_info
> class: VCF
> dim: 71128 2
> genome: hg19
> exptData(1): header
> fixed(2): REF ALT
> info(0):
> geno(2): GQ GT
> rownames(71128): rs62224610 rs141578542 ... 22:51243743 22:51244332
> rowData values names(1): paramRangeID
> colnames(2): sample_one sample_two
> colData names(1): Samples
> > vcf_full[1:10]
> class: VCF
> dim: 10 2
> genome: hg19
> exptData(1): header
> fixed(4): REF ALT QUAL FILTER
> info(22): AC AF ... SB STR
> geno(5): AD DP GQ GT PL
> rownames(10): rs62224610 rs141578542 ... 22:16058463 rs149413786
> rowData values names(1): paramRangeID
> colnames(2): sample_one sample_two
> colData names(1): Samples
> > vcf_no_info[1:10]
> Error in slot(x, "info")[i, , drop = FALSE] :
>   selecting rows: subscript contains NAs or out of bounds indices
>
> > sessionInfo()
> R version 2.15.2 (2012-10-26)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=C                 LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] VariantAnnotation_1.4.3 Rsamtools_1.10.2        Biostrings_2.26.2
> [4] GenomicRanges_1.10.5    IRanges_1.16.4          BiocGenerics_0.4.0
>
> loaded via a namespace (and not attached):
>  [1] AnnotationDbi_1.20.2   Biobase_2.18.0         biomaRt_2.14.0
>  [4] bitops_1.0-5           BSgenome_1.26.1        compiler_2.15.2
>  [7] DBI_0.2-5              GenomicFeatures_1.10.0 parallel_2.15.2
> [10] RCurl_1.95-3           RSQLite_0.11.2         rtracklayer_1.18.0
> [13] stats4_2.15.2          tools_2.15.2           XML_3.95-0.1
> [16] zlibbioc_1.4.0


