[BioC] Subsetting "sites only" VCF objects

Valerie Obenchain vobencha at fhcrc.org
Tue Aug 28 18:21:04 CEST 2012


Hi Richard,

Thanks for reporting this bug. The problem was that the VCF class could 
not be subset by row when the object had no columns. This has been fixed 
for both SummarizedExperiment and the VCF class.

fl <- system.file("extdata", "ex2.vcf", package="VariantAnnotation")
vcf <- readVcf(fl, "hg19")
 > dim(vcf)
[1] 5 3

## remove 'geno' and 'colData' to give the object 0 columns
geno(vcf) <- SimpleList()
colData(vcf) <- DataFrame()
 > dim(vcf)
[1] 5 0
 > vcf[1:3, ]
class: VCF
dim: 3 0
genome: hg19
exptData(1): header
fixed(4): REF ALT QUAL FILTER
info(6): NS DP ... DB H2
geno(0):
rownames(3): rs6054257 20:17330 rs6040355
rowData values names(1): paramRangeID
colnames: NULL
colData names(0):


Fixes have been made in the following versions,

Release :
GenomicRanges 1.8.13
VariantAnnotation 1.2.11

Devel :
GenomicRanges 1.9.56
VariantAnnotation 1.3.24


Valerie

On 08/24/2012 04:10 AM, Richard Pearson wrote:
> Hi
>
> It is wonderful that we can create subsets of VariantAnnotation VCF 
> objects using [ but I have found that this doesn't work for VCFs that 
> are "sites only", i.e. have no information in geno(vcf):
> > geno(vcf)
>   SimpleList of length 0
> > passVcf <- vcf[values(fixed(vcf))[["FILTER"]] == "PASS", ]
>   Error in colData(x)[j, , drop = FALSE] :
>     selecting rows: subscript out of bounds
>
> In these cases I can create subsets, e.g. using:
>   passVcf <- VCF(
>     rowData = rowData(vcf)[values(fixed(vcf))[["FILTER"]] == "PASS"],
>     colData = colData(vcf),
>     exptData = exptData(vcf),
>     fixed = values(fixed(vcf))[values(fixed(vcf))[["FILTER"]] == 
> "PASS", -(1)],
>     info = values(info(vcf))[values(fixed(vcf))[["FILTER"]] == "PASS", 
> -(1)]
>   )
>
> But it would be great if I could also do this using [. Any chance this 
> functionality could be included in a future version of VariantAnnotation?
>
> Thanks
>
> Richard
>
> > sessionInfo()
> R version 2.15.0 (2012-03-30)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C 
> LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8 
> LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8 
> LC_PAPER=C                 LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C 
> LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods base
>
> other attached packages:
> [1] VEDA_0.0.1               ggplot2_0.9.1 VariantAnnotation_1.2.10 
> Rsamtools_1.8.6 Biostrings_2.24.1        GenomicRanges_1.8.12 
> IRanges_1.14.4           BiocGenerics_0.2.0 malariagen_0.0.1
>
> loaded via a namespace (and not attached):
>  [1] AnnotationDbi_1.18.1  Biobase_2.16.0 biomaRt_2.12.0        
> bitops_1.0-4.1        BSgenome_1.24.0 colorspace_1.1-1      
> DBI_0.2-5             dichromat_1.2-4 digest_0.5.2          
> GenomicFeatures_1.8.2
> [11] grid_2.15.0           labeling_0.1 lattice_0.20-6        
> MASS_7.3-20           Matrix_1.0-6 memoise_0.1           
> munsell_0.3           plyr_1.7.1 proto_0.3-9.2         RColorBrewer_1.0-5
> [21] RCurl_1.91-1          reshape2_1.2.1 RSQLite_0.11.1        
> rtracklayer_1.16.3    scales_0.2.1 snpStats_1.6.0        
> splines_2.15.0        stats4_2.15.0 stringr_0.6.1         
> survival_2.36-14
> [31] tools_2.15.0          XML_3.9-4 zlibbioc_1.2.0
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: 
> http://news.gmane.org/gmane.science.biology.informatics.conductor



More information about the Bioconductor mailing list