[BioC] VariantAnnotation: Specifying 'seqinfo' at import with 'readVcf'

Valerie Obenchain vobencha at fhcrc.org
Tue Sep 24 18:31:00 CEST 2013


Hi Julian,

On 09/24/2013 02:29 AM, Julian Gehring wrote:
> Hi,
>
> Is there a direct way to specify the 'seqinfo' of a genome for the
> import of a VCF file using 'readVcf'?

I think the question is how to read in a subset of chromosomes/positions 
from a VCF file without an accompanying tabix index. You can't: 
readVcf() requires an index when subsets are defined by 
chromosome/position. However, you can read in subsets defined by INFO 
and/or GENO fields without an index.

Approaches:
(1) create index with ?indexTabix and specify 'which' in ScanVcfParam
(2) use ?filterVcf to write out a new file of records of interest
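
A rough sketch of both approaches, in case it helps. It assumes the small 
example file ex2.vcf that ships with VariantAnnotation (all of its records 
are on chromosome "20"); the range, the "hg19" genome label, and the 
prefilter name onChr20 are only illustrative, so adapt them to your data.

```r
library(VariantAnnotation)
library(Rsamtools)  # bgzip(), indexTabix(), TabixFile()

## Example file shipped with VariantAnnotation (assumption: your own
## file would go here instead).
fl <- system.file("extdata", "ex2.vcf", package = "VariantAnnotation")

## (1) Compress with bgzip, build a tabix index, then subset by range.
compressVcf <- bgzip(fl, tempfile(fileext = ".vcf.gz"))
idx <- indexTabix(compressVcf, format = "vcf")
tab <- TabixFile(compressVcf, idx)

## Illustrative range on chromosome "20".
rngs <- GRanges("20", IRanges(start = 14000, end = 1300000))
param <- ScanVcfParam(which = rngs)
vcf <- readVcf(tab, "hg19", param)

## (2) Write the records of interest to a new file with filterVcf().
## Prefilters see the raw, unparsed text lines, so this hypothetical
## rule keeps the lines whose first column (CHROM) is "20".
onChr20 <- function(x) grepl("^20\t", x)
pre <- FilterRules(list(onChr20 = onChr20))
filtered <- filterVcf(compressVcf, "hg19", tempfile(), prefilters = pre)
vcf2 <- readVcf(filtered, "hg19")
```

Approach (1) pays the indexing cost once and then allows repeated range 
queries; approach (2) makes a single pass and leaves you with a smaller 
file that can be read without 'which'.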

> I'm aware that one can change it
> with the 'seqinfo' method afterwards, but for large VCF files this can
> take a significant amount of time.

What operation is taking a long time? Subsetting the VCF object by 
chromosome?

Valerie

>
> An alternative would be to sneak it in by the 'which' arguments, such as:
>
> readVcf(file, genome, ScanVcfParam(which = as(seq_info, "GRanges")))
>
> but this requires the file to be indexed beforehand.
>
> Best wishes
> Julian
>


