[BioC] VariantAnnotation: Specifying 'seqinfo' at import with 'readVcf'

Julian Gehring julian.gehring at embl.de
Tue Sep 24 19:00:33 CEST 2013


Hi Valerie,

> Thanks for clarifying. No, there is not currently a way to do this.
>
> The 'seqinfo' on the rowData(vcf) should not be difficult to change. Can
> you provide more detail as to (1) why you are changing it (did readVcf()
> import something incorrectly or ?) and (2) what operations on the
> 'seqinfo' are taking a long time.

'readVcf' did everything in a correct way.  I needed to add the 
information about the genome, circularity, and seqlengths based on 
external annotitation, since it is not part of the VCF file.

I can't tell you at the moment where 'seqinfo <-' spends most of its 
time.  I'll profile it and get back to you.

Thank your for your quick answer and your help!

Best wishes
Julian


> Thanks.
> Valerie
>
>>
>> Best wishes
>> Julian
>>
>>
>> On 09/24/2013 06:31 PM, Valerie Obenchain wrote:
>>> Hi Julian,
>>>
>>> On 09/24/2013 02:29 AM, Julian Gehring wrote:
>>>> Hi,
>>>>
>>>> Is there a direct way to specifiy the 'seqinfo' of a genome for the
>>>> import of a VCF file using 'readVcf'?
>>>
>>> I think the question is how to read in a subset of chromosomes/positions
>>> from a vcf file without an accompanying tabix index. You can't.
>>> readVcf() requires an index when subsets are defined by
>>> chromosome/position. However you can read in subsets defined by INFO
>>> and/or GENO fields without an index.
>>>
>>> Approaches:
>>> (1) create index with ?indexTabix and specify 'which' in ScanVcfParam
>>> (2) use ?filterVcf to write out a new file of records of interest
>>>
>>>> I'm aware that one can change it
>>>> with the 'seqinfo' method afterwards, but for large VCF files this can
>>>> take a significant amount of time.
>>>
>>> What operation is taking along time? Subsetting the VCF object by
>>> chromosome?
>>>
>>> Valerie
>>>
>>>>
>>>> An alternative would be to sneak it in by the 'which' arguments, such
>>>> as:
>>>>
>>>> readVcf(file, genome, ScanVcfParam(which = as(seq_info, "GRanges")))
>>>>
>>>> but this requires the file to be indexed beforehand.
>>>>
>>>> Best wishes
>>>> Julian
>>>>
>>>
>



More information about the Bioconductor mailing list