[BioC] VariantAnnotation: Specifying 'seqinfo' at import with 'readVcf'

Valerie Obenchain vobencha at fhcrc.org
Tue Sep 24 18:53:46 CEST 2013


On 09/24/2013 09:36 AM, Julian Gehring wrote:
> Hi Valerie,
>
> In this case, I'm not concerned about reading only a part of the VCF file.
>
> When I call 'readVcf', a 'GRanges' object gets created, along with its
> 'seqinfo' slot.  I was trying to find a way to supply the 'seqinfo'
> information directly when the VCF object is constructed, rather
> than changing it after the VCF object has already been created.  Is
> there a way to do this?

Thanks for clarifying. No, there is not currently a way to do this.

The 'seqinfo' on rowData(vcf) should not be difficult to change. Can 
you provide more detail on (1) why you are changing it (did readVcf() 
import something incorrectly?) and (2) which operations on the 
'seqinfo' are taking a long time?
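For reference, here is a minimal sketch of the post-import route. It assumes a current Bioconductor release and uses the example file chr22.vcf.gz shipped with VariantAnnotation, whose only chromosome is named "22". The chromosome length (hg19 chr22) is an illustrative value. Substitute your own Seqinfo, making sure its seqnames match the seqlevels already in the object.

```r
library(VariantAnnotation)

## Example VCF distributed with VariantAnnotation (a single chromosome, "22")
fl <- system.file("extdata", "chr22.vcf.gz", package = "VariantAnnotation")
vcf <- readVcf(fl, "hg19")

## Replacement Seqinfo; its seqnames must match the existing seqlevels
si <- Seqinfo(seqnames = "22", seqlengths = 51304566, genome = "hg19")

## Swap in the new Seqinfo after import (entries are matched by seqlevel name)
seqinfo(vcf) <- si
seqinfo(vcf)
```

Only the sequence-level metadata is replaced here. The variant records themselves are not re-read.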

Thanks.
Valerie

>
> Best wishes
> Julian
>
>
> On 09/24/2013 06:31 PM, Valerie Obenchain wrote:
>> Hi Julian,
>>
>> On 09/24/2013 02:29 AM, Julian Gehring wrote:
>>> Hi,
>>>
>>> Is there a direct way to specify the 'seqinfo' of a genome when
>>> importing a VCF file with 'readVcf'?
>>
>> I think the question is how to read in a subset of chromosomes/positions
>> from a vcf file without an accompanying tabix index. You can't.
>> readVcf() requires an index when subsets are defined by
>> chromosome/position. However you can read in subsets defined by INFO
>> and/or GENO fields without an index.
>>
>> Approaches:
>> (1) create an index with ?indexTabix and specify 'which' in ScanVcfParam
>> (2) use ?filterVcf to write out a new file containing the records of interest
>>
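Here is a minimal sketch of approach (1). It assumes the uncompressed example file ex2.vcf from VariantAnnotation's extdata, with records on chromosome "20". The region bounds are arbitrary illustration values. bgzip(), indexTabix() and TabixFile() come from Rsamtools, which VariantAnnotation attaches.

```r
library(VariantAnnotation)  # also attaches Rsamtools: bgzip(), indexTabix(), TabixFile()

## Uncompressed example VCF; bgzip a copy into a writable temp location
fl <- system.file("extdata", "ex2.vcf", package = "VariantAnnotation")
zipped <- bgzip(fl, dest = tempfile(fileext = ".vcf.gz"))

## Build the tabix index (written next to the file as <zipped>.tbi)
idx <- indexTabix(zipped, format = "vcf")

## Only records overlapping 'which' are read; this step needs the index
rng <- GRanges("20", IRanges(1, 1200000))
vcf_sub <- readVcf(TabixFile(zipped, idx), "hg19",
                   ScanVcfParam(which = rng))
rowRanges(vcf_sub)
```

Approach (2), filterVcf(), avoids the index entirely by writing the selected records to a new file, at the cost of one full pass over the input.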
>>> I'm aware that one can change it
>>> with the 'seqinfo' method afterwards, but for large VCF files this can
>>> take a significant amount of time.
>>
>> What operation is taking a long time? Subsetting the VCF object by
>> chromosome?
>>
>> Valerie
>>
>>>
>>> An alternative would be to sneak it in by the 'which' arguments, such
>>> as:
>>>
>>> readVcf(file, genome, ScanVcfParam(which = as(seq_info, "GRanges")))
>>>
>>> but this requires the file to be indexed beforehand.
>>>
>>> Best wishes
>>> Julian
>>>
>>
