[BioC] Problem reading VCF file using readVcf from package VariantAnnotation

Valerie Obenchain vobencha at fhcrc.org
Wed Apr 24 18:06:28 CEST 2013


Hi Ulrich,

In addition to reading genomic regions with 'which' in ScanVcfParam(), 
you can also use 'yieldSize' in TabixFile() to iterate through a VCF 
file in chunks. See the example at the bottom of the ?readVcf man page 
or ?TabixFile.
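
A minimal sketch of the chunked approach, along the lines of the man 
page example. It assumes the small example file shipped with 
VariantAnnotation; the chunk size of 4000 and the genome label "hg19" 
are arbitrary choices for illustration and should be replaced with 
values that suit your own bgzipped, tabix-indexed file.

```r
library(VariantAnnotation)

# Example file shipped with the package; substitute your own indexed VCF.
fl <- system.file("extdata", "chr22.vcf.gz", package = "VariantAnnotation")

# yieldSize sets how many records each readVcf() call returns.
tab <- TabixFile(fl, yieldSize = 4000)
open(tab)

total <- 0L
# readVcf() returns an object with zero rows once the file is exhausted,
# which ends the loop.
while (nrow(vcf <- readVcf(tab, "hg19"))) {
    total <- total + nrow(vcf)
    cat("chunk dim:", dim(vcf), "\n")
}
close(tab)

total  # number of records seen across all chunks
```

Only one chunk is held in memory at a time, so the per-chunk work 
(here just counting records) is where your own analysis would go.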

The error you are seeing is thrown from a function in GenomicRanges, 
specifically .normargSeqlengths(). This function is called when checking 
compatibility of the Seqinfo object(s).

How was the GRanges passed as 'param' created? Did it come from a 
different genome? I'm not clear on why the GRanges has potentially 
different chromosome names, lengths and genome. If the VCF file is 
'b37', all data (names, lengths, genome if present) in the GRanges must 
be compatible with that.
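
One way to check the names, sketched under assumptions: the query range 
'gr' below is hypothetical, the file is again the package example, and 
the "chr" prefix is just one common source of mismatch, not necessarily 
the cause of your error.

```r
library(VariantAnnotation)  # Rsamtools and GenomicRanges come attached

fl <- system.file("extdata", "chr22.vcf.gz", package = "VariantAnnotation")

# Hypothetical query region, standing in for the GRanges passed as 'which'.
gr <- GRanges("chr22", IRanges(50300000, 50500000))

# Sequence names recorded in the tabix index of the VCF file.
vcf_seqs <- seqnamesTabix(TabixFile(fl))

# Sequence levels used by the ranges but absent from the file.
unknown <- setdiff(seqlevels(gr), vcf_seqs)
unknown

# If the only difference is a "chr" prefix, rename the levels so the
# ranges follow the file's naming convention before building the param.
seqlevels(gr) <- sub("^chr", "", seqlevels(gr))
param <- ScanVcfParam(which = gr)
```

Sequence lengths and genome labels would need a similar comparison 
against the file header; this sketch covers names only.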

Please provide a sample of the code you've tried and the resulting 
error. Specifically, it would be helpful to see the output of seqinfo() 
on the GRanges and meta(header(vcf)).
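
A sketch of how that output could be collected. Since readVcf() is what 
fails, it assumes the header is read directly with scanVcfHeader() 
rather than taken from a VCF object; 'fl' and 'gr' are placeholders for 
your own file and ranges.

```r
library(VariantAnnotation)

# Placeholders: substitute your own VCF path and query ranges.
fl <- system.file("extdata", "chr22.vcf.gz", package = "VariantAnnotation")
gr <- GRanges("22", IRanges(50300000, 50500000))

# Read only the header, which does not require loading any records.
hdr <- scanVcfHeader(fl)

seqinfo(gr)    # names, lengths and genome carried by the ranges
meta(hdr)      # meta-information lines from the VCF header
sessionInfo()  # package versions, useful in any bug report
```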

Valerie

On 04/24/2013 04:53 AM, Ulrich Bodenhofer wrote:
> Hi,
>
> I am trying to read genotype data from a large VCF file using the
> readVcf() function from the VariantAnnotation package. I am not reading
> the entire file (which would crash my R session because of a lack of
> memory). Instead, I am reading bunches of SNV data located in 200kbp
> regions which I specify by passing a GRanges object to ScanVcfParam()
> first. No matter what I do, I get the following error message:
>
>     when the supplied 'genome' vector is named, the names must match the seqnames
>
> As far as I can make sense of this message, it seems that there is some
> mismatch between the genome characteristics in my GRanges object and the
> genome characteristics in the VCF file. I dissected the R object
> returned by scanVcfHeader() and indeed found some interesting
> mismatches: The genome in the VCF file is denoted as "b37" and the
> sequence names are not 100% compatible with hg19. The lengths of
> chromosomes 1-22, X, and Y do match, but the lengths of mitochondrial
> DNA (denoted "M" in hg19 and "MT" in b37) differ by 2. So I forced my
> GRanges object to be 100% compatible with the information stored in the
> VCF file (by copying seqlevels, genome, and seqlengths) and restricted
> my analysis to chromosomes 1-22 and X. However, I still get the same
> error message.
>
> I also tried to locate the error message in the source code of the
> VariantAnnotation package to understand better what the problem is, but
> I could not find it. It seems the message is produced by a function that
> VariantAnnotation calls from another package.
>
> Any idea?
>
> Thanks in advance and best regards,
> Ulrich
>
>
> ------------------------------------------------------------------------
> *Dr. Ulrich Bodenhofer*
> Associate Professor
> Institute of Bioinformatics
>
> *Johannes Kepler University*
> Altenberger Str. 69
> 4040 Linz, Austria
>
> Tel. +43 732 2468 4526
> Fax +43 732 2468 4539
> bodenhofer at bioinf.jku.at
> http://www.bioinf.jku.at/
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor


