[BioC] VariantAnnotation - dots in the INFO field give an error

Jarno Tuimala jtuimala at gmail.com
Mon Nov 12 10:39:03 CET 2012


Hello!

I have a problem reading a VCF file with the VariantAnnotation
package. The filtered VCF file (attached as text below) has been
generated with vcftools.

This is what I tried in R, and the resulting warning message:

> library(VariantAnnotation)
> vcf<-readVcf("vcftools.filtered.vcf", "hg19")

Warning message:
In doTryCatch(return(expr), name, parentenv, handler) :
  record 1 (and others?) INFO '.' not found

If I understood it correctly, the dots in the INFO column of the VCF
file cause the problem.
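
To check that assumption independently of VariantAnnotation, here is a
minimal base-R sketch that counts how many data records have a bare "."
in the INFO column. It assumes the file is tab-delimited with INFO as
the 8th field; count_missing_info is just a hypothetical helper name,
not something from the package.

```r
# Count data records whose INFO column (field 8) is the missing-value dot.
# Assumption: tab-delimited VCF; header lines start with "#".
count_missing_info <- function(path) {
  lines <- readLines(path)
  # Drop meta-information/header lines and any empty lines.
  records <- lines[!grepl("^#", lines) & nzchar(lines)]
  fields <- strsplit(records, "\t", fixed = TRUE)
  # Extract the 8th field of each record (NA if the record is too short).
  info <- vapply(fields, function(x) x[8], character(1))
  c(total = length(info), missing = sum(info == ".", na.rm = TRUE))
}
```

Calling count_missing_info("vcftools.filtered.vcf") returns the total
number of records and the number whose INFO field is ".".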

Is there an alternative way to read this VCF file and annotate it with
the VariantAnnotation package?

Best Regards,
Jarno


----

This is the session info:

R version 2.15.1 Patched (2012-07-25 r59963)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=Finnish_Finland.1252  LC_CTYPE=Finnish_Finland.1252  LC_MONETARY=Finnish_Finland.1252  LC_NUMERIC=C  LC_TIME=Finnish_Finland.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] VariantAnnotation_1.4.3 Rsamtools_1.10.1        Biostrings_2.26.2       GenomicRanges_1.10.2    IRanges_1.16.3          BiocGenerics_0.4.0

loaded via a namespace (and not attached):
 [1] AnnotationDbi_1.20.2   Biobase_2.18.0         biomaRt_2.14.0         bitops_1.0-4.1         BSgenome_1.26.1        DBI_0.2-5              GenomicFeatures_1.10.0 parallel_2.15.1
 [9] RCurl_1.95-1.1         RSQLite_0.11.2         rtracklayer_1.18.0     stats4_2.15.1          tools_2.15.1           XML_3.95-0.1           zlibbioc_1.4.0


And this is the VCF file:

##fileformat=VCFv4.1
##samtoolsVersion=0.1.18 (r982:295)
##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">
##INFO=<ID=DP4,Number=4,Type=Integer,Description="# high-quality ref-forward bases, ref-reverse, alt-forward and alt-reverse bases">
##INFO=<ID=MQ,Number=1,Type=Integer,Description="Root-mean-square mapping quality of covering reads">
##INFO=<ID=FQ,Number=1,Type=Float,Description="Phred probability of all samples being the same">
##INFO=<ID=AF1,Number=1,Type=Float,Description="Max-likelihood estimate of the first ALT allele frequency (assuming HWE)">
##INFO=<ID=AC1,Number=1,Type=Float,Description="Max-likelihood estimate of the first ALT allele count (no HWE assumption)">
##INFO=<ID=G3,Number=3,Type=Float,Description="ML estimate of genotype frequencies">
##INFO=<ID=HWE,Number=1,Type=Float,Description="Chi^2 based HWE test P-value based on G3">
##INFO=<ID=CLR,Number=1,Type=Integer,Description="Log ratio of genotype likelihoods with and without the constraint">
##INFO=<ID=UGT,Number=1,Type=String,Description="The most probable unconstrained genotype configuration in the trio">
##INFO=<ID=CGT,Number=1,Type=String,Description="The most probable constrained genotype configuration in the trio">
##INFO=<ID=PV4,Number=4,Type=Float,Description="P-values for strand bias, baseQ bias, mapQ bias and tail distance bias">
##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">
##INFO=<ID=PC2,Number=2,Type=Integer,Description="Phred probability of the nonRef allele frequency in group1 samples being larger (,smaller) than in group2.">
##INFO=<ID=PCHI2,Number=1,Type=Float,Description="Posterior weighted chi^2 P-value for testing the association between group1 and group2 samples.">
##INFO=<ID=QCHI2,Number=1,Type=Integer,Description="Phred scaled PCHI2.">
##INFO=<ID=PR,Number=1,Type=Integer,Description="# permutations yielding a smaller PCHI2.">
##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GL,Number=3,Type=Float,Description="Likelihoods for RR,RA,AA genotypes (R=ref,A=alt)">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="# high-quality bases">
##FORMAT=<ID=SP,Number=1,Type=Integer,Description="Phred-scaled strand bias P-value">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	HG00171	HG00174	NA18486	NA18489
20	6731335	.	T	C	80.5	.	.	GT:PL:GQ	1/1:0,0,0:3	1/1:0,0,0:3	1/1:0,0,0:3	1/1:113,12,0:13
20	6732603	.	A	T	25.7	.	.	GT:PL:GQ	0/0:0,6,54:8	0/0:0,0,0:3	0/0:0,0,0:3	0/1:58,0,27:35
20	6736189	.	A	G	47.8	.	.	GT:PL:GQ	0/1:0,0,0:3	0/1:0,0,0:3	0/1:0,0,0:3	1/1:79,6,0:6
20	6736562	.	C	A	20.4	.	.	GT:PL:GQ	0/0:0,0,0:4	0/0:0,0,0:4	0/1:53,0,32:40	0/0:0,9,98:11
20	6737384	.	A	G	62	.	.	GT:PL:GQ	0/1:0,0,0:3	0/1:0,0,0:3	0/1:0,0,0:3	0/1:92,0,95:92
20	6737551	.	G	A	26.3	.	.	GT:PL:GQ	1/1:30,3,0:4	0/1:0,3,40:4	0/1:0,0,0:3	1/1:34,3,0:4
20	6738766	.	T	A	34.3	.	.	GT:PL:GQ	0/1:0,0,0:3	0/0:0,3,33:4	0/1:0,0,0:3	1/1:69,6,0:4
20	6739398	.	G	A	64	.	.	GT:PL:GQ	1/1:0,0,0:3	1/1:0,0,0:3	1/1:0,0,0:3	1/1:96,9,0:10
20	6740366	.	C	T	25.8	.	.	GT:PL:GQ	0/1:0,0,0:3	0/1:0,0,0:3	0/1:0,0,0:3	1/1:57,6,0:6
20	6740850	.	G	A	34.4	.	.	GT:PL:GQ	0/1:0,0,0:3	0/0:0,6,59:6	0/1:0,0,0:3	1/1:70,6,0:3
20	6743016	.	T	C	87.2	.	.	GT:PL:GQ	0/1:0,0,0:3	0/1:0,3,31:3	0/1:0,0,0:3	1/1:124,12,0:10
20	6743306	.	A	C	39.8	.	.	GT:PL:GQ	0/1:0,0,0:3	1/1:71,6,0:6	0/1:0,0,0:3	0/1:0,0,0:3
20	6746498	.	C	T	17.4	.	.	GT:PL:GQ	0/1:0,0,0:3	0/0:0,3,38:4	0/1:31,3,0:4	0/1:24,0,54:26
20	6749158	.	C	A	18.3	.	.	GT:PL:GQ	0/0:0,3,29:8	0/0:0,3,32:8	0/1:53,0,30:40	0/0:0,21,159:25
20	6749671	.	A	C	21.3	.	.	GT:PL:GQ	0/0:0,9,65:7	0/1:33,3,0:3	0/1:28,3,0:3	0/1:0,0,0:3
20	6751034	.	A	G	999	.	.	GT:PL:GQ	0/0:0,24,189:19	0/1:33,0,141:38	1/1:255,105,0:99	1/1:255,66,0:65
20	6751316	.	A	G	155	.	.	GT:PL:GQ	0/0:0,3,22:4	0/0:0,6,43:6	1/1:116,12,0:8	0/1:84,0,25:29
20	6754246	.	G	A	16.4	.	.	GT:PL:GQ	0/0:0,0,0:3	0/0:0,3,20:6	0/0:0,0,0:3	0/1:48,0,43:45
20	6755598	.	T	G	46	.	.	GT:PL:GQ	1/1:0,0,0:3	1/1:0,0,0:3	1/1:0,0,0:3	1/1:78,9,0:10
20	6756217	.	G	A	14.2	.	.	GT:PL:GQ	0/0:0,3,38:7	0/0:0,3,38:7	0/0:0,0,0:4	0/1:47,0,26:34
20	6760431	.	C	A	36.8	.	.	GT:PL:GQ	0/1:0,0,0:3	0/1:0,0,0:3	0/1:0,0,0:3	1/1:68,6,0:6
20	6761512	.	C	T	104	.	.	GT:PL:GQ	1/1:0,0,0:3	1/1:0,0,0:3	1/1:0,0,0:3	1/1:136,12,0:13
20	6762025	.	G	A	29.3	.	.	GT:PL:GQ	0/1:0,3,37:4	1/1:32,3,0:4	0/1:0,0,0:3	1/1:35,3,0:4
20	6765841	.	A	C	35.3	.	.	GT:PL:GQ	0/0:0,3,31:4	0/1:0,0,0:3	0/1:0,0,0:3	1/1:70,6,0:4
20	6767119	.	G	C	104	.	.	GT:PL:GQ	1/1:0,0,0:3	1/1:0,0,0:3	1/1:0,0,0:3	1/1:136,12,0:13
20	6767354	.	C	T	24	.	.	GT:PL:GQ	0/1:0,0,0:3	0/1:0,0,0:3	0/1:0,0,0:3	0/1:54,0,111:55
20	6767543	.	T	C	14.2	.	.	GT:PL:GQ	0/0:0,3,31:7	0/0:0,3,32:7	0/0:0,0,0:4	0/1:47,0,22:30
20	6769102	.	T	TC	117	.	.	GT:PL:GQ	1/1:0,0,0:6	1/1:40,3,0:9	1/1:40,3,0:9	1/1:80,6,0:11
20	6769533	.	G	A	21.4	.	.	GT:PL:GQ	0/1:0,0,0:3	0/0:0,6,64:6	0/1:0,0,0:3	1/1:57,6,0:3
20	6769676	.	A	G	27.2	.	.	GT:PL:GQ	0/0:0,3,32:5	0/0:0,3,34:5	0/0:0,0,0:3	0/1:64,6,0:3
20	6769714	.	T	C	63.2	.	.	GT:PL:GQ	1/1:68,6,0:9	1/1:0,0,0:4	1/1:0,0,0:4	1/1:29,3,0:7
20	6769877	.	T	C	14.5	.	.	GT:PL:GQ	0/1:27,0,27:27	0/1:0,0,0:3	0/0:0,6,68:6	0/1:26,3,0:4
20	6769893	.	C	A	16.7	.	.	GT:PL:GQ	0/0:0,3,38:5	0/0:0,0,0:3	0/0:0,6,63:8	0/1:54,6,0:4


