[BioC] scanVcf: FORMAT 'GT' not found

Valerie Obenchain vobencha at fhcrc.org
Mon Dec 3 18:20:49 CET 2012


Hi Seth,

What version of VariantAnnotation are you using? Please provide the 
output of sessionInfo().

I suspect a spacing problem in the file: are the fields separated by true 
tab characters rather than spaces? Test with just the first data line of 
the file so you can easily see and fix the separators.
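
One rough way to check the separators from the shell is to count the 
tab-delimited fields on each line. This is only a sketch: the demo file and 
the helper name count_tab_fields are made up for illustration. With your 
three samples every record should report 12 fields (8 fixed columns, FORMAT, 
and 3 samples); a record that reports 1 field contains no tabs at all.

```shell
# Hypothetical demo file: the header and first record are tab-delimited,
# the second record uses spaces where the tabs should be.
demo_vcf=$(mktemp)
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\n' > "$demo_vcf"
printf '2R\t100\t.\tG\tA\t50\t.\tDP=9\tGT:DP\t0/1:9\n' >> "$demo_vcf"
printf '2R 101 . T C 50 . DP=8 GT:DP 0/1:8\n' >> "$demo_vcf"

# Print "line number: field count" for every line that is not a '##' meta line.
# Fields are split on tab characters only.
count_tab_fields() {
    awk -F'\t' '!/^##/ { print NR ": " NF " fields" }'
}

count_tab_fields < "$demo_vcf"
# For a compressed file, something like:
#   gzip -dc tmpvcf.vcf.gz | count_tab_fields | head
```

In the demo, the space-separated record is the one that reports a single 
field.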

I can't reproduce your error with the file contents quoted below, but I 
may be changing the whitespace when I cut and paste. If checking the 
spacing does not solve the problem, please attach a small subset of the 
file, perhaps just the header and the first 5 rows.
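
A small subset can be cut from the shell without opening the file in an 
editor, which avoids altering the tabs. The sketch below keeps all header 
lines plus the first n records; the helper name vcf_head and the demo file 
are hypothetical.

```shell
# Keep every header line (starting with '#') and the first n data records,
# where n is the first argument. Reads stdin, writes stdout.
vcf_head() {
    awk -v n="$1" '/^#/ { print; next } ++seen > n + 0 { exit } { print }'
}

# Hypothetical demo input: 2 header lines followed by 8 records.
demo_vcf=$(mktemp)
printf '##fileformat=VCFv4.1\n#CHROM\tPOS\n' > "$demo_vcf"
for pos in 1 2 3 4 5 6 7 8; do
    printf '2R\t%s\n' "$pos" >> "$demo_vcf"
done

vcf_head 5 < "$demo_vcf"
# For a compressed file, something like:
#   gzip -dc tmpvcf.vcf.gz | vcf_head 5 > subset.vcf
```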


Valerie

On 12/03/2012 03:16 AM, seth redmond wrote:
> I keep running into an error in my VCF files but can't seem to pinpoint where the problem is. The file has a number of missing genotypes but nothing that should be causing any problems, I don't think, and it passes vcf-validator without any problem.
> Completely unremarkable code and head of the file below:
>
> Has anyone encountered this before? Or has any suggestions as to what might be the issue?
>
> thanks
>
> -s
>
>> filename<-"tmpvcf.vcf.gz"
>> vcftab<- TabixFile(filename, index = paste(filename, "tbi", sep="."));
>> vcfScan<- scanVcf(filename)
> trace: scanVcf(filename)
> trace: scanVcf(con)
> Error: scanVcf: record 1 field 1 FORMAT 'GT' not found
>    path: tmpvcf.vcf.gz
>
> bash-3.2$ vcf-validator tmpvcf.vcf.gz
> The header tag 'reference' not present. (Not required but highly recommended.)
> The header tag 'contig' not present for CHROM=2R. (Not required but highly recommended.)
> The header tag 'contig' not present for CHROM=3L. (Not required but highly recommended.)
>
> ##fileformat=VCFv4.1
> ##samtoolsVersion=0.1.18 (r982:295)
> ##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">
> ##INFO=<ID=DP4,Number=4,Type=Integer,Description="# high-quality ref-forward bases, ref-reverse, alt-forward and alt-reverse bases">
> ##FORMAT=<ID=DP4,Number=4,Type=Integer,Description="# high-quality ref-forward bases, ref-reverse, alt-forward and alt-reverse bases">
> ##INFO=<ID=MQ,Number=1,Type=Integer,Description="Root-mean-square mapping quality of covering reads">
> ##INFO=<ID=FQ,Number=1,Type=Float,Description="Phred probability of all samples being the same">
> ##INFO=<ID=AF1,Number=1,Type=Float,Description="Max-likelihood estimate of the first ALT allele frequency (assuming HWE)">
> ##INFO=<ID=AC1,Number=1,Type=Float,Description="Max-likelihood estimate of the first ALT allele count (no HWE assumption)">
> ##INFO=<ID=G3,Number=3,Type=Float,Description="ML estimate of genotype frequencies">
> ##INFO=<ID=HWE,Number=1,Type=Float,Description="Chi^2 based HWE test P-value based on G3">
> ##INFO=<ID=CLR,Number=1,Type=Integer,Description="Log ratio of genotype likelihoods with and without the constraint">
> ##INFO=<ID=UGT,Number=1,Type=String,Description="The most probable unconstrained genotype configuration in the trio">
> ##INFO=<ID=CGT,Number=1,Type=String,Description="The most probable constrained genotype configuration in the trio">
> ##INFO=<ID=PV4,Number=4,Type=Float,Description="P-values for strand bias, baseQ bias, mapQ bias and tail distance bias">
> ##INFO=<ID=PC2,Number=2,Type=Integer,Description="Phred probability of the nonRef allele frequency in group1 samples being larger (,smaller) than in group2.">
> ##INFO=<ID=PCHI2,Number=1,Type=Float,Description="Posterior weighted chi^2 P-value for testing the association between group1 and group2 samples.">
> ##INFO=<ID=QCHI2,Number=1,Type=Integer,Description="Phred scaled PCHI2.">
> ##INFO=<ID=PR,Number=1,Type=Integer,Description="# permutations yielding a smaller PCHI2.">
> ##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias">
> ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
> ##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
> ##FORMAT=<ID=GL,Number=3,Type=Float,Description="Likelihoods for RR,RA,AA genotypes (R=ref,A=alt)">
> ##FORMAT=<ID=DP,Number=1,Type=Integer,Description="# high-quality bases">
> ##FORMAT=<ID=SP,Number=1,Type=Integer,Description="Phred-scaled strand bias P-value">
> ##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">
> ##source_20121102.1=./vcf-merge -s Fd03_high.vcf.gz Fd03_low.vcf.gz Fd03_zero.vcf.gz
> ##sourceFiles_20121102.1=0:Fd03_high.vcf.gz,1:Fd03_low.vcf.gz,2:Fd03_zero.vcf.gz
> ##INFO=<ID=SF,Number=.,Type=String,Description="Source File (index to sourceFiles, f when filtered)">
> ##INFO=<ID=AC,Number=.,Type=Integer,Description="Allele count in genotypes">
> ##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
> #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  Fd03_high.vcf   Fd03_low.vcf    Fd03_zero.vcf
> 2R      23990061        .       G       A       152.33  .       AC1=1;AC=3;AF1=0.5;AN=6;DP4=3,0,2,4;DP=9;FQ=18.1;MQ=35;PV4=0.17,1,1,1;SF=0,1,2;VDB=0.0474       GT:DP4:GQ:DP:PL 0/1:3,0,2,4:48:9:121,0,45       0/1:1,3,6,5:90:15:212,0,87      0/1:2,3,7,5:99:17:214,0,103
> 2R      23990067        .       G       A       32.80   .       AC1=1;AC=2;AF1=0.5;AN=4;DP4=4,1,2,3;DP=10;FQ=64.8;MQ=35;PV4=0.52,0.022,1,1;SF=0,1,2;VDB=0.0297  GT:DP4:GQ:DP:PL 0/1:4,1,2,3:95:10:92,0,106      .:6,8,2,1:.:17:20,.,.   0/1:8,8,1,4:59:21:56,0,255
> 2R      23990070        .       T       C       109.67  .       AC1=1;AC=3;AF1=0.5;AN=6;DP4=3,0,3,4;DP=11;FQ=10.4;MQ=35;PV4=0.2,0.091,1,1;SF=0,1,2;VDB=0.0474   GT:DP4:GQ:DP:PL 0/1:3,0,3,4:40:10:104,0,37      0/1:2,3,6,6:99:17:152,0,103     0/1:2,4,7,9:95:22:163,0,92
> 2R      23990073        .       T       C       100.33  .       AC1=1;AC=3;AF1=0.5;AN=6;DP4=3,0,3,4;DP=12;FQ=16.1;MQ=35;PV4=0.2,0.025,1,1;SF=0,1,2;VDB=0.0504   GT:DP4:GQ:DP:PL 0/1:3,0,3,4:46:10:101,0,43      0/1:2,3,6,5:99:16:134,0,103     0/1:2,4,7,9:99:22:156,0,113
> 2R      23990083        .       T       G       99.92   .       AC1=1;AC=2;AF1=0.4995;AN=4;DP4=3,3,3,0;DP=10;FQ=3.02;MQ=38;PV4=0.46,5.9e-05,0.23,1;SF=0,1,2;VDB=0.0426  GT:GQ:DP4:DP:PL .:.:3,3,3,0:9:27,.,.    0/1:38:2,1,6,8:17:165,0,35      0/1:81:1,4,8,10:23:190,0,78
> 2R      23990100        .       A       C       114.67  .       AC1=1;AC=3;AF1=0.5;AN=6;DP4=4,2,3,1;DP=10;FQ=68;MQ=39;PV4=1,0.41,0.38,0.041;SF=0,1,2;VDB=0.0386 GT:DP4:GQ:DP:PL 0/1:4,2,3,1:98:10:95,0,141      0/1:4,5,3,6:99:18:167,0,172     0/1:4,6,3,6:99:19:172,0,185
> 2R      23990108        .       T       A       21.40   .       AC1=1;AC=1;AF1=0.5;AN=2;DP4=5,2,3,2;DP=12;FQ=24;MQ=39;PV4=1,3.8e-05,1,1;SF=0,1,2;VDB=0.0075     GT:DP4:GQ:DP:PL 0/1:5,2,3,2:54:12:51,0,146      .:8,6,0,3:.:17:16,.,.   .:5,10,1,2:.:18:1,.,.
> 2R      23990114        .       C       T       113.00  .       AC1=1;AC=3;AF1=0.5;AN=6;DP4=6,3,4,1;DP=14;FQ=81;MQ=40;PV4=1,1,0.24,1;SF=0,1,2;VDB=0.0523        GT:DP4:GQ:DP:PL 0/1:6,3,4,1:99:14:108,0,181     0/1:4,4,3,5:99:16:166,0,147     0/1:3,4,2,7:99:16:155,0,158
> 2R      23990116        .       A       T       20.25   .       AC1=1;AC=1;AF1=0.4871;AN=2;DP4=8,3,2,1;DP=14;FQ=-14.2;MQ=40;PV4=1,6e-05,0.093,0.25;SF=0,1,2;VDB=0.0282  GT:GQ:DP4:DP:PL .:.:8,3,2,1:14:13,.,.   0/1:40:4,9,4,1:18:38,0,204      .:.:5,10,1,1:17:0,.,.
> 2R      23990120        .       G       C       189.67  .       AC1=1;AC=3;AF1=0.5;AN=6;DP4=4,2,6,3;DP=15;FQ=103;MQ=40;PV4=1,1,0.026,1;SF=0,1,2;VDB=0.0532      GT:DP4:GQ:DP:PL 0/1:4,2,6,3:99:15:188,0,130     0/1:0,3,8,7:19:18:252,0,16      0/1:2,5,4,8:99:19:219,0,134
> 2R      23990143        .       A       C       190.67  .       AC1=2;AC=6;AF1=1;AN=6;DP4=0,0,6,4;DP=11;FQ=-57;MQ=43;SF=0,1,2;VDB=0.0436        GT:DP4:GQ:DP:PL 1/1:0,0,6,4:57:10:248,30,0      1/1:0,0,3,6:51:9:212,27,0       1/1:0,0,2,7:51:9:211,27,0
> 2R      23990147        .       A       T       15.36   .       AC1=1;AC=1;AF1=0.5;AN=2;DP4=5,6,2,1;DP=15;FQ=27;MQ=39;PV4=1,0.25,1,1;SF=0,1,2;VDB=0.0352        GT:DP4:GQ:DP:PL 0/1:5,6,2,1:57:14:54,0,230      .:7,5,0,2:.:14:15,.,.   .:7,6,0,2:.:15:24,.,.
> 2R      23990163        .       G       A       38.03   .       AC1=1;AC=3;AF1=0.5;AN=6;DP4=2,2,2,3;DP=14;FQ=44;MQ=43;PV4=1,4e-05,0.44,0.19;SF=0,1,2;VDB=0.0532 GT:DP4:GQ:DP:PL 0/1:2,2,2,3:74:9:71,0,106       0/1:0,1,4,1:20:6:66,0,17        0/1:0,2,4,1:51:7:67,0,48
> 2R      23990164        .       T       C       24.03   .       AC1=1;AC=3;AF1=0.5;AN=6;DP4=4,5,2,3;DP=14;FQ=22;MQ=41;PV4=1,0.00033,1,0.056;SF=0,1,2;VDB=0.0532 GT:DP4:GQ:DP:PL 0/1:4,5,2,3:52:14:49,0,164      0/1:3,2,4,1:56:10:53,0,77       0/1:1,4,4,1:63:10:60,0,96
> 2R      23990171        .       T       C       74.67   .       AC1=1;AC=3;AF1=0.5;AN=6;DP4=4,5,3,4;DP=16;FQ=71;MQ=41;PV4=1,6.1e-07,0.1,1;SF=0,1,2;VDB=0.0532   GT:DP4:GQ:DP:PL 0/1:4,5,3,4:99:16:98,0,194      0/1:4,2,6,1:99:13:100,0,131     0/1:5,3,3,4:99:15:116,0,173
> 2R      23990190        .       C       A       27.34   .       AC1=1;AC=1;AF1=0.4997;AN=2;DP4=4,6,2,2;DP=14;FQ=4.77;MQ=43;PV4=1,2.3e-09,1,0.15;SF=0,1,2;VDB=0.0352     GT:DP4:GQ:DP:PL 0/1:4,6,2,2:28:14:30,0,225      .:8,1,0,1:.:10:0,.,.    .:12,5,2,0:.:19:0,.,.
> 2R      23990198        .       G       T       26.67   .       AC1=0;AC=1;AF1=0;AN=2;DP4=6,7,2,0;DP=15;FQ=-28;MQ=44;PV4=0.47,0.0016,1,0.052;SF=0,1,2;VDB=0.0260        GT:GQ:DP4:DP:PL .:.:6,7,2,0:15:0,.,.    .:.:6,1,1,0:8:3,.,.     0/1:55:10,2,5,1:18:52,0,200
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor


