[BioC] scanVcf: FORMAT 'GT' not found

seth redmond seth.redmond at pasteur.fr
Mon Dec 3 19:18:03 CET 2012


Urgh, yeah, I'd checked the tabs between the columns a hundred times, but I hadn't checked for trailing tabs in the header.
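
For anyone who hits the same thing, here is a minimal sketch of a check for trailing whitespace on the header lines. It assumes a POSIX shell with awk available; the function name check_header_trailing_ws and the two-sample demo header are made up for illustration and are not part of VariantAnnotation or vcftools.

```shell
# Print the line number of every header line (starting with '#') that ends
# in a tab or space. Reads uncompressed VCF text on stdin and stops at the
# first data record, so large files are not scanned in full.
check_header_trailing_ws() {
    awk '
        /^#/ && /[ \t]+$/ { printf "%d: trailing whitespace\n", NR }
        !/^#/             { exit }
    '
}

# Demo on a hypothetical header with a stray tab after the last sample name.
printf '##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\t\n2R\t1\t.\tG\tA\t.\t.\t.\tGT\t0/1\t0/1\n' \
    | check_header_trailing_ws
# prints: 2: trailing whitespace
```

On a bgzipped file this could be run as `gzip -dc tmpvcf.vcf.gz | check_header_trailing_ws`; no output means no header line ends in whitespace.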

thanks for the nudge…

-s



On 3 Dec 2012, at 18:20, Valerie Obenchain wrote:

> Hi Seth,
> 
> What version of VariantAnnotation are you using? Please provide the output of sessionInfo().
> 
> I think there is a spacing problem in the file - are there true tabs between each field? Test using just the first line of the file so you can easily see/modify the tabs.
> 
> I can't reproduce your error with the file output below. I may be modifying the format as I cut and paste. If looking at the spacing does not solve the problem, please attach a small subset of the file, maybe just the first 5 rows.
> 
> 
> Valerie
> 
> On 12/03/2012 03:16 AM, seth redmond wrote:
>> I keep running into an error in my VCF files but can't seem to pinpoint where the problem is. The file has a number of missing genotypes but nothing that should be causing any problems, I don't think, and it passes vcf-validator without any problem.
>> Completely unremarkable code and head of the file below:
>> 
>> Has anyone encountered this before? Or does anyone have suggestions as to what might be the issue?
>> 
>> thanks
>> 
>> -s
>> 
>> > filename <- "tmpvcf.vcf.gz"
>> > vcftab <- TabixFile(filename, index = paste(filename, "tbi", sep = "."))
>> > vcfScan <- scanVcf(filename)
>> trace: scanVcf(filename)
>> trace: scanVcf(con)
>> Error: scanVcf: record 1 field 1 FORMAT 'GT' not found
>>   path: tmpvcf.vcf.gz
>> 
>> bash-3.2$ vcf-validator tmpvcf.vcf.gz
>> The header tag 'reference' not present. (Not required but highly recommended.)
>> The header tag 'contig' not present for CHROM=2R. (Not required but highly recommended.)
>> The header tag 'contig' not present for CHROM=3L. (Not required but highly recommended.)
>> 
>> ##fileformat=VCFv4.1
>> ##samtoolsVersion=0.1.18 (r982:295)
>> ##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">
>> ##INFO=<ID=DP4,Number=4,Type=Integer,Description="# high-quality ref-forward bases, ref-reverse, alt-forward and alt-reverse bases">
>> ##FORMAT=<ID=DP4,Number=4,Type=Integer,Description="# high-quality ref-forward bases, ref-reverse, alt-forward and alt-reverse bases">
>> ##INFO=<ID=MQ,Number=1,Type=Integer,Description="Root-mean-square mapping quality of covering reads">
>> ##INFO=<ID=FQ,Number=1,Type=Float,Description="Phred probability of all samples being the same">
>> ##INFO=<ID=AF1,Number=1,Type=Float,Description="Max-likelihood estimate of the first ALT allele frequency (assuming HWE)">
>> ##INFO=<ID=AC1,Number=1,Type=Float,Description="Max-likelihood estimate of the first ALT allele count (no HWE assumption)">
>> ##INFO=<ID=G3,Number=3,Type=Float,Description="ML estimate of genotype frequencies">
>> ##INFO=<ID=HWE,Number=1,Type=Float,Description="Chi^2 based HWE test P-value based on G3">
>> ##INFO=<ID=CLR,Number=1,Type=Integer,Description="Log ratio of genotype likelihoods with and without the constraint">
>> ##INFO=<ID=UGT,Number=1,Type=String,Description="The most probable unconstrained genotype configuration in the trio">
>> ##INFO=<ID=CGT,Number=1,Type=String,Description="The most probable constrained genotype configuration in the trio">
>> ##INFO=<ID=PV4,Number=4,Type=Float,Description="P-values for strand bias, baseQ bias, mapQ bias and tail distance bias">
>> ##INFO=<ID=PC2,Number=2,Type=Integer,Description="Phred probability of the nonRef allele frequency in group1 samples being larger (,smaller) than in group2.">
>> ##INFO=<ID=PCHI2,Number=1,Type=Float,Description="Posterior weighted chi^2 P-value for testing the association between group1 and group2 samples.">
>> ##INFO=<ID=QCHI2,Number=1,Type=Integer,Description="Phred scaled PCHI2.">
>> ##INFO=<ID=PR,Number=1,Type=Integer,Description="# permutations yielding a smaller PCHI2.">
>> ##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias">
>> ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
>> ##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
>> ##FORMAT=<ID=GL,Number=3,Type=Float,Description="Likelihoods for RR,RA,AA genotypes (R=ref,A=alt)">
>> ##FORMAT=<ID=DP,Number=1,Type=Integer,Description="# high-quality bases">
>> ##FORMAT=<ID=SP,Number=1,Type=Integer,Description="Phred-scaled strand bias P-value">
>> ##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">
>> ##source_20121102.1=./vcf-merge -s Fd03_high.vcf.gz Fd03_low.vcf.gz Fd03_zero.vcf.gz
>> ##sourceFiles_20121102.1=0:Fd03_high.vcf.gz,1:Fd03_low.vcf.gz,2:Fd03_zero.vcf.gz
>> ##INFO=<ID=SF,Number=.,Type=String,Description="Source File (index to sourceFiles, f when filtered)">
>> ##INFO=<ID=AC,Number=.,Type=Integer,Description="Allele count in genotypes">
>> ##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
>> #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  Fd03_high.vcf   Fd03_low.vcf    Fd03_zero.vcf
>> 2R      23990061        .       G       A       152.33  .       AC1=1;AC=3;AF1=0.5;AN=6;DP4=3,0,2,4;DP=9;FQ=18.1;MQ=35;PV4=0.17,1,1,1;SF=0,1,2;VDB=0.0474       GT:DP4:GQ:DP:PL 0/1:3,0,2,4:48:9:121,0,45       0/1:1,3,6,5:90:15:212,0,87      0/1:2,3,7,5:99:17:214,0,103
>> 2R      23990067        .       G       A       32.80   .       AC1=1;AC=2;AF1=0.5;AN=4;DP4=4,1,2,3;DP=10;FQ=64.8;MQ=35;PV4=0.52,0.022,1,1;SF=0,1,2;VDB=0.0297  GT:DP4:GQ:DP:PL 0/1:4,1,2,3:95:10:92,0,106      .:6,8,2,1:.:17:20,.,.   0/1:8,8,1,4:59:21:56,0,255
>> 2R      23990070        .       T       C       109.67  .       AC1=1;AC=3;AF1=0.5;AN=6;DP4=3,0,3,4;DP=11;FQ=10.4;MQ=35;PV4=0.2,0.091,1,1;SF=0,1,2;VDB=0.0474   GT:DP4:GQ:DP:PL 0/1:3,0,3,4:40:10:104,0,37      0/1:2,3,6,6:99:17:152,0,103     0/1:2,4,7,9:95:22:163,0,92
>> 2R      23990073        .       T       C       100.33  .       AC1=1;AC=3;AF1=0.5;AN=6;DP4=3,0,3,4;DP=12;FQ=16.1;MQ=35;PV4=0.2,0.025,1,1;SF=0,1,2;VDB=0.0504   GT:DP4:GQ:DP:PL 0/1:3,0,3,4:46:10:101,0,43      0/1:2,3,6,5:99:16:134,0,103     0/1:2,4,7,9:99:22:156,0,113
>> 2R      23990083        .       T       G       99.92   .       AC1=1;AC=2;AF1=0.4995;AN=4;DP4=3,3,3,0;DP=10;FQ=3.02;MQ=38;PV4=0.46,5.9e-05,0.23,1;SF=0,1,2;VDB=0.0426  GT:GQ:DP4:DP:PL .:.:3,3,3,0:9:27,.,.    0/1:38:2,1,6,8:17:165,0,35      0/1:81:1,4,8,10:23:190,0,78
>> 2R      23990100        .       A       C       114.67  .       AC1=1;AC=3;AF1=0.5;AN=6;DP4=4,2,3,1;DP=10;FQ=68;MQ=39;PV4=1,0.41,0.38,0.041;SF=0,1,2;VDB=0.0386 GT:DP4:GQ:DP:PL 0/1:4,2,3,1:98:10:95,0,141      0/1:4,5,3,6:99:18:167,0,172     0/1:4,6,3,6:99:19:172,0,185
>> 2R      23990108        .       T       A       21.40   .       AC1=1;AC=1;AF1=0.5;AN=2;DP4=5,2,3,2;DP=12;FQ=24;MQ=39;PV4=1,3.8e-05,1,1;SF=0,1,2;VDB=0.0075     GT:DP4:GQ:DP:PL 0/1:5,2,3,2:54:12:51,0,146      .:8,6,0,3:.:17:16,.,.   .:5,10,1,2:.:18:1,.,.
>> 2R      23990114        .       C       T       113.00  .       AC1=1;AC=3;AF1=0.5;AN=6;DP4=6,3,4,1;DP=14;FQ=81;MQ=40;PV4=1,1,0.24,1;SF=0,1,2;VDB=0.0523        GT:DP4:GQ:DP:PL 0/1:6,3,4,1:99:14:108,0,181     0/1:4,4,3,5:99:16:166,0,147     0/1:3,4,2,7:99:16:155,0,158
>> 2R      23990116        .       A       T       20.25   .       AC1=1;AC=1;AF1=0.4871;AN=2;DP4=8,3,2,1;DP=14;FQ=-14.2;MQ=40;PV4=1,6e-05,0.093,0.25;SF=0,1,2;VDB=0.0282  GT:GQ:DP4:DP:PL .:.:8,3,2,1:14:13,.,.   0/1:40:4,9,4,1:18:38,0,204      .:.:5,10,1,1:17:0,.,.
>> 2R      23990120        .       G       C       189.67  .       AC1=1;AC=3;AF1=0.5;AN=6;DP4=4,2,6,3;DP=15;FQ=103;MQ=40;PV4=1,1,0.026,1;SF=0,1,2;VDB=0.0532      GT:DP4:GQ:DP:PL 0/1:4,2,6,3:99:15:188,0,130     0/1:0,3,8,7:19:18:252,0,16      0/1:2,5,4,8:99:19:219,0,134
>> 2R      23990143        .       A       C       190.67  .       AC1=2;AC=6;AF1=1;AN=6;DP4=0,0,6,4;DP=11;FQ=-57;MQ=43;SF=0,1,2;VDB=0.0436        GT:DP4:GQ:DP:PL 1/1:0,0,6,4:57:10:248,30,0      1/1:0,0,3,6:51:9:212,27,0       1/1:0,0,2,7:51:9:211,27,0
>> 2R      23990147        .       A       T       15.36   .       AC1=1;AC=1;AF1=0.5;AN=2;DP4=5,6,2,1;DP=15;FQ=27;MQ=39;PV4=1,0.25,1,1;SF=0,1,2;VDB=0.0352        GT:DP4:GQ:DP:PL 0/1:5,6,2,1:57:14:54,0,230      .:7,5,0,2:.:14:15,.,.   .:7,6,0,2:.:15:24,.,.
>> 2R      23990163        .       G       A       38.03   .       AC1=1;AC=3;AF1=0.5;AN=6;DP4=2,2,2,3;DP=14;FQ=44;MQ=43;PV4=1,4e-05,0.44,0.19;SF=0,1,2;VDB=0.0532 GT:DP4:GQ:DP:PL 0/1:2,2,2,3:74:9:71,0,106       0/1:0,1,4,1:20:6:66,0,17        0/1:0,2,4,1:51:7:67,0,48
>> 2R      23990164        .       T       C       24.03   .       AC1=1;AC=3;AF1=0.5;AN=6;DP4=4,5,2,3;DP=14;FQ=22;MQ=41;PV4=1,0.00033,1,0.056;SF=0,1,2;VDB=0.0532 GT:DP4:GQ:DP:PL 0/1:4,5,2,3:52:14:49,0,164      0/1:3,2,4,1:56:10:53,0,77       0/1:1,4,4,1:63:10:60,0,96
>> 2R      23990171        .       T       C       74.67   .       AC1=1;AC=3;AF1=0.5;AN=6;DP4=4,5,3,4;DP=16;FQ=71;MQ=41;PV4=1,6.1e-07,0.1,1;SF=0,1,2;VDB=0.0532   GT:DP4:GQ:DP:PL 0/1:4,5,3,4:99:16:98,0,194      0/1:4,2,6,1:99:13:100,0,131     0/1:5,3,3,4:99:15:116,0,173
>> 2R      23990190        .       C       A       27.34   .       AC1=1;AC=1;AF1=0.4997;AN=2;DP4=4,6,2,2;DP=14;FQ=4.77;MQ=43;PV4=1,2.3e-09,1,0.15;SF=0,1,2;VDB=0.0352     GT:DP4:GQ:DP:PL 0/1:4,6,2,2:28:14:30,0,225      .:8,1,0,1:.:10:0,.,.    .:12,5,2,0:.:19:0,.,.
>> 2R      23990198        .       G       T       26.67   .       AC1=0;AC=1;AF1=0;AN=2;DP4=6,7,2,0;DP=15;FQ=-28;MQ=44;PV4=0.47,0.0016,1,0.052;SF=0,1,2;VDB=0.0260        GT:GQ:DP4:DP:PL .:.:6,7,2,0:15:0,.,.    .:.:6,1,1,0:8:3,.,.     0/1:55:10,2,5,1:18:52,0,200
>> 
> 


