[BioC] Help needed! What wrong with VariantAnnotation and TCGA vcfs

Martin Morgan mtmorgan at fhcrc.org
Fri Apr 12 07:11:13 CEST 2013


On 04/11/2013 07:55 PM, ying chen wrote:
> Hi guys, sorry to bother you again.
>
> I am new to VariantAnnotation package and keep having some weird errors when testing with TCGA vcfs.
>
>> start.loc <- 55086725
>> end.loc <- 55275031
>> test.gr <- GRanges("7", IRanges(start.loc, end.loc))
>> file <- system.file("vcf", "NA06985_17.vcf.gz", package = "cgdv17")
>> params <- ScanVcfParam(which=test.gr)
>> vcf <- readVcf(file, "hg19", params)
>
> ## the above run successful with the vcf coming with the VariantAnnotation package
> ## the following tests the same code with TCGA vcf
>
>> dir()
> [1] "TCGA-AF-3913_W_IlluminaGA-DNASeq_exome.vcf.gz"     "TCGA-AF-3913_W_IlluminaGA-DNASeq_exome.vcf.gz.tbi"
>>   vcf <- readVcf("TCGA-AF-3913_W_IlluminaGA-DNASeq_exome.vcf.gz", "hg19", params)
> Error in `rownames<-`(`*tmp*`, value = "PRIMARY") :
>    invalid rownames length
>>   hdr <- scanVcfHeader("TCGA-AF-3913_W_IlluminaGA-DNASeq_exome.vcf.gz")
> Error in `rownames<-`(`*tmp*`, value = "PRIMARY") :
>    invalid rownames length
>>
>
> I looked at the TCGA vcf and the chr7-sub.vcf included with VariantAnnotation package and could not tell what want wrong, except that TCGA vcf text file has several "PRIMARY" entries.
>
> Any suggestion?

Sorry, this was a bug in _Rsamtools_; it is fixed in version 1.12.1 which will 
be available Saturday after 10am Seattle time. The problems are with the 
##SAMPLE lines and the ##vcfProcessLog line.

Martin


>
> Thanks a lot for the help!
>
> Ying
>
> The following is the header and first 2 lines of the TCGA vcf
>
> ##fileformat=VCFv4.0
> ##fileDate=20110203
> ##center=UCSC
> ##source="bambam pipeline v1.1"
> ##reference=<ID=ncbi-human-build36,source="ftp://genome.wustl.edu/pub/reference//NCBI-human-build36/all_sequences.bam">
> ##phasing=none
> ##INDIVIDUAL=TCGA-AF-3913
> ##SAMPLE=<ID=NORMAL,Individual="TCGA-AF-3913",Description="Normal Sample",File="/cluster/depot/read/exome/TCGA-AF-3913-11A-01W-1073-09_IlluminaGA-DNASeq_exome.bam_HOLD_QC_PENDING",Platform="Illumina",Source="dbGaP",Accession=SRS131301>
> ##SAMPLE=<ID=PRIMARY,Individual="TCGA-AF-3913",Description="Primary Tumor",File="/cluster/depot/read/exome/TCGA-AF-3913-01A-02W-1073-09_IlluminaGA-DNASeq_exome.bam_HOLD_QC_PENDING",Platform="Illumina",Source="dbGaP",Accession=SRS131293>
> ##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership, build 130">
> ##INFO=<ID=SOMATIC,Number=0,Type=Flag,Description="Somatic mutation in primary">
> ##INFO=<ID=DP,Number=1,Type=Integer,Description="Total read depth for all samples">
> ##INFO=<ID=DEL,Number=1,Type=Integer,Description="Deletion X bps away">
> ##INFO=<ID=INS,Number=1,Type=Integer,Description="Insertion X bps away">
> ##INFO=<ID=VT,Number=1,Type=String,Description="Somatic variant type">
> ##INFO=<ID=ProtCh,Number=1,Type=String,Description="Protein change due to somatic variant">
> ##INFO=<ID=SS,Number=1,Type=Integer,Description="Somatic status of sample">
> ##FILTER=<ID=q10,Description="Genotype Quality < 10">
> ##FILTER=<ID=blq,Description="Position overlaps 1000 Genomes Project mapping quality blacklist">
> ##FILTER=<ID=bldp,Description="Position overlap 1000 Genomes Project depth blacklist">
> ##FILTER=<ID=ma,Description="Position in germline has 2+ support for 2+ alleles">
> ##FILTER=<ID=idl10,Description="Position is within 10 bases of an indel">
> ##FILTER=<ID=idls5,Description="Less than 5 reads supporting indel in appropriate tissue">
> ##FILTER=<ID=fa20,Description="Fraction of ALT below 20% of reads">
> ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
> ##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read depth">
> ##FORMAT=<ID=BQ,Number=1,Type=Integer,Description="Average base quality">
> ##FORMAT=<ID=FA,Number=1,Type=Float,Description="Fraction of reads supporting ALT">
> ##tcgaversion=1.0
> ##vcfProcessLog=<InputVCF=</inside/grotto/bambam/coad_read/exome/TCGA-AF-3913_W_IlluminaGA-DNASeq_exome.vcf>,InputVCFSource=<bambam>,InputVCFVer=<1.1>,InputVCFParam=<exome>>
> #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NORMAL PRIMARY
> 1 4770 . A G 28 bldp;blq SS=1;VT=SNP;DB;DP=7 GT:DP:BQ:FA 0/1:3:36:0.333 0/1:4:36:0.5
> 1 4793
>
>> sessionInfo()
> R version 3.0.0 (2013-04-03)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
> locale:
> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
> [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
> attached base packages:
>   [1] splines   stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base
> other attached packages:
>   [1] cgdv17_0.0.20                           TxDb.Hsapiens.UCSC.hg19.knownGene_2.9.0
>   [3] GenomicFeatures_1.12.0                  GGtools_4.8.0
>   [5] GGBase_3.22.0                           snpStats_1.10.0
>   [7] Matrix_1.0-12                           lattice_0.20-15
>   [9] survival_2.37-4                         org.Hs.eg.db_2.9.0
> [11] RSQLite_0.11.2                          DBI_0.2-5
> [13] AnnotationDbi_1.22.1                    Biobase_2.20.0
> [15] VariantAnnotation_1.6.1                 Rsamtools_1.12.0
> [17] Biostrings_2.28.0                       GenomicRanges_1.12.1
> [19] IRanges_1.18.0                          BiocGenerics_0.6.0
> [21] BiocInstaller_1.10.0
> loaded via a namespace (and not attached):
>   [1] annotate_1.38.0    biomaRt_2.16.0     bit_1.1-10         bitops_1.0-5       BSgenome_1.28.0
>   [6] ff_2.2-11          genefilter_1.42.0  grid_3.0.0         RCurl_1.95-4.1     rtracklayer_1.20.0
> [11] tools_3.0.0        XML_3.96-1.1       xtable_1.7-1       zlibbioc_1.6.0
>>
>
>
>
>
>               		 	   		
> 	[[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793



More information about the Bioconductor mailing list