[BioC] mutant allele read counts
Valerie Obenchain
vobencha at fhcrc.org
Sun Jun 15 05:16:15 CEST 2014
Thanks for providing the test files. The subset bug you hit was fixed in
devel a few weeks ago. I've ported it to release as 1.10.2 which should
be available via biocLite() Monday noon-ish or from svn immediately.
With 1.10.2 I can read the snv file fine:
vr <-
readVcfAsVRanges("3_Middle_lobe_tumor--2_mucosal_normal.snv.mutect.v1.1.4.annotated.vcf",
"")
>> vr[1:3,]
> VRanges with 3 ranges and 0 metadata columns:
> seqnames ranges strand ref alt
> <Rle> <IRanges> <Rle> <character> <characterOrRle>
> [1] 1 [109641, 109641] + A G
> [2] 1 [526561, 526561] + T G
> [3] 1 [691958, 691958] + G A
> totalDepth refDepth altDepth sampleNames
> <integerOrRle> <integerOrRle> <integerOrRle> <factorOrRle>
> [1] <NA> 31 5 3_Middle_lobe_tumor
> [2] <NA> 56 5 3_Middle_lobe_tumor
> [3] <NA> 41 7 3_Middle_lobe_tumor
> softFilterMatrix
> <matrix>
> [1]
> [2]
> [3]
Trying to read the indel file gives this error:
> vr <-
readVcfAsVRanges("3_Middle_lobe_tumor--2_mucosal_normal.indel.sominddet.v2.3-9.vcf",
"")
Error in validObject(.Object) :
invalid class “VRanges” object: 'refDepth' must be non-negative
Look at the depth variable only with readGeno().
> fl <- "3_Middle_lobe_tumor-2_mucosal_normal.indel.sominddet.v2.3-9.vcf"
> ad <- readGeno(fl, "AD")
> dim(ad)
[1] 547698 2 2
> min(ad)
[1] -23
The negative depth may need some investigating.
Valerie
>> sessionInfo()
> R version 3.1.0 Patched (2014-04-18 r65405)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel stats graphics grDevices utils datasets methods
> [8] base
>
> other attached packages:
> [1] VariantAnnotation_1.10.2 Rsamtools_1.16.0 Biostrings_2.32.0
> [4] XVector_0.4.0 GenomicRanges_1.16.3 GenomeInfoDb_1.0.2
> [7] IRanges_1.22.9 BiocGenerics_0.10.0
>
> loaded via a namespace (and not attached):
> [1] AnnotationDbi_1.26.0 BatchJobs_1.2 BBmisc_1.6
> [4] Biobase_2.24.0 BiocParallel_0.6.1 biomaRt_2.20.0
> [7] bitops_1.0-6 brew_1.0-6 BSgenome_1.32.0
> [10] codetools_0.2-8 DBI_0.2-7 digest_0.6.4
> [13] fail_1.2 foreach_1.4.2 GenomicAlignments_1.0.1
> [16] GenomicFeatures_1.16.2 iterators_1.0.7 plyr_1.8.1
> [19] Rcpp_0.11.2 RCurl_1.95-4.1 RSQLite_0.11.4
> [22] rtracklayer_1.24.2 sendmailR_1.1-2 stats4_3.1.0
> [25] stringr_0.6.2 tools_3.1.0 XML_3.98-1.1
> [28] zlibbioc_1.10.0
On 06/13/2014 04:33 PM, Valerie Obenchain wrote:
> Hi,
>
> I see you've got an old version of the package. The release version is
> 1.10.1. FYI you can see all release versions here:
>
> http://www.bioconductor.org/checkResults/release/bioc-LATEST/
>
> Please update R and reinstall packages with biocLite(). If you need any
> help with that guidelines are here:
> http://www.bioconductor.org/install/
>
> If you want to send me a small test file offline I can confirm that the
> current version can read it.
>
>
> Valerie
>
>
> On 06/13/2014 03:06 PM, Murli wrote:
>> Sorry, forgot to post the code contained in the R file in the earlier
>> post. These are the few lines in vcfToCravat.R that I am sourcing.
>> library(VariantAnnotation)
>> DataDir="../Project_NAI_01117_TNWGS/Sample_3_Middle_lobe_tumor/analysis/"
>> vcfFile=paste(DataDir,"3_Middle_lobe_tumor--2_mucosal_normal.snv.mutect.v1.1.4.annotated.vcf",sep
>>
>> ="")
>> vcfFile=paste(ataDir,"3_Middle_lobe_tumor--2_mucosal_normal.indel.sominddet.v2.3-9.vcf",sep="")
>>
>> vr <- readVcfAsVRanges(vcfFile, "hg19")
>> df <- as.data.frame(vr)
>>
>>
>> On Fri, Jun 13, 2014 at 5:47 PM, Murli <murlinair at gmail.com
>> <mailto:murlinair at gmail.com>> wrote:
>>
>> Hi Valerie,
>> Thanks for the help. I am getting the following errors when I am
>> reading the vcf files.
>> vr <- readVcfAsVRanges(vcfFile, "hg19")
>> Error in lapply(ivar[inms], drop) :
>> error in evaluating the argument 'X' in selecting a method for
>> function 'lapply': Error in normalizeSingleBracketSubscript(j, x) :
>> subscript contains invalid names
>>
>> With another file I am getting the following
>> > source("vcfToCravat.R")
>> Error in validObject(.Object) :
>> invalid class VRanges
>>
>> Cheers../Murli
>>
>>
>> On Fri, Jun 13, 2014 at 3:29 PM, Valerie Obenchain
>> <vobencha at fhcrc.org <mailto:vobencha at fhcrc.org>> wrote:
>>
>> Hi,
>>
>> Use readVcfAsVRanges() then coerce to a data.frame.
>>
>> fl <- system.file("extdata", "chr7-sub.vcf.gz",
>> package="VariantAnnotation")
>> vr <- readVcfAsVRanges(fl, "hg19")
>> df <- as.data.frame(vr)
>>
>> You'll have some extra columns in the data.frame but you can
>> remove / rename columns as necessary.
>>
>> Valerie
>>
>>
>>
>>
>>
>> On 06/13/2014 10:46 AM, Murli [guest] wrote:
>>
>> Hi,
>> I am interested in extracting information for functional
>> annotation using CRAVAT. It requires the data to be in the
>> following format.
>> ==============================__=============
>> # UID / Chr. / Position / Strand / Ref. base / Alt. base /
>> Sample ID (optional)
>> TR1 chr17 7577506 - G T TCGA-02-0231
>> TR2 chr10 123279680 - G A
>> TCGA-02-3512
>> TR3 chr13 49033967 + C A
>> TCGA-02-3532
>> TR4 chr7 116417505 + G T
>> TCGA-02-1523
>> TR5 chr7 140453136 - T A
>> TCGA-02-0023
>> TR6 chr17 37880998 + G T
>> TCGA-02-0252
>> Ins1 chr17 37880998 + G GT
>> TCGA-02-0252
>> Del1 chr17 37880998 + GA G
>> TCGA-02-0252
>> CSub1 chr2 39871235 + ATGCT GA
>> TCGA-02-0252
>>
>> ==============================__=================
>>
>> http://www.cravat.us/help.jsp?__chapter=how_to_cite&article=#
>> <http://www.cravat.us/help.jsp?chapter=how_to_cite&article=#>
>>
>> I am trying to extract this information from vcf files
>> generated by mutect. I am using VariantAnnotation extract
>> this information. I have read the file using readVcf(), and
>> renamed the chromosomes according to txdb.
>>
>> rowData(newVcfData)
>> GRanges with 62991 ranges and 5 metadata columns:
>> seqnames ranges strand
>> | paramRangeID
>> <Rle> <IRanges> <Rle>
>> | <factor>
>> 1:109641_A/G chr1 [109641, 109641] *
>> | <NA>
>> 1:526561_T/G chr1 [526561, 526561] *
>> | <NA>
>> 1:691958_G/A chr1 [691958, 691958] *
>> | <NA>
>> 1:763781_A/T chr1 [763781, 763781] *
>> | <NA>
>> rs6594026 chr1 [782981, 782981] *
>> | <NA>
>> ... ... ... ...
>> ... ...
>> rs480725 chrX [154903224, 154903224] *
>> | <NA>
>> X:154925893_C/T chrX [154925893, 154925893] *
>> | <NA>
>> X:155038107_C/G chrX [155038107, 155038107] *
>> | <NA>
>> X:155204257_G/T chrX [155204257, 155204257] *
>> | <NA>
>> X:155234730_T/C chrX [155234730, 155234730] *
>> | <NA>
>> REF ALT
>> QUAL FILTER
>> <DNAStringSet> <DNAStringSetList>
>> <numeric> <character>
>> 1:109641_A/G A G
>> 8.90 PASS
>> 1:526561_T/G T G
>> 9.19 PASS
>> 1:691958_G/A G A
>> 13.74 PASS
>> 1:763781_A/T A T
>> 16.03 PASS
>> rs6594026 C T
>> 11.24 PASS
>> ... ... ...
>> ... ...
>> rs480725 A T
>> 6.39 PASS
>> X:154925893_C/T C T
>> 6.53 PASS
>> X:155038107_C/G C G
>> 6.64 PASS
>> X:155204257_G/T G T
>> 6.35 PASS
>> X:155234730_T/C T C
>> 6.51 PASS
>> ---
>> seqlengths:
>> chr1 chr10 chr11 chr12 chr13 chr14 ... chr5 chr6
>> chr7 chr8 chr9 chrX
>> NA NA NA NA NA NA ... NA NA
>> NA NA NA NA
>>
>>
>> Can the information be extracted using VariantAnnotation()?
>> I would appreciate your help with this.
>> Thanks ../Murli
>>
>>
>>
>> -- output of sessionInfo():
>>
>> sessionInfo()
>>
>> R version 3.0.2 (2013-09-25)
>> Platform: x86_64-redhat-linux-gnu (64-bit)
>>
>> locale:
>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] parallel stats graphics grDevices utils
>> datasets methods
>> [8] base
>>
>> other attached packages:
>> [1] TxDb.Hsapiens.UCSC.hg19.__knownGene_2.10.1
>> [2] GenomicFeatures_1.14.5
>> [3] AnnotationDbi_1.24.0
>> [4] Biobase_2.22.0
>> [5] VariantAnnotation_1.8.13
>> [6] Rsamtools_1.14.3
>> [7] Biostrings_2.30.1
>> [8] GenomicRanges_1.14.4
>> [9] XVector_0.2.0
>> [10] IRanges_1.20.7
>> [11] BiocGenerics_0.8.0
>>
>> loaded via a namespace (and not attached):
>> [1] biomaRt_2.18.0 bitops_1.0-6 BSgenome_1.30.0
>> DBI_0.2-7
>> [5] RCurl_1.95-4.1 RSQLite_0.11.4
>> rtracklayer_1.22.7 stats4_3.0.2
>> [9] tools_3.0.2 XML_3.98-1.1 zlibbioc_1.8.0
>>
>>
>> --
>> Sent via the guest posting facility at bioconductor.org
>> <http://bioconductor.org>.
>>
>>
>>
>> --
>> Valerie Obenchain
>> Program in Computational Biology
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, Seattle, WA 98109
>>
>> Email: vobencha at fhcrc.org <mailto:vobencha at fhcrc.org>
>> Phone: (206) 667-3158 <tel:%28206%29%20667-3158>
>>
>>
>>
>
>
--
Valerie Obenchain
Program in Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, Seattle, WA 98109
Email: vobencha at fhcrc.org
Phone: (206) 667-3158
More information about the Bioconductor
mailing list