[BioC] mutant allele read counts

Sun Jun 15 05:16:15 CEST 2014

Thanks for providing the test files. The subset bug you hit was fixed in 
devel a few weeks ago. I've ported it to release as 1.10.2 which should 
be available via biocLite() Monday noon-ish or from svn immediately.

With 1.10.2 I can read the snv file fine:

vr <- 
readVcfAsVRanges("3_Middle_lobe_tumor--2_mucosal_normal.snv.mutect.v1.1.4.annotated.vcf", 
"")

>> vr[1:3,]
> VRanges with 3 ranges and 0 metadata columns:
>       seqnames           ranges strand         ref              alt
>          <Rle>        <IRanges>  <Rle> <character> <characterOrRle>
>   [1]        1 [109641, 109641]      +           A                G
>   [2]        1 [526561, 526561]      +           T                G
>   [3]        1 [691958, 691958]      +           G                A
>           totalDepth       refDepth       altDepth         sampleNames
>       <integerOrRle> <integerOrRle> <integerOrRle>       <factorOrRle>
>   [1]           <NA>             31              5 3_Middle_lobe_tumor
>   [2]           <NA>             56              5 3_Middle_lobe_tumor
>   [3]           <NA>             41              7 3_Middle_lobe_tumor
>       softFilterMatrix
>               <matrix>
>   [1]
>   [2]
>   [3]

Trying to read the indel file gives this error:

 >  vr <- 
readVcfAsVRanges("3_Middle_lobe_tumor--2_mucosal_normal.indel.sominddet.v2.3-9.vcf", 
"")
Error in validObject(.Object) :
   invalid class “VRanges” object: 'refDepth' must be non-negative

Look at the depth variable only with readGeno().

 > fl <- "3_Middle_lobe_tumor-2_mucosal_normal.indel.sominddet.v2.3-9.vcf"
 > ad <- readGeno(fl, "AD")
 > dim(ad)
[1] 547698      2      2
 > min(ad)
[1] -23

The negative depth may need some investigating.

Valerie

>> sessionInfo()
> R version 3.1.0 Patched (2014-04-18 r65405)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods
> [8] base
>
> other attached packages:
> [1] VariantAnnotation_1.10.2 Rsamtools_1.16.0         Biostrings_2.32.0
> [4] XVector_0.4.0            GenomicRanges_1.16.3     GenomeInfoDb_1.0.2
> [7] IRanges_1.22.9           BiocGenerics_0.10.0
>
> loaded via a namespace (and not attached):
>  [1] AnnotationDbi_1.26.0    BatchJobs_1.2           BBmisc_1.6
>  [4] Biobase_2.24.0          BiocParallel_0.6.1      biomaRt_2.20.0
>  [7] bitops_1.0-6            brew_1.0-6              BSgenome_1.32.0
> [10] codetools_0.2-8         DBI_0.2-7               digest_0.6.4
> [13] fail_1.2                foreach_1.4.2           GenomicAlignments_1.0.1
> [16] GenomicFeatures_1.16.2  iterators_1.0.7         plyr_1.8.1
> [19] Rcpp_0.11.2             RCurl_1.95-4.1          RSQLite_0.11.4
> [22] rtracklayer_1.24.2      sendmailR_1.1-2         stats4_3.1.0
> [25] stringr_0.6.2           tools_3.1.0             XML_3.98-1.1
> [28] zlibbioc_1.10.0

On 06/13/2014 04:33 PM, Valerie Obenchain wrote:
> Hi,
>
> I see you've got an old version of the package. The release version is
> 1.10.1. FYI you can see all release versions here:
>
> http://www.bioconductor.org/checkResults/release/bioc-LATEST/
>
> Please update R and reinstall packages with biocLite(). If you need any
> help with that guidelines are here:
> http://www.bioconductor.org/install/
>
> If you want to send me a small test file offline I can confirm that the
> current version can read it.
>
>
> Valerie
>
>
> On 06/13/2014 03:06 PM, Murli wrote:
>> Sorry, forgot to post the code contained in the R file in the earlier
>> post. These are the few lines in vcfToCravat.R that I am sourcing.
>> library(VariantAnnotation)
>> DataDir="../Project_NAI_01117_TNWGS/Sample_3_Middle_lobe_tumor/analysis/"
>> vcfFile=paste(DataDir,"3_Middle_lobe_tumor--2_mucosal_normal.snv.mutect.v1.1.4.annotated.vcf",sep
>>
>> ="")
>> vcfFile=paste(ataDir,"3_Middle_lobe_tumor--2_mucosal_normal.indel.sominddet.v2.3-9.vcf",sep="")
>>
>> vr <- readVcfAsVRanges(vcfFile, "hg19")
>> df <- as.data.frame(vr)
>>
>>
>> On Fri, Jun 13, 2014 at 5:47 PM, Murli <murlinair at gmail.com
>> <mailto:murlinair at gmail.com>> wrote:
>>
>>     Hi Valerie,
>>     Thanks for the help. I am getting the following errors when I am
>>     reading the vcf files.
>>       vr <- readVcfAsVRanges(vcfFile, "hg19")
>>     Error in lapply(ivar[inms], drop) :
>>        error in evaluating the argument 'X' in selecting a method for
>>     function 'lapply': Error in normalizeSingleBracketSubscript(j, x) :
>>        subscript contains invalid names
>>
>>     With another file I am getting the following
>>      > source("vcfToCravat.R")
>>     Error in validObject(.Object) :
>>        invalid class VRanges
>>
>>     Cheers../Murli
>>
>>
>>     On Fri, Jun 13, 2014 at 3:29 PM, Valerie Obenchain
>>     <vobencha at fhcrc.org <mailto:vobencha at fhcrc.org>> wrote:
>>
>>         Hi,
>>
>>         Use readVcfAsVRanges() then coerce to a data.frame.
>>
>>         fl <- system.file("extdata", "chr7-sub.vcf.gz",
>>         package="VariantAnnotation")
>>         vr <- readVcfAsVRanges(fl, "hg19")
>>         df <- as.data.frame(vr)
>>
>>         You'll have some extra columns in the data.frame but you can
>>         remove / rename columns as necessary.
>>
>>         Valerie
>>
>>
>>
>>
>>
>>         On 06/13/2014 10:46 AM, Murli [guest] wrote:
>>
>>             Hi,
>>             I am interested in extracting information for functional
>>             annotation using CRAVAT. It requires the data to be in the
>>             following format.
>>             ==============================__=============
>>             # UID / Chr. / Position / Strand / Ref. base / Alt. base /
>>             Sample ID (optional)
>>             TR1     chr17   7577506 -       G       T       TCGA-02-0231
>>             TR2     chr10   123279680       -       G       A
>>             TCGA-02-3512
>>             TR3     chr13   49033967        +       C       A
>>             TCGA-02-3532
>>             TR4     chr7    116417505       +       G       T
>>             TCGA-02-1523
>>             TR5     chr7    140453136       -       T       A
>>             TCGA-02-0023
>>             TR6     chr17   37880998        +       G       T
>>             TCGA-02-0252
>>             Ins1 chr17      37880998        +       G       GT
>>               TCGA-02-0252
>>             Del1 chr17      37880998        +       GA      G
>>             TCGA-02-0252
>>             CSub1 chr2      39871235        +       ATGCT   GA
>>               TCGA-02-0252
>>
>>             ==============================__=================
>>
>> http://www.cravat.us/help.jsp?__chapter=how_to_cite&article=#
>> <http://www.cravat.us/help.jsp?chapter=how_to_cite&article=#>
>>
>>             I am trying to extract this information from vcf files
>>             generated by mutect. I am using VariantAnnotation extract
>>             this information. I have read the file using readVcf(), and
>>             renamed the chromosomes according to txdb.
>>
>>             rowData(newVcfData)
>>             GRanges with 62991 ranges and 5 metadata columns:
>>                                 seqnames                 ranges strand
>>             | paramRangeID
>>                                    <Rle>              <IRanges>  <Rle>
>>             |     <factor>
>>                    1:109641_A/G     chr1       [109641, 109641]      *
>>             |         <NA>
>>                    1:526561_T/G     chr1       [526561, 526561]      *
>>             |         <NA>
>>                    1:691958_G/A     chr1       [691958, 691958]      *
>>             |         <NA>
>>                    1:763781_A/T     chr1       [763781, 763781]      *
>>             |         <NA>
>>                       rs6594026     chr1       [782981, 782981]      *
>>             |         <NA>
>>                             ...      ...                    ...    ...
>>             ...          ...
>>                        rs480725     chrX [154903224, 154903224]      *
>>             |         <NA>
>>                 X:154925893_C/T     chrX [154925893, 154925893]      *
>>             |         <NA>
>>                 X:155038107_C/G     chrX [155038107, 155038107]      *
>>             |         <NA>
>>                 X:155204257_G/T     chrX [155204257, 155204257]      *
>>             |         <NA>
>>                 X:155234730_T/C     chrX [155234730, 155234730]      *
>>             |         <NA>
>>                                            REF                ALT
>>               QUAL      FILTER
>>                                 <DNAStringSet> <DNAStringSetList>
>>             <numeric> <character>
>>                    1:109641_A/G              A                  G
>>               8.90        PASS
>>                    1:526561_T/G              T                  G
>>               9.19        PASS
>>                    1:691958_G/A              G                  A
>>             13.74        PASS
>>                    1:763781_A/T              A                  T
>>             16.03        PASS
>>                       rs6594026              C                  T
>>             11.24        PASS
>>                             ...            ...                ...
>>             ...         ...
>>                        rs480725              A                  T
>>               6.39        PASS
>>                 X:154925893_C/T              C                  T
>>               6.53        PASS
>>                 X:155038107_C/G              C                  G
>>               6.64        PASS
>>                 X:155204257_G/T              G                  T
>>               6.35        PASS
>>                 X:155234730_T/C              T                  C
>>               6.51        PASS
>>                 ---
>>                 seqlengths:
>>                   chr1 chr10 chr11 chr12 chr13 chr14 ...  chr5  chr6
>>               chr7  chr8  chr9  chrX
>>                     NA    NA    NA    NA    NA    NA ...    NA    NA
>>               NA    NA    NA    NA
>>
>>
>>             Can the information be extracted using VariantAnnotation()?
>>             I would appreciate your help with this.
>>             Thanks ../Murli
>>
>>
>>
>>                -- output of sessionInfo():
>>
>>                 sessionInfo()
>>
>>             R version 3.0.2 (2013-09-25)
>>             Platform: x86_64-redhat-linux-gnu (64-bit)
>>
>>             locale:
>>                [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>                [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>                [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>                [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>                [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>             [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>>             attached base packages:
>>             [1] parallel  stats     graphics  grDevices utils
>>             datasets  methods
>>             [8] base
>>
>>             other attached packages:
>>                [1] TxDb.Hsapiens.UCSC.hg19.__knownGene_2.10.1
>>                [2] GenomicFeatures_1.14.5
>>                [3] AnnotationDbi_1.24.0
>>                [4] Biobase_2.22.0
>>                [5] VariantAnnotation_1.8.13
>>                [6] Rsamtools_1.14.3
>>                [7] Biostrings_2.30.1
>>                [8] GenomicRanges_1.14.4
>>                [9] XVector_0.2.0
>>             [10] IRanges_1.20.7
>>             [11] BiocGenerics_0.8.0
>>
>>             loaded via a namespace (and not attached):
>>                [1] biomaRt_2.18.0     bitops_1.0-6       BSgenome_1.30.0
>>                 DBI_0.2-7
>>                [5] RCurl_1.95-4.1     RSQLite_0.11.4
>>             rtracklayer_1.22.7 stats4_3.0.2
>>                [9] tools_3.0.2        XML_3.98-1.1       zlibbioc_1.8.0
>>
>>
>>             --
>>             Sent via the guest posting facility at bioconductor.org
>>             <http://bioconductor.org>.
>>
>>
>>
>>         --
>>         Valerie Obenchain
>>         Program in Computational Biology
>>         Fred Hutchinson Cancer Research Center
>>         1100 Fairview Ave. N, Seattle, WA 98109
>>
>>         Email: vobencha at fhcrc.org <mailto:vobencha at fhcrc.org>
>>         Phone: (206) 667-3158 <tel:%28206%29%20667-3158>
>>
>>
>>
>
>

-- 
Valerie Obenchain
Program in Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, Seattle, WA 98109

Email: vobencha at fhcrc.org
Phone: (206) 667-3158