[BioC] mutant allele read counts

Sat Jun 14 01:33:03 CEST 2014

Hi,

Your sessionInfo() shows VariantAnnotation 1.8.13, which is an old version
of the package. The current release version is 1.10.1. FYI you can see all
release versions here:

http://www.bioconductor.org/checkResults/release/bioc-LATEST/

Please update R and reinstall packages with biocLite(). If you need any 
help with that, the guidelines are here:
http://www.bioconductor.org/install/
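
To check whether an update is needed, something like the following should
work. It is only a sketch: is_outdated() is a made-up helper, not part of
any package, and the biocLite() lines are commented out because they need
a network connection.

```r
# Hypothetical helper (not part of Bioconductor): TRUE when the installed
# version string is older than the required one.
is_outdated <- function(installed, required = "1.10.1") {
  package_version(installed) < package_version(required)
}

# Version from the sessionInfo() output in this thread vs. the release.
is_outdated("1.8.13")

# On your own machine, compare against what is actually installed:
# is_outdated(as.character(packageVersion("VariantAnnotation")))

# If that is TRUE, update R and then reinstall:
# source("http://bioconductor.org/biocLite.R")
# biocLite("VariantAnnotation")
```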

If you want to send me a small test file offline I can confirm that the 
current version can read it.
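
Once readVcfAsVRanges() succeeds, the data.frame from as.data.frame(vr)
still needs reshaping for CRAVAT. A rough sketch in base R follows. The toy
data.frame stands in for the real one, the column names (seqnames, start,
strand, ref, alt, sampleNames) are what I would expect but please check
names(df), and to_cravat() and the "tumor" sample label are made up for
illustration. VCF alleles are given on the forward strand, so an
unstranded "*" is written as "+".

```r
# Toy stand-in for as.data.frame(vr); column names are an assumption.
df <- data.frame(
  seqnames    = c("chr1", "chr1"),
  start       = c(109641L, 526561L),
  strand      = c("*", "*"),
  ref         = c("A", "T"),
  alt         = c("G", "G"),
  sampleNames = c("tumor", "tumor"),   # hypothetical sample label
  stringsAsFactors = FALSE
)

# Hypothetical helper: UID / Chr / Position / Strand / Ref / Alt / Sample.
to_cravat <- function(df, prefix = "TR") {
  strand <- as.character(df$strand)
  strand[strand == "*"] <- "+"          # VCF reports forward-strand alleles
  data.frame(
    UID      = paste0(prefix, seq_len(nrow(df))),
    Chr      = as.character(df$seqnames),
    Position = df$start,
    Strand   = strand,
    Ref      = as.character(df$ref),
    Alt      = as.character(df$alt),
    Sample   = as.character(df$sampleNames),
    stringsAsFactors = FALSE
  )
}

cravat <- to_cravat(df)

# Tab-delimited, no header or quotes; with no 'file' argument this prints
# to the console. Add file = "..." to write it to disk.
write.table(cravat, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```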

Valerie

On 06/13/2014 03:06 PM, Murli wrote:
> Sorry, forgot to post the code contained in the R file in the earlier
> post. These are the few lines in vcfToCravat.R that I am sourcing.
> library(VariantAnnotation)
> DataDir="../Project_NAI_01117_TNWGS/Sample_3_Middle_lobe_tumor/analysis/"
> vcfFile=paste(DataDir,"3_Middle_lobe_tumor--2_mucosal_normal.snv.mutect.v1.1.4.annotated.vcf",sep="")
> vcfFile=paste(DataDir,"3_Middle_lobe_tumor--2_mucosal_normal.indel.sominddet.v2.3-9.vcf",sep="")
> vr <- readVcfAsVRanges(vcfFile, "hg19")
> df <- as.data.frame(vr)
>
>
> On Fri, Jun 13, 2014 at 5:47 PM, Murli <murlinair at gmail.com> wrote:
>
>     Hi Valerie,
>     Thanks for the help. I am getting the following errors when I am
>     reading the vcf files.
>       vr <- readVcfAsVRanges(vcfFile, "hg19")
>     Error in lapply(ivar[inms], drop) :
>        error in evaluating the argument 'X' in selecting a method for
>     function 'lapply': Error in normalizeSingleBracketSubscript(j, x) :
>        subscript contains invalid names
>
>     With another file I am getting the following
>      > source("vcfToCravat.R")
>     Error in validObject(.Object) :
>        invalid class VRanges
>
>     Cheers../Murli
>
>
>     On Fri, Jun 13, 2014 at 3:29 PM, Valerie Obenchain
>     <vobencha at fhcrc.org> wrote:
>
>         Hi,
>
>         Use readVcfAsVRanges() then coerce to a data.frame.
>
>         fl <- system.file("extdata", "chr7-sub.vcf.gz",
>         package="VariantAnnotation")
>         vr <- readVcfAsVRanges(fl, "hg19")
>         df <- as.data.frame(vr)
>
>         You'll have some extra columns in the data.frame but you can
>         remove / rename columns as necessary.
>
>         Valerie
>
>
>
>
>
>         On 06/13/2014 10:46 AM, Murli [guest] wrote:
>
>             Hi,
>             I am interested in extracting information for functional
>             annotation using CRAVAT. It requires the data to be in the
>             following format.
>             ===========================================
>             # UID / Chr. / Position / Strand / Ref. base / Alt. base / Sample ID (optional)
>             TR1     chr17   7577506     -   G       T    TCGA-02-0231
>             TR2     chr10   123279680   -   G       A    TCGA-02-3512
>             TR3     chr13   49033967    +   C       A    TCGA-02-3532
>             TR4     chr7    116417505   +   G       T    TCGA-02-1523
>             TR5     chr7    140453136   -   T       A    TCGA-02-0023
>             TR6     chr17   37880998    +   G       T    TCGA-02-0252
>             Ins1    chr17   37880998    +   G       GT   TCGA-02-0252
>             Del1    chr17   37880998    +   GA      G    TCGA-02-0252
>             CSub1   chr2    39871235    +   ATGCT   GA   TCGA-02-0252
>
>             ===========================================
>             http://www.cravat.us/help.jsp?chapter=how_to_cite&article=#
>
>             I am trying to extract this information from vcf files
>             generated by mutect. I am using VariantAnnotation to extract
>             this information. I have read the file using readVcf(), and
>             renamed the chromosomes according to txdb.
>
>             rowData(newVcfData)
>             GRanges with 62991 ranges and 5 metadata columns:
>                                 seqnames                 ranges strand | paramRangeID
>                                    <Rle>              <IRanges>  <Rle> |     <factor>
>                    1:109641_A/G     chr1       [109641, 109641]      * |         <NA>
>                    1:526561_T/G     chr1       [526561, 526561]      * |         <NA>
>                    1:691958_G/A     chr1       [691958, 691958]      * |         <NA>
>                    1:763781_A/T     chr1       [763781, 763781]      * |         <NA>
>                       rs6594026     chr1       [782981, 782981]      * |         <NA>
>                             ...      ...                    ...    ... ...          ...
>                        rs480725     chrX [154903224, 154903224]      * |         <NA>
>                 X:154925893_C/T     chrX [154925893, 154925893]      * |         <NA>
>                 X:155038107_C/G     chrX [155038107, 155038107]      * |         <NA>
>                 X:155204257_G/T     chrX [155204257, 155204257]      * |         <NA>
>                 X:155234730_T/C     chrX [155234730, 155234730]      * |         <NA>
>                                            REF                ALT      QUAL      FILTER
>                                 <DNAStringSet> <DNAStringSetList> <numeric> <character>
>                    1:109641_A/G              A                  G      8.90        PASS
>                    1:526561_T/G              T                  G      9.19        PASS
>                    1:691958_G/A              G                  A     13.74        PASS
>                    1:763781_A/T              A                  T     16.03        PASS
>                       rs6594026              C                  T     11.24        PASS
>                             ...            ...                ...       ...         ...
>                        rs480725              A                  T      6.39        PASS
>                 X:154925893_C/T              C                  T      6.53        PASS
>                 X:155038107_C/G              C                  G      6.64        PASS
>                 X:155204257_G/T              G                  T      6.35        PASS
>                 X:155234730_T/C              T                  C      6.51        PASS
>                 ---
>                 seqlengths:
>                   chr1 chr10 chr11 chr12 chr13 chr14 ...  chr5  chr6  chr7  chr8  chr9  chrX
>                     NA    NA    NA    NA    NA    NA ...    NA    NA    NA    NA    NA    NA
>
>
>             Can the information be extracted using VariantAnnotation?
>             I would appreciate your help with this.
>             Thanks ../Murli
>
>
>
>                -- output of sessionInfo():
>
>                 sessionInfo()
>
>             R version 3.0.2 (2013-09-25)
>             Platform: x86_64-redhat-linux-gnu (64-bit)
>
>             locale:
>                [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>                [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>                [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>                [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>                [9] LC_ADDRESS=C               LC_TELEPHONE=C
>             [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
>             attached base packages:
>             [1] parallel  stats     graphics  grDevices utils     datasets  methods
>             [8] base
>
>             other attached packages:
>                [1] TxDb.Hsapiens.UCSC.hg19.knownGene_2.10.1
>                [2] GenomicFeatures_1.14.5
>                [3] AnnotationDbi_1.24.0
>                [4] Biobase_2.22.0
>                [5] VariantAnnotation_1.8.13
>                [6] Rsamtools_1.14.3
>                [7] Biostrings_2.30.1
>                [8] GenomicRanges_1.14.4
>                [9] XVector_0.2.0
>             [10] IRanges_1.20.7
>             [11] BiocGenerics_0.8.0
>
>             loaded via a namespace (and not attached):
>                [1] biomaRt_2.18.0     bitops_1.0-6       BSgenome_1.30.0    DBI_0.2-7
>                [5] RCurl_1.95-4.1     RSQLite_0.11.4     rtracklayer_1.22.7 stats4_3.0.2
>                [9] tools_3.0.2        XML_3.98-1.1       zlibbioc_1.8.0
>
>
>             --
>             Sent via the guest posting facility at bioconductor.org.
>
>
>
>         --
>         Valerie Obenchain
>         Program in Computational Biology
>         Fred Hutchinson Cancer Research Center
>         1100 Fairview Ave. N, Seattle, WA 98109
>
>         Email: vobencha at fhcrc.org
>         Phone: (206) 667-3158
>
>
>

-- 
Valerie Obenchain
Program in Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, Seattle, WA 98109

Email: vobencha at fhcrc.org
Phone: (206) 667-3158