[BioC] Complete variant toolbox: gmapR/VariantTools/VariantAnnotation

Dan Tenenbaum dtenenba at fhcrc.org
Sun Dec 8 21:52:50 CET 2013



----- Original Message -----
> From: "Michael Lawrence" <lawrence.michael at gene.com>
> To: "Thomas Girke" <thomas.girke at ucr.edu>
> Cc: "Bioconductor mailing list" <bioconductor at stat.math.ethz.ch>
> Sent: Sunday, December 8, 2013 11:35:31 AM
> Subject: Re: [BioC] Complete variant toolbox: gmapR/VariantTools/VariantAnnotation
> 
> On Sun, Dec 8, 2013 at 9:08 AM, Thomas Girke <thomas.girke at ucr.edu>
> wrote:
> 
> > Dear Michael and Valerie,
> >
> > VariantTools and VariantAnnotation are awesome packages. To the best of my
> > knowledge, VariantTools is currently the only Bioc/R package that performs
> > variant calling and it does this in a very nice way. With the available
> > resources it is now straightforward to set up complete workflows for variant
> > calling projects: (1) variant aware read alignments with GSNAP from gmapR ->
> > (2) variant calling/filtering with VariantTools -> (3) adding genomic context
> > with VariantAnnotation. This is really amazing!!!
> >
> > Here are a few questions related to both packages:
> >
> > (1) For teaching purposes and other obvious reasons it would be useful if a
> > Windows version of VariantTools were available (and perhaps for gmapR too).
> > Installing the package (includes gmapR) from source works fine on both Linux
> > and OS X, but not on Windows.
> >
> >
> Julian has already helped answer some of these questions (thanks!). For
> Windows support, I would need to talk to Tom about how far he could port
> the GMAP suite. VariantTools currently relies on bam_tally, but I've also
> written a simple function that generates a basic VRanges via
> Rsamtools::applyPileups. It will become part of VariantAnnotation. Many
> filters in VariantTools just rely on the basic read depth information, so I
> could make gmapR a Suggested dependency of VariantTools, and thus allow
> VariantTools to work on Windows.


It would need to be an Enhances: dependency (with gmapR-specific functionality wrapped in
if(require(gmapR))).
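
A minimal sketch of that guard, for illustration only. The helper name
pick_tally_backend() and the "Rsamtools" fallback label are hypothetical and
not part of VariantTools. It uses requireNamespace() instead of require() so
the optional package is not attached to the search path. That substitution is
an assumption on top of the suggestion above.

```r
# Hypothetical helper: report which tallying backend is usable on this machine.
# `optional_pkg` defaults to gmapR; the fallback label is only illustrative.
pick_tally_backend <- function(optional_pkg = "gmapR",
                               fallback = "Rsamtools") {
  # requireNamespace() returns FALSE instead of raising an error when the
  # package is not installed (e.g. gmapR on Windows).
  if (requireNamespace(optional_pkg, quietly = TRUE)) {
    optional_pkg
  } else {
    fallback
  }
}

backend <- pick_tally_backend()
message("Tallying backend: ", backend)
```

gmapR-specific code would then run only when the helper returns "gmapR", and
everything else would stay on the path that works without it.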

Dan


> Tallying is a computationally intensive
> operation, so I'm guessing Windows users would be using the downstream
> functionality.
> 
> Also interesting would be integration of the HDF5 representation, i.e.,
> input/output to/from VRanges and generation via applyPileups. Does that
> already exist? And there's also the idea of storing the tallies as a
> tab-separated file, with a Tabix index. The advantage is that it would rely
> only on Rsamtools.
> 
> > (2) The VRanges class is another great resource for filtering variant calls.
> > What I was not able to locate though is a description/definition of the
> > content of its different columns/components. Is something like this available
> > somewhere?
> >
> > (3) When annotating variants with utilities from VariantAnnotation, it would
> > be useful to provide a convenience Summary Report function at the end of the
> > workflow that exports the annotations to a file. A very common need here is to
> > collapse the annotations for each variant on a single line so that one doesn't
> > end up with annotation results of millions of lines, as is typical for many
> > variant discovery projects. This also simplifies joins among different
> > annotation instances because it maintains uniqueness among variant identifiers.
> > This approach is often useful when comparing (joining) the variants among
> > different genotypes (e.g. which variants are identical or unique among
> > different mutants). An example solution is shown on slides 34-35 of this
> > presentation:
> >
> > http://faculty.ucr.edu/~tgirke/HTML_Presentations/Manuals/Workshop_Dec_12_16_2013/Rvarseq/Rvarseq.pdf
> >
> > (4) predictCoding() reports the relative location where exactly a variant maps
> > to an annotation range. It would be nice if locateVariants() could report the
> > exact relative mapping locations too, e.g. variant chr1:1033_A/T maps to
> > position x of 5'UTR. Perhaps this is already possible but I couldn't figure
> > out how to do it without reaching too far into my own hacking toolbox.
> >
> > Thanks for providing these excellent resources and most importantly your
> > patience listening to these unsolicited questions.
> >
> > Best,
> >
> >
> > Thomas
> >
> >
> >
> > > sessionInfo()
> > R version 3.0.2 (2013-09-25)
> > Platform: x86_64-apple-darwin10.8.0 (64-bit)
> >
> > locale:
> > [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
> >
> > attached base packages:
> > [1] parallel  stats     graphics  grDevices utils     datasets  methods
> > [8] base
> >
> > other attached packages:
> > [1] VariantTools_1.4.5      VariantAnnotation_1.8.7 Rsamtools_1.14.2
> > [4] Biostrings_2.30.1       GenomicRanges_1.14.3    XVector_0.2.0
> > [7] IRanges_1.20.6          BiocGenerics_0.8.0
> >
> > loaded via a namespace (and not attached):
> >  [1] AnnotationDbi_1.24.0   BatchJobs_1.1-1135     BBmisc_1.4
> >  [4] Biobase_2.22.0         BiocParallel_0.4.1     biomaRt_2.18.0
> >  [7] bitops_1.0-6           brew_1.0-6             BSgenome_1.30.0
> > [10] codetools_0.2-8        DBI_0.2-7              digest_0.6.3
> > [13] fail_1.2               foreach_1.4.1          GenomicFeatures_1.14.2
> > [16] gmapR_1.4.2            grid_3.0.2             iterators_1.0.6
> > [19] lattice_0.20-24        Matrix_1.1-0           plyr_1.8
> > [22] RCurl_1.95-4.1         RSQLite_0.11.4         rtracklayer_1.22.0
> > [25] sendmailR_1.1-2        stats4_3.0.2           tools_3.0.2
> > [28] XML_3.95-0.2           zlibbioc_1.8.0
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at r-project.org
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives:
> > http://news.gmane.org/gmane.science.biology.informatics.conductor
> >


