[BioC] Complete variant toolbox: gmapR/VariantTools/VariantAnnotation

Thomas Girke thomas.girke at ucr.edu
Sun Dec 8 19:03:50 CET 2013


Hi Julian,

I certainly understand the difficulty of supporting all OSs for certain
packages. If it were possible in this case, it would definitely not
be a waste of time.

The latter SNP example would remain ambiguous in its gene assignment,
which is fine. Usually, we would just flag it as such.

Thomas


On Sun, Dec 08, 2013 at 05:45:51PM +0000, Julian Gehring wrote:
> 
> 
> Hi Thomas,
> 
> > (1) For teaching purposes and other obvious reasons, it would be useful if a
> > Windows version of VariantTools were available (and perhaps of gmapR too).
> > Installing the package (which includes gmapR) from source works fine on both
> > Linux and OS X, but not on Windows.
> 
> Due to many differences between the operating systems, building a 
> package like 'gmapR' (and every package that depends on it, like 
> 'VariantTools') is often not possible on Windows.  While Michael 
> or Thomas Wu may know more about the details, I doubt that these 
> packages will be available for Windows soon.  As an alternative, the 
> Amazon Bioconductor instances may be useful for you in this context.
> 
> 
> > (3) When annotating variants with utilities from VariantAnnotation, it would
> > be useful to provide a convenience summary report function at the end of the
> > workflow that exports the annotations to a file. A very common need here is to
> > collapse the annotations for each variant onto a single line, so that one doesn't
> > end up with annotation results of millions of lines, as is typical for many
> > variant discovery projects. This also simplifies joins among different
> > annotation instances because it maintains uniqueness among variant identifiers.
> > This approach is often useful when comparing (joining) the variants among
> > different genotypes (e.g. which variants are identical or unique among
> > different mutants). An example solution is shown on slides 34-35 of this
> > presentation:
> > http://faculty.ucr.edu/~tgirke/HTML_Presentations/Manuals/Workshop_Dec_12_16_2013/Rvarseq/Rvarseq.pdf
> 
> The fact that one variant may have multiple consequences often makes 
> it harder to report or post-process the results than it would be with a 
> simple 1:1 mapping.  Other software (such as ANNOVAR) has the concept of 
> reporting the 'most severe' consequence, but what counts as 'most severe' 
> is not well defined, and this may result in missing interesting consequences.
> 
> Merging the consequences of a variant into a single line, as you have 
> shown in your slides, may make it difficult to disentangle the 
> relationship between the consequences.  As an example, take the last 
> line on p. 35 of your presentation:
> 
> ID: Chr5:6455_T/C
> Location: promoter coding
> Gene: AT5G01010 AT5G01015 AT5G01020
> 
> Here, it is not possible anymore to relate the location of the variant 
> to the affected gene.  Out of interest, how are you dealing with this in 
> your reports?
> 
> Best wishes
> Julian
>
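The trade-off discussed above can be illustrated with a minimal Python sketch. The variant IDs, locations, and gene names below are hypothetical and are not taken from the slides. The function names `collapse_columns`, `collapse_pairs`, and `flag_ambiguous` are also illustrative only and are not part of VariantAnnotation.

The sketch compares two ways of collapsing long-format annotations to one line per variant:

* Collapsing each column independently reproduces the ambiguity Julian describes.
* Collapsing to `location:gene` pairs still gives one line per variant, but keeps which location belongs to which gene.

Variants that hit more than one gene can then be flagged as ambiguous.

```python
# Hypothetical long-format annotation rows: one row per (variant, location, gene) hit.
rows = [
    ("chr1:100_A/G", "promoter", "geneX"),
    ("chr1:100_A/G", "coding", "geneY"),
    ("chr1:100_A/G", "coding", "geneZ"),
    ("chr1:500_C/T", "intron", "geneY"),
]


def collapse_columns(rows):
    """Collapse locations and genes independently (loses the location-gene pairing)."""
    out = {}
    for vid, loc, gene in rows:
        locs, genes = out.setdefault(vid, ([], []))
        if loc not in locs:
            locs.append(loc)
        if gene not in genes:
            genes.append(gene)
    return {vid: (" ".join(locs), " ".join(genes)) for vid, (locs, genes) in out.items()}


def collapse_pairs(rows):
    """Collapse to one line per variant, keeping each location:gene pair intact."""
    out = {}
    for vid, loc, gene in rows:
        pairs = out.setdefault(vid, [])
        pair = f"{loc}:{gene}"
        if pair not in pairs:
            pairs.append(pair)
    return {vid: " ".join(pairs) for vid, pairs in out.items()}


def flag_ambiguous(rows):
    """Return True for variants whose annotations involve more than one gene."""
    genes = {}
    for vid, _loc, gene in rows:
        genes.setdefault(vid, set()).add(gene)
    return {vid: len(g) > 1 for vid, g in genes.items()}


if __name__ == "__main__":
    flags = flag_ambiguous(rows)
    # One tab-separated line per variant: ID, paired annotations, ambiguity flag.
    for vid, pairs in collapse_pairs(rows).items():
        status = "AMBIGUOUS" if flags[vid] else "unique"
        print(f"{vid}\t{pairs}\t{status}")
```

Because each output line is still keyed by a unique variant ID, joins across samples or mutants remain one-to-one. The paired form avoids the "promoter coding / gene1 gene2 gene3" ambiguity at the cost of a longer annotation field.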


