[BioC] Complete variant toolbox: gmapR/VariantTools/VariantAnnotation

Julian Gehring julian.gehring at embl.de
Sun Dec 8 18:45:51 CET 2013


Hi Thomas,

> (1) For teaching purposes and other obvious reasons it would be useful if a
> Windows version of VariantTools were available (and perhaps for gmapR too).
> Installing the package (includes gmapR) from source works fine on both Linux
> and OS X, but not on Windows.

Because of the many differences between operating systems, building a
package like 'gmapR' (and every package that depends on it, such as
'VariantTools') is often not possible on Windows.  Michael or Thomas Wu
may know more about the details, but I doubt that these packages will
be available for Windows soon.  As an alternative, the Amazon
Bioconductor instances may be useful for you in this context.


> (3) When annotating variants with utilities from VariantAnnotation, it would
> be useful to provide a convenience Summary Report function at the end of the
> workflow that exports the annotations to a file. A very common need here is to
> collapse the annotations for each variant on a single line so that one doesn't
> end up with annotation results of millions of lines, as is typical for many
> variant discovery projects. This also simplifies joins among different
> annotation instances because it maintains uniqueness among variant identifiers.
> This approach is often useful when comparing (joining) the variants among
> different genotypes (e.g. which variants are identical or unique among
> different mutants). An example solution is shown on slides 34-35 of this
> presentation:
> http://faculty.ucr.edu/~tgirke/HTML_Presentations/Manuals/Workshop_Dec_12_16_2013/Rvarseq/Rvarseq.pdf

Because one variant may have multiple consequences, reporting or
post-processing the results is often harder than it would be with a
simple 1:1 mapping.  Other tools, such as ANNOVAR, report only the
'most severe' consequence, but 'most severe' is not well defined, and
this approach may miss interesting consequences.

Merging the consequences of a variant into a single line, as shown in
your slides, can make it difficult to disentangle the relationships
between the consequences.  As an example, take the last line on p. 35
of your presentation:

ID: Chr5:6455_T/C
Location: promoter coding
Gene: AT5G01010 AT5G01015 AT5G01020

Here, it is no longer possible to relate each location of the variant
to the gene it affects.  Out of interest, how do you deal with this in
your reports?
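
To make the point concrete, here is a minimal sketch in plain Python
(not VariantAnnotation code).  The helper `collapse` and both
long-format tables are hypothetical: the slide does not show which
location belongs to which gene, so the two pairings below are invented
purely for illustration.  Two different tables collapse to the same
line, so the pairing cannot be recovered from the collapsed report.

```python
from collections import defaultdict


def collapse(rows):
    """Collapse (variant, location, gene) rows to one line per variant.

    Hypothetical helper: unique locations and unique genes are each
    joined with spaces, in order of first appearance.
    """
    locations = defaultdict(list)
    genes = defaultdict(list)
    for variant, location, gene in rows:
        if location not in locations[variant]:
            locations[variant].append(location)
        if gene not in genes[variant]:
            genes[variant].append(gene)
    return {
        variant: (" ".join(locations[variant]), " ".join(genes[variant]))
        for variant in locations
    }


# Assumption: two invented location/gene pairings for the same variant.
table_a = [
    ("Chr5:6455_T/C", "promoter", "AT5G01010"),
    ("Chr5:6455_T/C", "promoter", "AT5G01015"),
    ("Chr5:6455_T/C", "coding", "AT5G01020"),
]
table_b = [
    ("Chr5:6455_T/C", "promoter", "AT5G01010"),
    ("Chr5:6455_T/C", "coding", "AT5G01015"),
    ("Chr5:6455_T/C", "promoter", "AT5G01020"),
]

collapsed_a = collapse(table_a)
collapsed_b = collapse(table_b)

# Different inputs, identical collapsed output: the pairing is lost.
print(collapsed_a == collapsed_b)
```

Keeping one row per (variant, location, gene) combination, or
collapsing per variant and gene rather than per variant only, would
presumably preserve the pairing at the cost of more lines.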

Best wishes
Julian


