[BioC] Complete variant toolbox: gmapR/VariantTools/VariantAnnotation - revived

Valerie Obenchain vobencha at fhcrc.org
Sun Jul 13 07:26:01 CEST 2014


Hi,

Following up on this thread:

https://stat.ethz.ch/pipermail/bioconductor/2013-December/056745.html

These changes are available in VariantAnnotation 1.11.15:

1) LOCSTART, LOCEND

locateVariants() has 2 new output columns, LOCSTART and LOCEND.
These are LOCATION-centric coordinates and can be different for each row 
so I thought these names were more descriptive than REFLOCS (discussed 
in thread). We have 2 values (start/end) instead of a single column of 
IRanges() because we can't make an IRanges() with missing values. 
Technically 'missing' ranges are represented by zero-width ranges but we 
still need a position; there is no position because there was no overlap.
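
As a rough sketch of what this looks like in practice (not taken from the package docs), the snippet below runs locateVariants() on the chr22 sample file shipped with VariantAnnotation and pulls out the two new columns. It assumes VariantAnnotation >= 1.11.15 and the hg19 knownGene TxDb are installed. The choice of AllVariants() and the expectation that rows with no overlap carry NA in LOCSTART/LOCEND are my assumptions; check ?locateVariants for the authoritative description.

```r
library(VariantAnnotation)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

fl <- system.file("extdata", "chr22.vcf.gz", package="VariantAnnotation")
vcf <- readVcf(fl, "hg19")
vcf <- renameSeqlevels(vcf, "chr22")   # match the seqlevel style used by txdb

## AllVariants() is used here only so that several LOCATION types show up.
loc <- locateVariants(vcf, txdb, AllVariants())

## LOCSTART / LOCEND sit next to LOCATION in the metadata columns.
mcols(loc)[c("LOCATION", "LOCSTART", "LOCEND")]

## Assumption: rows without an overlapping feature have NA here.
table(missing=is.na(loc$LOCSTART), location=loc$LOCATION)

## Why two integer columns rather than one IRanges column: the IRanges()
## constructor is expected to reject NA positions. tryCatch() returns the
## error message if it does, so the script keeps running either way.
tryCatch(IRanges(start=NA_integer_, width=1L),
         error=function(e) conditionMessage(e))
```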

2) mapCoords(), pmapCoords()

These functions are courtesy of Michael. mapCoords() maps ranges onto 
another set of coordinates. You can map to cds-centric, exon-centric or 
any other type of coordinate. See ?mapCoords in both GenomicRanges and 
GenomicAlignments.
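
To make "exon-centric" concrete, here is a small base-R sketch of the arithmetic behind this kind of mapping for a single plus-strand transcript. It is only an illustration of the idea, not how mapCoords() is implemented; to_tx_coord() and the toy exon coordinates are hypothetical names and values made up for this example.

```r
## Hypothetical helper (not part of any package): convert a genomic position
## to a transcript-relative position. Exons are assumed to be on the plus
## strand, sorted 5' to 3', and non-overlapping.
to_tx_coord <- function(pos, exon_start, exon_end) {
    stopifnot(length(exon_start) == length(exon_end))
    hit <- which(pos >= exon_start & pos <= exon_end)
    if (length(hit) == 0L)
        return(NA_integer_)                      # outside every exon
    widths <- exon_end - exon_start + 1L
    upstream <- sum(widths[seq_len(hit - 1L)])   # bases in earlier exons
    as.integer(upstream + pos - exon_start[hit] + 1L)
}

## Toy transcript with two exons: 10-15 and 20-30.
exon_start <- c(10L, 20L)
exon_end   <- c(15L, 30L)

to_tx_coord(12L, exon_start, exon_end)   # 3rd base of exon 1
to_tx_coord(25L, exon_start, exon_end)   # 6 bases of exon 1 + 6th base of exon 2
to_tx_coord(17L, exon_start, exon_end)   # falls in the intron
```

The intron position comes back as NA because there is no exon to anchor it to, which is the same reason LOCSTART/LOCEND can be missing when there is no overlap.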


In the previous thread we discussed adding cDNA locations to 
predictCoding(). I've decided against this because it adds the 
additional overhead of the exonsBy() extraction and a findOverlaps() 
call. Not all users want the cDNA locations and those that do can now 
easily get them with mapCoords().

   ## The usual predictCoding setup:
   library(VariantAnnotation)
   library(BSgenome.Hsapiens.UCSC.hg19)
   library(TxDb.Hsapiens.UCSC.hg19.knownGene)
   txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
   fl <- system.file("extdata", "chr22.vcf.gz",
                     package="VariantAnnotation")
   vcf <- readVcf(fl, "hg19")
   vcf <- renameSeqlevels(vcf, "chr22")
   coding <- predictCoding(vcf, txdb, Hsapiens)

   ## Exon-centric or cDNA locations:
   exonsbytx <- exonsBy(txdb, "tx")
   cDNA <- mapCoords(coding[!duplicated(ranges(coding))], exonsbytx)
   coding$cDNA <- ranges(cDNA)[togroup(coding$QUERYID)]

Let me know if you run into problems or if the docs need more detail.

Valerie


