[BioC] stranded intronic variants with VariantAnnotation::locateVariants()

Robert Castelo robert.castelo at upf.edu
Thu Oct 17 13:01:51 CEST 2013


hi,

i have the following feature request for the VariantAnnotation package.

currently, the function predictCoding() annotates the strand of variants
within exons according to a given gene annotation. would it be possible
for the function locateVariants() in the VariantAnnotation package to
also annotate the strand of intronic variants?

introns are non-coding and therefore not annotated by predictCoding(),
but they are still stranded (canonical splice sites read GT...AG on the
transcribed strand).

here goes a code snippet that illustrates what i'm talking about 
(adapted from the vignette):

=================
library(VariantAnnotation)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

fl <- system.file("extdata", "chr22.vcf.gz", package="VariantAnnotation")
vcf <- readVcf(fl, "hg19")
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
seqlevels(vcf) <- "chr22"
rd <- rowData(vcf)
loc <- locateVariants(rd, txdb, IntronVariants())

head(loc, n=3)
GRanges with 3 ranges and 7 metadata columns:
       seqnames               ranges strand | LOCATION   QUERYID      TXID     CDSID      GENEID
          <Rle>            <IRanges>  <Rle> | <factor> <integer> <integer> <integer> <character>
   [1]    chr22 [50300078, 50300078]      * |   intron         1     75253      <NA>       79087
   [2]    chr22 [50300086, 50300086]      * |   intron         2     75253      <NA>       79087
   [3]    chr22 [50300101, 50300101]      * |   intron         3     75253      <NA>       79087
             PRECEDEID        FOLLOWID
       <CharacterList> <CharacterList>
   [1]
   [2]
   [3]
   ---
   seqlengths:
    chr22
       NA
=================

i.e., the strand column is set to * for the intronic variants, although
the transcript each of them falls in (TXID) has a known strand. it would
be fine if this feature were added only to the devel version, as
normally happens with new features.
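
until something like this is available, one possible workaround is to
copy the strand from the transcript that each intronic variant was
assigned to (the TXID column). the following is only a minimal sketch:
strandFromTx() is a hypothetical helper, not part of VariantAnnotation,
and it assumes that TXID in the locateVariants() output matches the
tx_id column of the GRanges returned by transcripts(txdb).

```r
library(GenomicRanges)

## hypothetical helper (not part of VariantAnnotation):
## 'loc' is a GRanges with a TXID metadata column, as returned by
## locateVariants(); 'tx' is a GRanges with a tx_id metadata column,
## as returned by transcripts(txdb).
strandFromTx <- function(loc, tx) {
  ## look up, for every row of 'loc', the transcript it was assigned to
  idx <- match(mcols(loc)$TXID, mcols(tx)$tx_id)
  if (any(is.na(idx)))
    stop("some TXID values were not found in 'tx'")
  ## overwrite the '*' strand with the strand of that transcript
  strand(loc) <- strand(tx)[idx]
  loc
}
```

with the objects from the snippet above this would be called as
strandFromTx(loc, transcripts(txdb)). as far as i can tell the output of
locateVariants() has one row per variant-transcript pair, so each row
would get the strand of its own transcript.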


thanks!
robert.
ps: sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
 [1] TxDb.Hsapiens.UCSC.hg19.knownGene_2.10.1
 [2] GenomicFeatures_1.14.0
 [3] AnnotationDbi_1.24.0
 [4] Biobase_2.22.0
 [5] VariantAnnotation_1.8.0
 [6] Rsamtools_1.14.1
 [7] Biostrings_2.30.0
 [8] GenomicRanges_1.14.1
 [9] XVector_0.2.0
[10] IRanges_1.20.0
[11] BiocGenerics_0.8.0

loaded via a namespace (and not attached):
 [1] biomaRt_2.18.0     bitops_1.0-6       BSgenome_1.30.0    DBI_0.2-7
 [5] RCurl_1.95-4.1     RSQLite_0.11.4     rtracklayer_1.22.0 stats4_3.0.2
 [9] tools_3.0.2        XML_3.95-0.2       zlibbioc_1.8.0


