[BioC] VariantAnnotation and distance to genes

Steve Lianoglou lianoglou.steve at gene.com
Mon May 13 18:23:14 CEST 2013


Hi,

On Mon, May 13, 2013 at 8:48 AM, James W. MacDonald <jmacdon at uw.edu> wrote:
> Hi Gad,
>
>
> On 5/12/2013 11:41 PM, Gad Abraham wrote:
>>
>> Hi,
>>
>> I'm using VariantAnnotation, which is a very nice package, to annotate
>> some
>> SNPs, based on
>>
>> http://adairama.wordpress.com/2013/02/15/functionally-annotate-snps-and-indels-in-bioconductor
>> .
>>
>> For SNPs annotated as intergenic and therefore falling between two genes
>> (PRECEDEID and FOLLOWID), is there a way of getting the actual distance
>> between the SNPs and these two genes?
>>
>> I'm using:
>>
>> head(m, 3)
>>           RS  Chr      Pos
>> 1 rs2187668 chr6 32713862
>> 2 rs9357152 chr6 32772938
>> 3 rs3099844 chr6 31556955
>>
>> target<- with(m,
>>        GRanges(seqnames=Rle(Chr),
>>           ranges=IRanges(Pos, end=Pos, names=RS),
>>           strand=Rle(strand("*"))
>>        )
>> )
>>
>> loc<- locateVariants(target, TxDb.Hsapiens.UCSC.hg18.knownGene,
>> AllVariants())
>> names(loc)<- NULL
>> out<- as.data.frame(loc)
>>
>> head(out, 3)
>>        names seqnames    start      end   LOCATION GENEID PRECEDEID
>> FOLLOWID
>> GeneSymbol PrecedeSymbol FollowSymbol
>> 1 rs2187668     chr6 32713862 32713862     intron   3117<NA>
>> <NA>    HLA-DQA1<NA>          <NA>
>> 4 rs9357152     chr6 32772938 32772938 intergenic<NA>       3119
>> 3117<NA>       HLA-DQB1     HLA-DQA1
>> 5 rs3099844     chr6 31556955 31556955 intergenic<NA>       4277
>> 352961<NA>           MICB        HCG26
>>
>> I'd like to get the distance from rs9357152 to HLA-DQB1 and to HLA-DQA1.
>
>
> There might be a far superior way to do this, but one way is to assume that
> cds == genes and go from there:
>
>> gns <- cds(TxDb.Hsapiens.UCSC.hg19.knownGene,
>> columns=c("cds_id","gene_id"))
>> ind <- sapply(values(gns)$gene_id, function(x) any(x %in%
>> c("3117","3119")))
>> sum(ind)
> [1] 28
>> gns[ind,]
> GRanges with 28 ranges and 2 metadata columns:
>             seqnames               ranges strand   |    cds_id
> gene_id
> <Rle> <IRanges> <Rle>   | <integer> <CharacterList>
>    [1]          chr6 [32605236, 32605317]      +   |     73476
> 3117
>    [2]          chr6 [32609087, 32609335]      +   |     73477
> 3117
>    [3]          chr6 [32609749, 32610030]      +   |     73478
> 3117
>    [4]          chr6 [32609749, 32610164]      +   |     73479
> 3117
>    [5]          chr6 [32610387, 32610541]      +   |     73480
> 3117
>    ...           ...                  ...    ... ...       ...
> ...
>   [24] chr6_qbl_hap6   [3860872, 3860982]      -   |    233860
> 3119
>   [25] chr6_qbl_hap6   [3861407, 3861780]      -   |    233861
> 3119
>   [26] chr6_qbl_hap6   [3861499, 3861780]      -   |    233862
> 3119
>   [27] chr6_qbl_hap6   [3864670, 3864939]      -   |    233863
> 3119
>   [28] chr6_qbl_hap6   [3866378, 3866486]      -   |    233864
> 3119
>
> Note that these two genes also map to the extra Chr6 haplotypes.
>
>> z <- cbind(as.data.frame(gns[ind,]), distance(GRanges("chr6",
>> IRanges(start=32772938, end=32772938)), gns[ind,]))[,c(1,2,4,6,7,8)]
> Warning message:
> In .local(x, y, ...) :
>   The behavior of distance() has changed in Bioconductor 2.12. See ?distance
> for details.
>> colnames(z) <- c(colnames(z)[1:5], "Distance")

Saving one line of code makes it twice as nice for half the price:

R> z <- cbind(as.data.frame(gns[ind,]),
Distance=distance(GRanges("chr6", IRanges(start=32772938,
end=32772938)), gns[ind,]))[,c(1,2,4,6,7,8)]

R> head(z)
   seqnames    start width cds_id gene_id Distance
1|     chr6 32605236    82  73476    3117   167620
2|     chr6 32609087   249  73477    3117   163602
3|     chr6 32609749   282  73478    3117   162907
4|     chr6 32609749   416  73479    3117   162773
5|     chr6 32610387   155  73480    3117   162396
6|     chr6 32628013    14  78928    3119   144911

This time, it's free ... this time.

-steve

--
Steve Lianoglou
Computational Biologist
Department of Bioinformatics and Computational Biology
Genentech



More information about the Bioconductor mailing list