[BioC] VariantAnnotation: fine define Locating variants in and around genes

Valerie Obenchain vobencha at fhcrc.org
Thu Jan 31 23:24:07 CET 2013


On 01/31/2013 01:48 PM, Fabrice Tourre wrote:
> Valerie,
>
> Thank you for your reply.
>
> Is there a function in VariantAnnotation to tell whether a SNP is
> within a transcribed region but outside the coding region? Or whether
> it is in the first exon/intron?

Yes, use locateVariants(). Pass AllVariants() as the 'region' argument
and subset the result on the UTR and intron locations.

Using the 'loc_coding' object from the example below:

myregions <- c("intron", "threeUTR", "fiveUTR")
loc_coding[loc_coding$LOCATION %in% myregions]
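
For anyone who wants to run this end to end, here is a minimal sketch of
the same idea. It is not the man page example itself. It assumes the
VariantAnnotation and TxDb.Hsapiens.UCSC.hg19.knownGene packages are
installed and uses the chr22 VCF file shipped with VariantAnnotation.
The names 'loc_all', 'noncoding', 'hits' and 'vcf_noncoding' are
illustrative only. Accessor names have changed between releases
(rowRanges() in recent versions), so adjust to your installed version.

```r
# Sketch only: assumes both packages below are installed.
library(VariantAnnotation)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

# Example VCF shipped with VariantAnnotation (chromosome 22 only).
fl  <- system.file("extdata", "chr22.vcf.gz", package = "VariantAnnotation")
vcf <- readVcf(fl, "hg19")

# The VCF names the chromosome "22"; the TxDb uses "chr22".
seqlevels(vcf) <- "chr22"

# Annotate every variant against all region types.
loc_all <- locateVariants(rowRanges(vcf), txdb, AllVariants())

# Keep hits that fall in introns or UTRs, i.e. inside a transcript
# but outside the coding sequence.
myregions <- c("intron", "threeUTR", "fiveUTR")
noncoding <- loc_all[loc_all$LOCATION %in% myregions]

# QUERYID is the row number in the query, so it maps the hits
# back to the original variants.
hits <- unique(noncoding$QUERYID)
vcf_noncoding <- vcf[hits]
```

A variant can appear more than once in 'noncoding' (one row per
overlapping transcript), which is why the QUERYID values are passed
through unique() before subsetting.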


Valerie

>
> On Thu, Jan 31, 2013 at 4:30 PM, Valerie Obenchain <vobencha at fhcrc.org> wrote:
>> Hi Fabrice,
>>
>> To identify SNPs (or any ranges) in introns only, use IntronVariants() as
>> the 'region' argument. CodingVariants() covers the coding (CDS) portions
>> of exons. If you want all regions except coding, I would suggest using
>> AllVariants() and subsetting the result.
>>
>> This output is from the man page example. The 'loc_coding' name is
>> misleading since AllVariants() was used as 'region'. I have changed it to
>> 'loc_all' in the devel branch.
>>
>>> loc_coding <- locateVariants(vcf_adj, txdb, AllVariants())
>>> loc_coding
>> GRanges with 16 ranges and 7 metadata columns:
>>               seqnames               ranges strand |   LOCATION   QUERYID
>>                  <Rle>            <IRanges>  <Rle> |   <factor> <integer>
>>                   chr1 [   13220,    13220]      * |     intron         1
>>                   chr1 [   13220,    13220]      * | spliceSite         1
>>                   chr1 [   13220,    13220]      * |     intron         1
>>                   chr1 [   13220,    13220]      * |     intron         1
>>                   chr1 [   13220,    13220]      * | spliceSite         1
>> ...
>> ...
>>
>> This example has variants in splice sites, introns, coding and intergenic
>> regions.
>>
>>> tbl <- table(loc_coding$LOCATION)
>>> tbl[tbl > 0]
>>
>> spliceSite     intron     coding intergenic
>>           2          7          2          5
>>
>> The result can be subset on LOCATION for the region of interest. The QUERYID
>> column maps back to the row number in the original 'query' argument to
>> locateVariants().
>>
>>> introns <- loc_coding[loc_coding$LOCATION == "intron", ]
>>> head(introns, 3)
>> GRanges with 3 ranges and 7 metadata columns:
>>     seqnames         ranges strand | LOCATION   QUERYID      TXID
>>        <Rle>      <IRanges>  <Rle> | <factor> <integer> <integer>
>>         chr1 [13220, 13220]      * |   intron         1         1
>>         chr1 [13220, 13220]      * |   intron         1         2
>>         chr1 [13220, 13220]      * |   intron         1         3
>>
>>
>> Valerie
>>
>>
>>
>> On 01/31/2013 12:34 PM, Fabrice Tourre wrote:
>>>
>>> Dear list,
>>>
>>> I am using VariantAnnotation to locate variants in and around genes.
>>>
>>> In VariantAnnotation, the region can be one of: CodingVariants,
>>> IntronVariants, FiveUTRVariants, ThreeUTRVariants, IntergenicVariants,
>>> SpliceSiteVariants or PromoterVariants.
>>>
>>> Is it possible to know whether a SNP is in an exon/intron within a
>>> transcribed region but outside the coding region?
>>>
>>> Thanks.
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>


