[BioC] definition for intergenic SNPs in locateVariants function

Valerie Obenchain vobencha at fhcrc.org
Fri Apr 5 18:37:50 CEST 2013


Hello,

I've added 'upstream' and 'downstream' arguments to IntergenicVariants() 
in VariantAnnotation 1.7.1. You can get it from svn now or use 
biocLite("VariantAnnotation") tomorrow after 9am PST.

The default values for both are 1e+06 but I am open to changing this if 
other values make more sense.

 > IntergenicVariants()
class: IntergenicVariants
upstream: 1000000
downstream: 1000000

Note that AllVariants() now has 'upstream' and 'downstream' for both 
PromoterVariants() and IntergenicVariants().

 > AllVariants()
class: AllVariants
promoter:
   upstream:  2000
   downstream:  200
intergenic:
   upstream:  1000000
   downstream:  1000000

Modifying values:
IntergenicVariants(500, 500)
AllVariants(intergenic=IntergenicVariants(500, 500))

Extracting values:
av <- AllVariants(intergenic=IntergenicVariants(500, 500))
intergenic(av)
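
As a rough sketch of how the modified constructor might be passed to 
locateVariants(): the sample VCF shipped with VariantAnnotation, the 
hg19 knownGene txdb and the object names (variants, region, loc) are 
assumptions for illustration only, and rowRanges() is the accessor in 
recent releases (rowData() in the 1.7.x series).

```r
library(VariantAnnotation)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

# Sample data shipped with the package (assumption: replace with your own variants)
fl <- system.file("extdata", "chr22.vcf.gz", package = "VariantAnnotation")
vcf <- readVcf(fl, "hg19")
seqlevels(vcf) <- "chr22"      # match the UCSC-style seqlevels used by the txdb
variants <- rowRanges(vcf)     # rowData(vcf) in older releases

# Restrict the search for flanking genes to 500 bp on either side
region <- IntergenicVariants(upstream = 500, downstream = 500)
loc <- locateVariants(variants, txdb, region)

# Number of variants reported per location type
table(loc$LOCATION)
```

Comparing this table with one from the default IntergenicVariants() 
should show how many intergenic calls depend on distant flanking genes.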


Regarding the utr question, yes, I think it's odd that you would have 
the same number for 3' and 5'. You can investigate further by extracting 
the utr regions from your annotation and overlapping those with your 
variants. Using hg19.knownGene as an example,

library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
utr3 <- threeUTRsByTranscript(txdb)
utr5 <- fiveUTRsByTranscript(txdb)

Use findOverlaps() with these regions and the variants in question to 
confirm results.
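
A minimal sketch of that check, assuming the variants are held in a 
GRanges whose seqlevels match the txdb. The sample VCF and the names 
variants, hits3, hits5, n3, n5 and both are illustrative and not from 
your data.

```r
library(VariantAnnotation)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
utr3 <- threeUTRsByTranscript(txdb)
utr5 <- fiveUTRsByTranscript(txdb)

# Sample data shipped with the package (assumption: replace with your own variants)
fl <- system.file("extdata", "chr22.vcf.gz", package = "VariantAnnotation")
vcf <- readVcf(fl, "hg19")
seqlevels(vcf) <- "chr22"      # match the UCSC-style seqlevels used by the txdb
variants <- rowRanges(vcf)     # rowData(vcf) in older releases

# Overlap the variants with each set of UTR regions
hits3 <- findOverlaps(variants, utr3)
hits5 <- findOverlaps(variants, utr5)

# Count each variant once, even if it hits several transcripts
n3 <- length(unique(queryHits(hits3)))
n5 <- length(unique(queryHits(hits5)))

# Variants that fall in a 3' UTR of one transcript and a 5' UTR of another
both <- intersect(queryHits(hits3), queryHits(hits5))

c(threeUTR = n3, fiveUTR = n5, both = length(both))
```

If n3 and n5 differ here but the locateVariants() counts are identical, 
that would suggest looking at how the two UTR results were tabulated.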


Valerie





On 04/05/2013 05:39 AM, Adaikalavan Ramasamy wrote:
> Thanks Valerie.
>
> One more question if you don't mind. I am annotating several thousand
> SNPs. The numbers of SNPs allocated to the 3'UTR and the 5'UTR are exactly
> the same. I would have expected more in the 3'UTR, as the 3'UTR is much
> longer than the 5'UTR. Is this a lucky coincidence? Sorry for the vagueness
> of the question. Thanks.
>
>
>
> On Thu, Apr 4, 2013 at 11:56 AM, Adaikalavan Ramasamy
> <adaikalavan.ramasamy at gmail.com> wrote:
>
>     Thanks. That will be a useful feature.
>
>     In the meantime I will try to delete manually when intergenic snps
>     are too far from the precede or follow ids. Thanks.
>
>     Valerie Obenchain <vobencha at fhcrc.org> wrote:
>     On 04/03/13 17:10, Adaikalavan Ramasamy wrote:
>      > Dear Valerie,
>      >
>      > Do you mean there is no limit on the flanking gene? Even if it is say
>      > 10Mb away?
>
>     Yes, that's correct.
>
>     Since upstream/downstream distance is relevant for eQTL analysis it
>     sounds like this functionality would be useful for a wider audience.
>     I've put this on my TODO for the next devel cycle. Thanks for the
>     suggestion.
>
>     Valerie
>      >
>      > I am working on eQTLs where most people assume the cis-eQTLs operate
>      > on a gene located a short distance away (< 1Mb) so we may want to
>      > treat those that are located very far away differently. Thanks.
>      >
>      > Regards, Adai
>      >
>      >
>      >
>      >
>      > On Wed, Apr 3, 2013 at 4:33 PM, Valerie Obenchain
>      > <vobencha at fhcrc.org> wrote:
>      >
>      >     Hi Adai,
>      >
>      >     The intergenic SNPs are those that fall outside of the gene
>      >     ranges defined in the annotation. There is a table in the
>      >     vignette that briefly describes this.
>      >
>      >     With a txdb as the annotation, "transcripts by gene" are
>      >     extracted and findOverlaps() is performed with the variant
>      >     ranges. Variants that do not have a 'hit' are considered to fall
>      >     outside gene regions. For these variants we determine which genes
>      >     fall to either side (PRECEDEID and FOLLOWID in the output). There
>      >     is no limit for upstream/downstream searching. We simply take the
>      >     next closest gene if one exists.
>      >
>      >     If you were able to define upstream/downstream limits I'm assuming
>      >     you're interested in all genes that fell in that range, not just
>      >     the next closest gene?
>      >
>      >     Valerie
>      >
>      >
>      >
>      >
>      >     On 04/03/2013 05:31 AM, Adaikalavan Ramasamy wrote:
>      >
>      >         Dear all,
>      >
>      >         I have been using the locateVariants function in the
>      >         VariantAnnotation package. It has been great and we are now
>      >         in the process of writing the methods section.
>      >
>      >         May I know how the intergenic SNPs were defined? What is the
>      >         limit upstream and downstream to define PRECEDEID and
>      >         FOLLOWID? I checked the manuals and mailing list without much
>      >         luck.
>      >
>      >         This is of less importance but is there a way to adjust these
>      >         definitions/limits if we want to do so in future? Thank you.
>      >
>      >         Regards, Adai
>      >
>      >
>      >         _________________________________________________
>      >         Bioconductor mailing list
>      > Bioconductor at r-project.org <mailto:Bioconductor at r-project.org>
>     <mailto:Bioconductor at r-project.org <mailto:Bioconductor at r-project.org>>
>      > https://stat.ethz.ch/mailman/__listinfo/bioconductor
>      >         <https://stat.ethz.ch/mailman/listinfo/bioconductor>
>      >         Search the archives:
>      > http://news.gmane.org/gmane.__science.biology.informatics.__conductor
>      >
>     <http://news.gmane.org/gmane.science.biology.informatics.conductor>
>      >
>      >
>      >
>
>


